[GitHub] drill pull request #581: DRILL-4864: Add ANSI format for date/time functions

2016-09-30 Thread gparai
Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/581#discussion_r81438806
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java ---
@@ -408,6 +411,12 @@ private LogicalExpression getDrillFunctionFromOptiqCall(RexCall call) {
 
           return first;
         }
+      } else if (functionName.equals("to_date") || functionName.equals("to_time") || functionName.equals("to_timestamp")) {
+        // convert ansi date format string to joda according to session option
+        OptionManager om = this.context.getPlannerSettings().getOptions();
+        if(ToDateFormats.valueOf(om.getOption(ExecConstants.TO_DATE_FORMAT).string_val.toUpperCase()).equals(ToDateFormats.ANSI)) {
+          args.set(1, FunctionCallFactory.createExpression("ansi_to_joda", Arrays.asList(args.get(1))));
--- End diff --

What would happen if 
drill.exec.fn.to_date_format = 'ansi'  
query: select to_date(1234545, ansi_to_joda('dd-MM-')) from emp;

Would we get select to_date(1234545, 
ansi_to_joda(ansi_to_joda('dd-MM-'))) from emp;?
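For a concrete sense of the double-conversion concern: with a longest-match substitution
over the ANSI-to-Joda token map (quoted later in this thread for JodaDateValidator), a
second application can re-interpret tokens that were already converted. The sketch below
is illustrative only; the class, the map subset, and the scanning loop are assumptions,
not the PR's code:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Toy subset of the ANSI -> Joda token map, listed longest-first.
    public class DoubleConversionDemo {
      private static final Map<String, String> ANSI_TO_JODA = new LinkedHashMap<>();
      static {
        ANSI_TO_JODA.put("dd", "d");   // ANSI day-of-month -> Joda day-of-month
        ANSI_TO_JODA.put("mm", "MM");  // ANSI month        -> Joda month
        ANSI_TO_JODA.put("d",  "e");   // ANSI day-of-week  -> Joda day-of-week
      }

      public static void main(String[] args) {
        String once  = convert("dd-mm");  // "d-MM", the intended rewrite
        String twice = convert(once);     // "e-MM", day-of-month now reads as day-of-week
        System.out.println(once + " / " + twice);
      }

      // Single left-to-right scan, longest token first, case-insensitive.
      private static String convert(String pattern) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < pattern.length()) {
          String matched = null;
          for (String token : ANSI_TO_JODA.keySet()) {
            if (pattern.regionMatches(true, i, token, 0, token.length())) {
              matched = token;
              break;
            }
          }
          if (matched != null) {
            out.append(ANSI_TO_JODA.get(matched));
            i += matched.length();
          } else {
            out.append(pattern.charAt(i++));
          }
        }
        return out.toString();
      }
    }

So guarding against wrapping an argument that is already an ansi_to_joda call (or making
the rewrite idempotent) seems worth confirming here.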


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #581: DRILL-4864: Add ANSI format for date/time functions

2016-09-30 Thread gparai
Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/581#discussion_r81438289
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java ---
@@ -408,6 +411,12 @@ private LogicalExpression getDrillFunctionFromOptiqCall(RexCall call) {
 
           return first;
         }
+      } else if (functionName.equals("to_date") || functionName.equals("to_time") || functionName.equals("to_timestamp")) {
+        // convert ansi date format string to joda according to session option
+        OptionManager om = this.context.getPlannerSettings().getOptions();
+        if(ToDateFormats.valueOf(om.getOption(ExecConstants.TO_DATE_FORMAT).string_val.toUpperCase()).equals(ToDateFormats.ANSI)) {
--- End diff --

if (




[GitHub] drill pull request #581: DRILL-4864: Add ANSI format for date/time functions

2016-09-30 Thread gparai
Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/581#discussion_r81440134
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/AnsiToJoda.java ---
@@ -0,0 +1,58 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to you under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+package org.apache.drill.exec.expr.fn.impl;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import javax.inject.Inject;
+
+/**
+ * Replaces all ansi patterns to joda equivalents.
+ */
+@FunctionTemplate(name = "ansi_to_joda",
+  scope = FunctionTemplate.FunctionScope.SIMPLE,
+  nulls= FunctionTemplate.NullHandling.NULL_IF_NULL)
+public class AnsiToJoda implements DrillSimpleFunc {
+
+  @Param
+  VarCharHolder in;
+
+  @Output
+  VarCharHolder out;
+
+  @Inject
+  DrillBuf buffer;
+
+  @Override
+  public void setup() {
+  }
+
+  @Override
+  public void eval() {
+    String pattern = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(in.start, in.end, in.buffer);
--- End diff --

Would it be good to validate the ANSI pattern prior to converting it to 
JODA?
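One way such an up-front check could look (a sketch under assumptions, not the PR's code):
scan the ANSI pattern and reject any letter sequence that does not start a known token.
The token list below only mirrors the constants visible in the quoted JodaDateValidator
diff, plus assumed year tokens, so it is deliberately partial:

    import java.util.Arrays;
    import java.util.List;

    public final class AnsiPatternCheck {

      // Longest tokens first so "hh24"/"hh12" win over "hh", and "ddd" over "dd" over "d".
      // "yyyy"/"yy"/"y" are assumed year tokens, not shown in the quoted diff.
      private static final List<String> TOKENS_LONGEST_FIRST = Arrays.asList(
          "month", "hh12", "hh24", "yyyy", "day", "ddd", "mon",
          "dd", "dy", "ee", "tz", "hh", "mi", "ss", "ms", "ww", "mm",
          "am", "pm", "sp", "fm", "fx", "tm", "yy", "d", "y");

      public static void validate(String ansiPattern) {
        String p = ansiPattern.toLowerCase();
        int i = 0;
        while (i < p.length()) {
          if (!Character.isLetter(p.charAt(i))) {  // separators and digits pass through
            i++;
            continue;
          }
          int before = i;
          for (String token : TOKENS_LONGEST_FIRST) {
            if (p.startsWith(token, i)) {
              i += token.length();
              break;
            }
          }
          if (i == before) {
            throw new IllegalArgumentException(
                "Unrecognized token in ANSI date format '" + ansiPattern + "' near index " + i);
          }
        }
      }
    }

Failing here, before the rewrite, would also let the error message name the user's original
ANSI pattern rather than a half-converted Joda string.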




[GitHub] drill pull request #581: DRILL-4864: Add ANSI format for date/time functions

2016-09-30 Thread gparai
Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/581#discussion_r81438363
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/AnsiToJoda.java ---
@@ -0,0 +1,58 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to you under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+package org.apache.drill.exec.expr.fn.impl;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import javax.inject.Inject;
+
+/**
+ * Replaces all ansi patterns to joda equivalents.
+ */
+@FunctionTemplate(name = "ansi_to_joda",
+  scope = FunctionTemplate.FunctionScope.SIMPLE,
+  nulls= FunctionTemplate.NullHandling.NULL_IF_NULL)
--- End diff --

nulls =




[GitHub] drill pull request #581: DRILL-4864: Add ANSI format for date/time functions

2016-09-30 Thread gparai
Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/581#discussion_r81439112
  
--- Diff: logical/src/main/java/org/apache/drill/common/expression/fn/JodaDateValidator.java ---
@@ -0,0 +1,213 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to you under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+package org.apache.drill.common.expression.fn;
+
+import com.google.common.collect.Sets;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.drill.common.map.CaseInsensitiveMap;
+
+import java.util.Comparator;
+import java.util.Set;
+
+public class JodaDateValidator {
+
+  private static final Set<String> ansiValuesForDeleting = Sets.newTreeSet(new LengthDescComparator());
+  private static final CaseInsensitiveMap<String> ansiToJodaMap = CaseInsensitiveMap.newTreeMap(new LengthDescComparator());
+
+  //tokens for deleting
+  public static final String SUFFIX_SP = "sp";
+  public static final String PREFIX_FM = "fm";
+  public static final String PREFIX_FX = "fx";
+  public static final String PREFIX_TM = "tm";
+
+  //ansi patterns
+  public static final String ANSI_FULL_NAME_OF_DAY = "day";
+  public static final String ANSI_DAY_OF_YEAR = "ddd";
+  public static final String ANSI_DAY_OF_MONTH = "dd";
+  public static final String ANSI_DAY_OF_WEEK = "d";
+  public static final String ANSI_NAME_OF_MONTH = "month";
+  public static final String ANSI_ABR_NAME_OF_MONTH = "mon";
+  public static final String ANSI_FULL_ERA_NAME = "ee";
+  public static final String ANSI_NAME_OF_DAY = "dy";
+  public static final String ANSI_TIME_ZONE_NAME = "tz";
+  public static final String ANSI_HOUR_12_NAME = "hh";
+  public static final String ANSI_HOUR_12_OTHER_NAME = "hh12";
+  public static final String ANSI_HOUR_24_NAME = "hh24";
+  public static final String ANSI_MINUTE_OF_HOUR_NAME = "mi";
+  public static final String ANSI_SECOND_OF_MINUTE_NAME = "ss";
+  public static final String ANSI_MILLISECOND_OF_MINUTE_NAME = "ms";
+  public static final String ANSI_WEEK_OF_YEAR = "ww";
+  public static final String ANSI_MONTH = "mm";
+  public static final String ANSI_HALFDAY_AM = "am";
+  public static final String ANSI_HALFDAY_PM = "pm";
+
+  //jodaTime patterns
+  public static final String JODA_FULL_NAME_OF_DAY = "EEEE";
+  public static final String JODA_DAY_OF_YEAR = "D";
+  public static final String JODA_DAY_OF_MONTH = "d";
+  public static final String JODA_DAY_OF_WEEK = "e";
+  public static final String JODA_NAME_OF_MONTH = "MMMM";
+  public static final String JODA_ABR_NAME_OF_MONTH = "MMM";
+  public static final String JODA_FULL_ERA_NAME = "G";
+  public static final String JODA_NAME_OF_DAY = "E";
+  public static final String JODA_TIME_ZONE_NAME = "TZ";
+  public static final String JODA_HOUR_12_NAME = "h";
+  public static final String JODA_HOUR_12_OTHER_NAME = "h";
+  public static final String JODA_HOUR_24_NAME = "H";
+  public static final String JODA_MINUTE_OF_HOUR_NAME = "m";
+  public static final String JODA_SECOND_OF_MINUTE_NAME = "s";
+  public static final String JODA_MILLISECOND_OF_MINUTE_NAME = "S";
+  public static final String JODA_WEEK_OF_YEAR = "w";
+  public static final String JODA_MONTH = "MM";
+  public static final String JODA_HALFDAY = "aa";
+
+  static {
+ansiToJodaMap.put(ANSI_FULL_NAME_OF_DAY, JODA_FULL_NAME_OF_DAY);
+ansiToJodaMap.put(ANSI_DAY_OF_YEAR, JODA_DAY_OF_YEAR);
+ansiToJodaMap.put(ANSI_DAY_OF_MONTH, JODA_DAY_OF_MONTH);
+ansiToJodaMap.put(ANSI_DAY_OF_WEEK, JODA_DAY_OF_WEEK);
+ansiToJodaMap.put(ANSI_NAME_OF_MONTH, JODA_NAME_OF_MONTH);
+ansiToJodaMap.put(ANSI_ABR_NAME_OF_MONTH, JODA_ABR_NAME_OF_MONTH);
+ansiToJodaMap.put(ANSI_FULL_ERA_NAME, JODA_FULL_ERA_NAME);
+ansiToJodaMap.put(ANSI_NAME_OF_DAY, JODA_NAME_OF_DAY);
+ansiToJodaMap.put(ANSI_TIME_ZONE_NAME, JODA_TIME_ZONE_NAME);
+ansiToJodaMap.put(ANSI_HOUR_12_NAME, JODA_HOUR_12_NAME);
+
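The quoted diff is cut off above, before the remaining map entries and the definition of
LengthDescComparator. A length-descending comparator of the kind the field declarations
imply would typically look like the sketch below (an assumption, not necessarily the PR's
code); ordering keys longest-first is what lets multi-character tokens such as hh24 and
hh12 be tried before hh, and ddd before dd before d:

    import java.util.Comparator;

    // Order strings by descending length; fall back to natural order so that
    // equal-length keys are not collapsed as duplicates by TreeSet/TreeMap.
    public class LengthDescComparator implements Comparator<String> {
      @Override
      public int compare(String left, String right) {
        int byLength = right.length() - left.length();
        return byLength != 0 ? byLength : left.compareTo(right);
      }
    }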

[GitHub] drill pull request #581: DRILL-4864: Add ANSI format for date/time functions

2016-09-30 Thread gparai
Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/581#discussion_r81440360
  
--- Diff: logical/src/main/java/org/apache/drill/common/expression/fn/JodaDateValidator.java ---
@@ -0,0 +1,213 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to you under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+package org.apache.drill.common.expression.fn;
+
+import com.google.common.collect.Sets;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.drill.common.map.CaseInsensitiveMap;
+
+import java.util.Comparator;
+import java.util.Set;
+
+public class JodaDateValidator {
+
+  private static final Set<String> ansiValuesForDeleting = Sets.newTreeSet(new LengthDescComparator());
+  private static final CaseInsensitiveMap<String> ansiToJodaMap = CaseInsensitiveMap.newTreeMap(new LengthDescComparator());
+
+  //tokens for deleting
+  public static final String SUFFIX_SP = "sp";
+  public static final String PREFIX_FM = "fm";
+  public static final String PREFIX_FX = "fx";
+  public static final String PREFIX_TM = "tm";
+
+  //ansi patterns
+  public static final String ANSI_FULL_NAME_OF_DAY = "day";
+  public static final String ANSI_DAY_OF_YEAR = "ddd";
+  public static final String ANSI_DAY_OF_MONTH = "dd";
+  public static final String ANSI_DAY_OF_WEEK = "d";
+  public static final String ANSI_NAME_OF_MONTH = "month";
+  public static final String ANSI_ABR_NAME_OF_MONTH = "mon";
+  public static final String ANSI_FULL_ERA_NAME = "ee";
+  public static final String ANSI_NAME_OF_DAY = "dy";
+  public static final String ANSI_TIME_ZONE_NAME = "tz";
+  public static final String ANSI_HOUR_12_NAME = "hh";
+  public static final String ANSI_HOUR_12_OTHER_NAME = "hh12";
+  public static final String ANSI_HOUR_24_NAME = "hh24";
+  public static final String ANSI_MINUTE_OF_HOUR_NAME = "mi";
+  public static final String ANSI_SECOND_OF_MINUTE_NAME = "ss";
+  public static final String ANSI_MILLISECOND_OF_MINUTE_NAME = "ms";
+  public static final String ANSI_WEEK_OF_YEAR = "ww";
+  public static final String ANSI_MONTH = "mm";
+  public static final String ANSI_HALFDAY_AM = "am";
+  public static final String ANSI_HALFDAY_PM = "pm";
+
+  //jodaTime patterns
+  public static final String JODA_FULL_NAME_OF_DAY = "EEEE";
+  public static final String JODA_DAY_OF_YEAR = "D";
+  public static final String JODA_DAY_OF_MONTH = "d";
+  public static final String JODA_DAY_OF_WEEK = "e";
+  public static final String JODA_NAME_OF_MONTH = "MMMM";
+  public static final String JODA_ABR_NAME_OF_MONTH = "MMM";
+  public static final String JODA_FULL_ERA_NAME = "G";
+  public static final String JODA_NAME_OF_DAY = "E";
+  public static final String JODA_TIME_ZONE_NAME = "TZ";
+  public static final String JODA_HOUR_12_NAME = "h";
+  public static final String JODA_HOUR_12_OTHER_NAME = "h";
+  public static final String JODA_HOUR_24_NAME = "H";
+  public static final String JODA_MINUTE_OF_HOUR_NAME = "m";
+  public static final String JODA_SECOND_OF_MINUTE_NAME = "s";
+  public static final String JODA_MILLISECOND_OF_MINUTE_NAME = "S";
+  public static final String JODA_WEEK_OF_YEAR = "w";
+  public static final String JODA_MONTH = "MM";
+  public static final String JODA_HALFDAY = "aa";
+
+  static {
+ansiToJodaMap.put(ANSI_FULL_NAME_OF_DAY, JODA_FULL_NAME_OF_DAY);
+ansiToJodaMap.put(ANSI_DAY_OF_YEAR, JODA_DAY_OF_YEAR);
+ansiToJodaMap.put(ANSI_DAY_OF_MONTH, JODA_DAY_OF_MONTH);
+ansiToJodaMap.put(ANSI_DAY_OF_WEEK, JODA_DAY_OF_WEEK);
+ansiToJodaMap.put(ANSI_NAME_OF_MONTH, JODA_NAME_OF_MONTH);
+ansiToJodaMap.put(ANSI_ABR_NAME_OF_MONTH, JODA_ABR_NAME_OF_MONTH);
+ansiToJodaMap.put(ANSI_FULL_ERA_NAME, JODA_FULL_ERA_NAME);
+ansiToJodaMap.put(ANSI_NAME_OF_DAY, JODA_NAME_OF_DAY);
+ansiToJodaMap.put(ANSI_TIME_ZONE_NAME, JODA_TIME_ZONE_NAME);
+ansiToJodaMap.put(ANSI_HOUR_12_NAME, JODA_HOUR_12_NAME);
+

[GitHub] drill pull request #581: DRILL-4864: Add ANSI format for date/time functions

2016-09-30 Thread gparai
Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/581#discussion_r81438265
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java ---
@@ -408,6 +411,12 @@ private LogicalExpression getDrillFunctionFromOptiqCall(RexCall call) {
 
           return first;
         }
+      } else if (functionName.equals("to_date") || functionName.equals("to_time") || functionName.equals("to_timestamp")) {
--- End diff --

equalsIgnoreCase needed?




[jira] [Created] (DRILL-4921) Scripts drill_config.sh, drillbit.sh, and drill-embedded fail when accessed via a symbolic link

2016-09-30 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-4921:
---

 Summary: Scripts drill_config.sh,  drillbit.sh, and drill-embedded 
fail when accessed via a symbolic link
 Key: DRILL-4921
 URL: https://issues.apache.org/jira/browse/DRILL-4921
 Project: Apache Drill
  Issue Type: Bug
  Components:  Server
Affects Versions: 1.8.0
 Environment: The drill-embedded on the Mac; the other files on Linux
Reporter: Boaz Ben-Zvi
Priority: Minor
 Fix For: 1.9.0


  Several of the drill... scripts under $DRILL_HOME/bin use "pwd" to produce 
the local path of that script. However "pwd" defaults to "logical" (i.e. the 
same as "pwd -L"); so if accessed via a symbolic link, that link is used 
verbatim in the path, which can produce wrong paths (e.g., when followed by "cd 
..").

For example, creating a symbolic link and using it (on the Mac):
$  cd ~/drill
$  ln -s $DRILL_HOME/bin 
$  bin/drill-embedded
ERROR: Drill config file missing: 
/Users/boazben-zvi/drill/conf/drill-override.conf -- Wrong config dir?

Similarly on Linux the CLASS_PATH gets set wrong (when running "drillbit.sh 
start" via a symlink).

Solution: need to replace all the "pwd" in all the scripts with "pwd -P" which 
produces the Physical path. (Or replace a preceding "cd" with "cd -P" which 
does the same).

Relevant scripts:
=
$ cd bin; grep pwd *
drillbit.sh:bin=`cd "$bin">/dev/null; pwd`
drillbit.sh:  echo "cwd:" `pwd`
drill-conf:bin=`cd "$bin">/dev/null; pwd`
drill-config.sh:home=`cd "$bin/..">/dev/null; pwd`
drill-config.sh:  DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
drill-config.sh:JAVA_HOME="$( cd -P "$( dirname "$SOURCE" )" && cd .. && 
pwd )"
drill-embedded:bin=`cd "$bin">/dev/null; pwd`
drill-localhost:bin=`cd "$bin">/dev/null; pwd`
submit_plan:bin=`cd "$bin">/dev/null; pwd`
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: ZK lost connectivity issue on large cluster

2016-09-30 Thread François Méthot
After the 30-second gap, all the Drill nodes receive the following:

2016-09-26 20:07:38,629 [Curator-ServiceCache-0] Debug Active drillbit set
changed. Now includes 220 total bits. New Active drill bits
...faulty node is not on the list...
2016-09-26 20:07:38,897 [Curator-ServiceCache-0] Debug Active drillbit set
changed. Now includes 221 total bits. New Active drill bits
...faulty node is back on the list...


So the faulty Drill node gets unregistered and re-registered right after.

Drill uses the low-level API for registering and unregistering, and the
only place where unregistering occurs is when the drillbit is closed at
shutdown.

That particular drillbit was still up and running after those logs, so it could
not have triggered the unregistering process through a shutdown.




Would you have an idea what else could cause a Drillbit to be unregistered
from the DiscoveryService and registered again right after?



We are using Zookeeper 3.4.5










On Wed, Sep 28, 2016 at 10:36 AM, François Méthot 
wrote:

> Hi,
>
>  We have switched to 1.8 and we are still getting node disconnection.
>
> We did many tests, we thought initially our stand alone parquet converter
> was generating parquet files with problematic data (like 10K characters
> string), but we were able to reproduce it with employee data from the
> tutorial.
>
> For example,  we duplicated the Drill Tutorial "Employee" data to reach
> 500 M records spread over 130 parquet files.
> Each files is ~60 MB.
>
>
> We ran this query over and over on 5 different sessions using a script:
>select * from hdfs.tmp.`PARQUET_EMPLOYEE` where full_name like '%does
> not exist%';
>
>Query return no rows and would take ~35 to 45 seconds to return.
>
> Leaving the script running on each node, we eventually hit the "nodes lost
> connectivity during query" error.
>
> On the node that failed,
>
> we see these logs:
> 2016-09-26 20:07:09,029 [...uuid...frag:1:10] INFO 
> o.a.d.e.w.f.FragmentStatusReporter
> - ...uuid...:1:10: State to report: RUNNING
> 2016-09-26 20:07:09,029 [...uuid...frag:1:10] DEBUG
> o.a.d.e.w.FragmentExecutor - Starting fragment 1:10 on server064:31010
>
> <--- 30 seconds gap for that fragment --->
>
> 2016-09-26 20:37:09,976 [BitServer-2] WARN 
> o.a.d.exec.rpc.control.ControlServer
> - Message of mode REQUEST of rpc type 2 took longer then 500 ms. Actual
> duration was 23617ms.
>
> 2016-09-26 20:07:38,211 [...uuid...frag:1:10] DEBUG 
> o.a.d.e.p.i.s.RemovingRecordBatch
> - doWork(): 0 records copied out of 0, remaining: 0 incoming schema
> BatchSchema [, selectionVector=TWO_BYTE]
> 2016-09-26 20:07:38,211 [...uuid...frag:1:10] DEBUG 
> o.a.d.exec.rpc.control.WorkEventBus
> - Cancelling and removing fragment manager : ...uuid...
>
>
>
> For the same query on a working node:
> 2016-09-26 20:07:09,056 [...uuid...frag:1:2] INFO 
> o.a.d.e.w.f.FragmentStatusReporter
> - ...uuid...:1:2: State to report: RUNNING
> 2016-09-26 20:07:09,056 [...uuid...frag:1:2] DEBUG
> o.a.d.e.w.FragmentExecutor - Starting fragment 1:2 on server125:31010
> 2016-09-26 20:07:09,749 [...uuid...frag:1:2] DEBUG 
> o.a.d.e.p.i.s.RemovingRecordBatch
> - doWork(): 0 records copied out of 0, remaining: 0 incoming schema
> BatchSchema [, selectionVector=TWO_BYTE]
> 2016-09-26 20:07:09,749 [...uuid...frag:1:2] DEBUG 
> o.a.d.e.p.i.s.RemovingRecordBatch
> - doWork(): 0 records copied out of 0, remaining: 0 incoming schema
> BatchSchema [, selectionVector=TWO_BYTE]
> 2016-09-26 20:07:11,005 [...uuid...frag:1:2] DEBUG 
> o.a.d.e.s.p.c.ParquetRecordReader
> - Read 87573 records out of row groups(0) in file `/data/drill/tmp/PARQUET_
> EMPLOYEE/0_0_14.parquet
>
>
>
>
> We are investigating what could get cause that 30 seconds gap for that
> fragment.
>
> Any idea let us know
>
> Thanks
> Francois
>
> On Mon, Sep 19, 2016 at 2:59 PM, François Méthot 
> wrote:
>
>> Hi Sudheesh,
>>
>>   If I add a selection filter so that no rows are returned, the same problem
>> occurs. I also simplified the query to include only a few integer columns.
>>
>> That particular data repo is ~200+ Billions records spread over ~50 000
>> parquet files.
>>
>> We have other CSV data repo that are 100x smaller that does not trigger
>> this issue.
>>
>>
>> + Is atsqa4-133.qa.lab [1] the Foreman node for the query in this case?
>> There is also a bizarre case where the node that is reported as lost is the
>> node itself.
>> Yes, the stack trace is from the ticket. It did occur once or twice
>> (in the many many attempts) that it was the node itself.
>>
>> + Is there a spike in memory usage of the Drillbit this is the Foreman
>> for the query (process memory, not just heap)?
>> We don't notice any unusual spike, each node gets busy in the same range
>> when query is running.
>>
>> I tried running with 8GB/20GB and 4GB/24GB heap/off-heap, did not see any
>> improvement.
>>
>>
>> We will update from 1.7 to 

[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly

2016-09-30 Thread jaltekruse
Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r81396800
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -918,18 +916,22 @@ public void setMax(Object max) {
 @JsonProperty public ConcurrentHashMap columnTypeInfo;
 @JsonProperty List files;
 @JsonProperty List directories;
-@JsonProperty String drillVersion;
--- End diff --

I had intentionally added the drill version here assuming that it would be 
good information to have around if a similar issue ever comes up in the
future, as well as provide all of the info we need to have an explicit flag 
that the dates have become correct. For this to work completely, this commit 
should be the last commit right before a release (it could be a point release). 
Any particular reason that we would want to not write it into the file?
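For context, the kind of check a persisted writer version enables might look like the
sketch below (an illustration of the idea, not code from this PR; the class, method names,
and the notion of a "first corrected release" are assumptions):

    // Metadata written by a Drill release older than the first corrected release is
    // assumed to carry the wrongly encoded dates and gets the compatibility shift on read.
    public final class DateCorruptionCheck {

      public static boolean datesNeedCorrection(String writerVersion, String firstCorrectedVersion) {
        return compareDotted(writerVersion, firstCorrectedVersion) < 0;
      }

      // Minimal dotted-numeric comparison, e.g. "1.8.0" < "1.9.0"; qualifiers such as
      // "-SNAPSHOT" are treated as zero.
      private static int compareDotted(String a, String b) {
        String[] xs = a.split("[.-]");
        String[] ys = b.split("[.-]");
        int n = Math.max(xs.length, ys.length);
        for (int i = 0; i < n; i++) {
          int x = i < xs.length && xs[i].matches("\\d+") ? Integer.parseInt(xs[i]) : 0;
          int y = i < ys.length && ys[i].matches("\\d+") ? Integer.parseInt(ys[i]) : 0;
          if (x != y) {
            return Integer.compare(x, y);
          }
        }
        return 0;
      }
    }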




Re: select count(1) : Cannot convert Indexed schema to NamePart

2016-09-30 Thread Zelaine Fong
This looks like it probably got introduced some time shortly before the 1.8
release went out.  I tried your query on one of the early release candidate
builds for 1.8, and your query works fine in that older build.

We'll put this on the list to take a look at.

Thanks for reporting this.

-- Zelaine


On Fri, Sep 30, 2016 at 6:08 AM, François Méthot 
wrote:

> I have created a ticket:
>
> https://issues.apache.org/jira/browse/DRILL-4919
>
> The error happens on csv with header.
>
> The actual error from Drill's original TextFormatPlugin is
>
> Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header
> names are supported
>
>
> Forget about the originally reported error, it happens on a modified
> version of the TextFormatPlugin we are using.
>
>
> On Wed, Sep 28, 2016 at 1:01 PM, Jinfeng Ni  wrote:
>
> > I tried to query a regular csv file and a csv.gz file, and did not run
> > into the problem you saw. When you create a JIRA, it would be helpful
> > if you can share a sample file for re-produce purpose.
> >
> >
> >
> > On Wed, Sep 28, 2016 at 9:33 AM, Aman Sinha 
> wrote:
> > > Is this specific to CSV format files ?  Yes, you should create a JIRA
> for
> > > this.   Thanks for reporting.
> > >
> > > On Wed, Sep 28, 2016 at 8:55 AM, François Méthot 
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >>  Since release 1.8,
> > >>
> > >> we have a workspace hdfs.datarepo1 mapped to
> > >> /year/month/day/
> > >> containing csv.gz
> > >>
> > >> if we do select count(1) on any level of the dir structure like
> > >>select count(1) from hdfs.datarepo1.`/2016/08`;
> > >> We get
> > >> Error: SYSTEM ERROR: IllegalStateException: You cannot convert a
> > >> indexed schema path to a   NamePart. NameParts can only reference
> > Vectors,
> > >> not individual records or values.
> > >>
> > >> same error with
> > >>select count(1) from hdfs.datarepo1.`/` where dir0=2016 and
> dir1=08;
> > >>
> > >>
> > >> While this query works (or any select column)
> > >>select count(column1) from hdfs.datarepo1.`/2016/08`;
> > >>
> > >>
> > >> Should I create a ticket?
> > >>
> > >>
> > >> Francois
> > >>
> >
>


[GitHub] drill pull request #600: DRILL-4373: Drill and Hive have incompatible timest...

2016-09-30 Thread vdiravka
GitHub user vdiravka opened a pull request:

https://github.com/apache/drill/pull/600

DRILL-4373: Drill and Hive have incompatible timestamp representations in 
parquet

- added sys/sess option "store.parquet.int96_as_timestamp";
- added int96 to timestamp converter for both readers;
- added unit tests;

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vdiravka/drill DRILL-4373

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/600.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #600


commit 0d768c42f7c732360cafcacc91e29b67ae44fca4
Author: Vitalii Diravka 
Date:   2016-09-02T21:43:50Z

DRILL-4373: Drill and Hive have incompatible timestamp representations in 
parquet
- added sys/sess option "store.parquet.int96_as_timestamp";
- added int96 to timestamp converter for both readers;
- added unit tests;
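For readers following along, the standard INT96-to-epoch-millis conversion used by the
Hive/Impala parquet convention looks roughly like the sketch below (class and method names
are illustrative; the PR's actual converter lives inside the Drill parquet readers and may
differ in detail):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    public final class Int96Timestamp {
      private static final long JULIAN_DAY_OF_EPOCH = 2440588L;  // Julian day of 1970-01-01
      private static final long MILLIS_PER_DAY = 86400000L;
      private static final long NANOS_PER_MILLI = 1000000L;

      // 'int96' is the 12-byte value as stored in parquet: 8 bytes of nanos-of-day
      // followed by 4 bytes of Julian day, both little-endian.
      public static long toEpochMillis(byte[] int96) {
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        long julianDay = buf.getInt() & 0xFFFFFFFFL;
        return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY + nanosOfDay / NANOS_PER_MILLI;
      }
    }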






[jira] [Created] (DRILL-4920) Connect to json file on web (http/https)

2016-09-30 Thread Michael Rans (JIRA)
Michael Rans created DRILL-4920:
---

 Summary: Connect to json file on web (http/https)
 Key: DRILL-4920
 URL: https://issues.apache.org/jira/browse/DRILL-4920
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - JSON
Affects Versions: 1.8.0
Reporter: Michael Rans


I have not been able to set up Drill to connect to a JSON file at url:
https://data.humdata.org/api/3/action/current_package_list_with_resources?limit=1

I can connect to files locally.

It is not clear to me from the documentation whether or not this feature 
exists. If it doesn't, it should.





Re: select count(1) : Cannot convert Indexed schema to NamePart

2016-09-30 Thread François Méthot
I have created a ticket:

https://issues.apache.org/jira/browse/DRILL-4919

The error happens on csv with header.

The actual error from Drill's original TextFormatPlugin is

Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header
names are supported


Forget about the originally reported error, it happens on a modified
version of the TextFormatPlugin we are using.


On Wed, Sep 28, 2016 at 1:01 PM, Jinfeng Ni  wrote:

> I tried to query a regular csv file and a csv.gz file, and did not run
> into the problem you saw. When you create a JIRA, it would be helpful
> if you can share a sample file for re-produce purpose.
>
>
>
> On Wed, Sep 28, 2016 at 9:33 AM, Aman Sinha  wrote:
> > Is this specific to CSV format files ?  Yes, you should create a JIRA for
> > this.   Thanks for reporting.
> >
> > On Wed, Sep 28, 2016 at 8:55 AM, François Méthot 
> > wrote:
> >
> >> Hi,
> >>
> >>  Since release 1.8,
> >>
> >> we have a workspace hdfs.datarepo1 mapped to
> >> /year/month/day/
> >> containing csv.gz
> >>
> >> if we do select count(1) on any level of the dir structure like
> >>select count(1) from hdfs.datarepo1.`/2016/08`;
> >> We get
> >> Error: SYSTEM ERROR: IllegalStateException: You cannot convert a
> >> indexed schema path to a   NamePart. NameParts can only reference
> Vectors,
> >> not individual records or values.
> >>
> >> same error with
> >>select count(1) from hdfs.datarepo1.`/` where dir0=2016 and dir1=08;
> >>
> >>
> >> While this query works (or any select column)
> >>select count(column1) from hdfs.datarepo1.`/2016/08`;
> >>
> >>
> >> Should I create a ticket?
> >>
> >>
> >> Francois
> >>
>


[jira] [Created] (DRILL-4919) select count(1) on csv with header no longer works

2016-09-30 Thread JIRA
F Méthot created DRILL-4919:
---

 Summary: select count(1) on csv with header no longer works
 Key: DRILL-4919
 URL: https://issues.apache.org/jira/browse/DRILL-4919
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.8.0
Reporter: F Méthot
Priority: Minor


Dataset (I used extended chars for display purposes) test.csvh:

a,b,c,d\n
1,2,3,4\n
5,6,7,8\n

Storage config:
"csvh": {
  "type": "text",
  "extensions" : [
  "csvh"
   ],
   "extractHeader": true,
   "delimiter": ","
  }

select count(1) from dfs.`test.csvh`

Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header 
names are supported
column name columns
column index
Fragment 0:0





