[jira] [Commented] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994875#comment-14994875
 ] 

Jacques Nadeau commented on DRILL-4046:
---

That's crazy, we had this bug two years ago. We're using the wrong date parsing 
functions.

[~mehant] rewrote the date casting functions so we didn't rely on Joda time. 
Something has caused a regression in the function resolution.

Looking at the commit history, I wonder if this commit caused the regression:

https://github.com/apache/drill/commit/17abf36adf54ef73e151c21497b0fdba9e7864fa
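
For readers unfamiliar with the idea of casting dates without Joda time, here is a minimal sketch of a hand-rolled parser for strict `yyyy-MM-dd` strings. It is not the Drill implementation; the class name `SimpleDateParser` and the method `parseEpochDay` are hypothetical, and the day check is only a range check (it does not validate month lengths). The point is that such a parser touches no shared state, so concurrent fragments cannot contend on a lock.

```java
/** Hypothetical illustration only; not Drill's date casting code. */
class SimpleDateParser {
  /** Parses a strict "yyyy-MM-dd" string into days since 1970-01-01 (proleptic Gregorian). */
  static long parseEpochDay(String s) {
    if (s.length() != 10 || s.charAt(4) != '-' || s.charAt(7) != '-') {
      throw new IllegalArgumentException("Expected yyyy-MM-dd: " + s);
    }
    int year = Integer.parseInt(s.substring(0, 4));
    int month = Integer.parseInt(s.substring(5, 7));
    int day = Integer.parseInt(s.substring(8, 10));
    if (month < 1 || month > 12 || day < 1 || day > 31) {
      throw new IllegalArgumentException("Field out of range: " + s);
    }
    // Shift the year so that it starts in March; leap days then fall at year end.
    long y = (month <= 2) ? year - 1 : year;
    long era = (y >= 0 ? y : y - 399) / 400;       // 400-year cycle index
    long yearOfEra = y - era * 400;                // [0, 399]
    long monthFromMarch = (month + 9) % 12;        // March = 0, ..., February = 11
    long dayOfYear = (153 * monthFromMarch + 2) / 5 + day - 1;
    long dayOfEra = yearOfEra * 365 + yearOfEra / 4 - yearOfEra / 100 + dayOfYear;
    return era * 146097 + dayOfEra - 719468;       // 719468 = days from 0000-03-01 to 1970-01-01
  }
}
```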

> Performance regression in some tpch queries with 1.3rc0 build
> -
>
> Key: DRILL-4046
> URL: https://issues.apache.org/jira/browse/DRILL-4046
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Attachments: profiles.tar.gz
>
>
> ||commit/query||14||15||18||20||
> |[839f8da|https://github.com/apache/drill/commit/839f8dac2e2d0479a1552701a5274ebe8416fea6]|10,253|14,642|32,993|21,251|
> |[e7db9dc|https://github.com/apache/drill/commit/e7db9dcacbc39c4797de1aa29b119a7428451dea]|85,061|211,400|900,020|34,066|
> (Time in milliseconds; 900 second timeout)
> + These regressions are not consistent i.e. on multiple runs, some runs do 
> not vary from the baseline.
> + TPCH 18 did not regress without timing out (on runs until now).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994882#comment-14994882
 ] 

Sudheesh Katkam commented on DRILL-4046:


That was my first suspect, but I am not sure. This regression is happening 
against 1.2.0, so I assume other places are involved.



[jira] [Commented] (DRILL-4042) Unable to run sqlline in embedded mode on Windows

2015-11-06 Thread Aditya Kishore (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994629#comment-14994629
 ] 

Aditya Kishore commented on DRILL-4042:
---

This patch, combined with the new binaries published by Patrick, has fixed the 
issue.

Will commit the patch shortly.

> Unable to run sqlline in embedded mode on Windows
> -
>
> Key: DRILL-4042
> URL: https://issues.apache.org/jira/browse/DRILL-4042
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: 1.3.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Blocker
> Attachments: DRILL-4042.1.patch.txt
>
>
> Hadoop binaries ({{hadoop.dll}}, {{winutils.exe}}) bundled with Drill are out 
> of date and need to be rebuilt with Hadoop 2.7.
> Running sqlline in embedded mode hangs after any command.
> {noformat}
> $ ./bin/sqlline -u jdbc:drill:zk=local
> Nov 05, 2015 3:23:19 PM org.glassfish.jersey.server.ApplicationHandler 
> initialize
> INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 
> 01:25:26...
> apache drill 1.3.0-SNAPSHOT
> "drill baby drill"
> 0: jdbc:drill:zk=local> use dfs;
> Exception in thread "drill-executor-2" java.lang.UnsatisfiedLinkError: 
> org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Ljava/lang/String;JJJI)Ljava/io/FileDescriptor;
> at 
> org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Native 
> Method)
> at 
> org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileOutputStreamWithMode(NativeIO.java:559)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:219)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:305)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:294)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:326)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:393)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:776)
> at 
> org.apache.drill.exec.store.dfs.DrillFileSystem.create(DrillFileSystem.java:159)
> at 
> org.apache.drill.exec.store.sys.local.FilePStore.put(FilePStore.java:145)
> at 
> org.apache.drill.exec.work.foreman.QueryManager.writeFinalProfile(QueryManager.java:307)
> at 
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:749)
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:841)
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:786)
> at 
> org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73)
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:788)
> at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:894)
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}





[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994682#comment-14994682
 ] 

ASF GitHub Bot commented on DRILL-2288:
---

Github user hnfgns commented on a diff in the pull request:

https://github.com/apache/drill/pull/245#discussion_r44198946
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java
 ---
@@ -33,36 +33,113 @@
 import org.apache.drill.exec.util.BatchPrinter;
 import org.apache.drill.exec.vector.VectorValidator;
 
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.*;
+
+
 public class IteratorValidatorBatchIterator implements 
CloseableRecordBatch {
-  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(IteratorValidatorBatchIterator.class);
+  private static final org.slf4j.Logger logger =
+  
org.slf4j.LoggerFactory.getLogger(IteratorValidatorBatchIterator.class);
 
   static final boolean VALIDATE_VECTORS = false;
 
-  private IterOutcome state = IterOutcome.NOT_YET;
+  /** For logging/debuggability only. */
+  private static volatile int instanceCount;
+
+  /** For logging/debuggability only. */
+  private final int instNum;
+  {
+instNum = ++instanceCount;
--- End diff --

not thread-safe but not a show stopper since it is improving debugging 
experience.
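
As a point of comparison, here is a minimal sketch of a thread-safe variant of such an instance counter, assuming `java.util.concurrent.atomic.AtomicInteger` is acceptable in this code. The class name `CountedInstance` is hypothetical and not part of the pull request. `++` on a `volatile int` is a read-modify-write sequence, so two threads constructing instances at the same time can receive the same number; `incrementAndGet()` performs the update atomically.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a class that numbers its instances for logging. */
class CountedInstance {
  /** Shared counter; incrementAndGet() is atomic, unlike ++ on a volatile int. */
  private static final AtomicInteger instanceCount = new AtomicInteger();

  /** For logging/debuggability only. */
  private final int instNum = instanceCount.incrementAndGet();

  int getInstNum() {
    return instNum;
  }
}
```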


> ScanBatch violates IterOutcome protocol for zero-row sources [was: missing 
> JDBC metadata (schema) for 0-row results...]
> ---
>
> Key: DRILL-2288
> URL: https://issues.apache.org/jira/browse/DRILL-2288
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Information Schema
>Reporter: Daniel Barclay (Drill)
>Assignee: Daniel Barclay (Drill)
> Fix For: 1.4.0
>
> Attachments: Drill2288NoResultSetMetadataWhenZeroRowsTest.java
>
>
> The ResultSetMetaData object from getMetadata() of a ResultSet is not set up 
> (getColumnCount() returns zero, and trying to access any other metadata 
> throws IndexOutOfBoundsException) for a result set with zero rows, at least 
> for one from DatabaseMetaData.getColumns(...).





[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994683#comment-14994683
 ] 

ASF GitHub Bot commented on DRILL-2288:
---

Github user hnfgns commented on a diff in the pull request:

https://github.com/apache/drill/pull/245#discussion_r44199009
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java
 ---
@@ -33,36 +33,113 @@
 import org.apache.drill.exec.util.BatchPrinter;
 import org.apache.drill.exec.vector.VectorValidator;
 
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.*;
+
+
 public class IteratorValidatorBatchIterator implements 
CloseableRecordBatch {
-  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(IteratorValidatorBatchIterator.class);
+  private static final org.slf4j.Logger logger =
+  
org.slf4j.LoggerFactory.getLogger(IteratorValidatorBatchIterator.class);
 
   static final boolean VALIDATE_VECTORS = false;
 
-  private IterOutcome state = IterOutcome.NOT_YET;
+  /** For logging/debuggability only. */
+  private static volatile int instanceCount;
+
+  /** For logging/debuggability only. */
+  private final int instNum;
+  {
+instNum = ++instanceCount;
+  }
+
+  /**
+   * The upstream batch, calls to which and return values from which are
+   * checked by this validator.
+   */
   private final RecordBatch incoming;
-  private boolean first = true;
+
+  /** Incoming batch's type (simple class name); for logging/debuggability
+   *  only. */
+  private final String batchTypeName;
+
+  /** Exception state of incoming batch; last value thrown by its next()
+   *  method. */
+  private Throwable exceptionState = null;
+
+  /** Main state of incoming batch; last value returned by its next() 
method. */
+  private IterOutcome batchState = null;
+
+  /** Last schema retrieved after OK_NEW_SCHEMA or OK from next().  Null 
if none
+   *  yet. Currently for logging/debuggability only. */
+  private BatchSchema lastSchema = null;
+
+  /** Last schema retrieved after OK_NEW_SCHEMA from next().  Null if none 
yet.
+   *  Currently for logging/debuggability only. */
--- End diff --

pretty good doc here :100: 




[jira] [Commented] (DRILL-3751) Query hang when zookeeper is stopped

2015-11-06 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994855#comment-14994855
 ] 

Sean Hsuan-Yi Chu commented on DRILL-3751:
--

The current retry number is 7200.

I tried to set it to 2 in drill-override.conf:

{code}
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "localhost:2181",
  zk.retry.count = 2
}
{code}

Then, Drill will fail after a few seconds. 

However, as [~khfaraaz] pointed out, the query state is still reported as 
RUNNING on the Web UI, which results from the ZooKeeper cache.
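
To make the effect of the retry count concrete, here is a rough sketch of a bounded retry loop. It is neither Drill's nor Curator's code: `BoundedRetry`, `connectWithRetry`, the `BooleanSupplier` probe and the delay parameter are all assumptions made for illustration.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical illustration of a bounded retry policy. */
class BoundedRetry {
  /**
   * Calls attempt up to (retryCount + 1) times, sleeping delayMillis between tries.
   * Returns true on the first success and false once the retries are exhausted
   * or the calling thread is interrupted.
   */
  static boolean connectWithRetry(BooleanSupplier attempt, int retryCount, long delayMillis) {
    for (int i = 0; i <= retryCount; i++) {
      if (attempt.getAsBoolean()) {
        return true;
      }
      if (i < retryCount) {
        try {
          Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt status
          return false;
        }
      }
    }
    return false;
  }
}
```

In this sketch a count of 2 gives up after three attempts, while a count of 7200 keeps the caller waiting for roughly 7200 times the delay before any error surfaces.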

> Query hang when zookeeper is stopped
> 
>
> Key: DRILL-3751
> URL: https://issues.apache.org/jira/browse/DRILL-3751
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: 4 node cluster on CentOS
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Fix For: 1.4.0
>
>
> I see an indefinite hang at the sqlline prompt when I issue a long-running 
> query and then stop the ZooKeeper process while the query is still executing. 
> The sqlline prompt never returns; it hangs showing the stack trace below. I am 
> on master.
> Steps to reproduce the problem
> clush -g khurram service mapr-warden stop
> clush -g khurram service mapr-warden start
> Issue long running query from sqlline
> While query is running, stop zookeeper using script.
> To stop zookeeper 
> {code}
> [root@centos-01 bin]# ./zkServer.sh stop
> JMX enabled by default
> Using config: /opt/mapr/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
> Stopping zookeeper ... STOPPED
> {code}
> Issue below long running query from sqlline
> {code}
> ./sqlline -u "jdbc:drill:schema=dfs.tmp"
> 0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 800;
> ...
> | 7.40907649723E8  | g|
> | 1.12378007695E9  | d|
> 03:03:28.482 [CuratorFramework-0] ERROR org.apache.curator.ConnectionState - 
> Connection timed out for connection string (10.10.100.201:5181) and timeout 
> (5000) / elapsed (5013)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = 
> ConnectionLoss
>   at 
> org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198) 
> [curator-client-2.5.0.jar:na]
>   at 
> org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) 
> [curator-client-2.5.0.jar:na]
>   at 
> org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
>  [curator-client-2.5.0.jar:na]
>   at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:807)
>  [curator-framework-2.5.0.jar:na]
>   at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:793)
>  [curator-framework-2.5.0.jar:na]
>   at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57)
>  [curator-framework-2.5.0.jar:na]
>   at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275)
>  [curator-framework-2.5.0.jar:na]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> [na:1.7.0_45]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
>   at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> {code}
> Here is the stack for sqlline process
> {code}
> [root@centos-01 bin]# /usr/java/jdk1.7.0_45/bin/jstack 32136
> 2015-09-05 03:21:52
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.45-b08 mixed mode):
> "Attach Listener" daemon prio=10 tid=0x7f8328003800 nid=0x27f1 waiting on 
> condition [0x]
>java.lang.Thread.State: RUNNABLE
> "CuratorFramework-0-EventThread" daemon prio=10 tid=0x012fd800 
> nid=0x26e1 waiting on condition [0x7f8317c2e000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007e2117798> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491)
> "CuratorFramework-0-SendThread(centos-01.qa.lab:5181)" daemon prio=10 
> tid=0x01109800 nid=0x26e0 waiting on condition [0x7f8317b2d000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at 

[jira] [Commented] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994866#comment-14994866
 ] 

Sudheesh Katkam commented on DRILL-4046:


In fact, there was a similar regression against 
[1.2.0|https://github.com/apache/drill/tree/1.2.0], but that happened when run 
multiple times.



[jira] [Comment Edited] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994866#comment-14994866
 ] 

Sudheesh Katkam edited comment on DRILL-4046 at 11/7/15 1:53 AM:
-

In fact, there was a similar regression against 
[1.2.0|https://github.com/apache/drill/tree/1.2.0], but that happened when run 
multiple times.


was (Author: sudheeshkatkam):
In fact, these was a similar regression against 
[1.2.0|https://github.com/apache/drill/tree/1.2.0], but that happened when run 
multiple times.



[jira] [Comment Edited] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994858#comment-14994858
 ] 

Sudheesh Katkam edited comment on DRILL-4046 at 11/7/15 1:54 AM:
-

The only thing that is consistent about these regressions is that there are a 
lot of fragments in this state (waiting to get the ISOChronology instance) on a 
specific drillbit:
{code}
"29c6bb10-3447-65c9-1e7c-985afdcb83ea:frag:7:137" daemon prio=10 
tid=0x7f593290e000 nid=0x2132 waiting for monitor entry [0x7f58998f2000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.joda.time.chrono.ISOChronology.getInstance(ISOChronology.java:104)
- waiting to lock <0x00060d627e90> (a java.util.HashMap)
at org.joda.time.chrono.ISOChronology.getInstance(ISOChronology.java:86)
at org.joda.time.DateTimeUtils.getChronology(DateTimeUtils.java:283)
at 
org.joda.time.format.DateTimeFormatter.selectChronology(DateTimeFormatter.java:942)
at 
org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:851)
at org.joda.time.DateTime.parse(DateTime.java:144)
at 
org.apache.drill.exec.test.generated.FiltererGen87.doEval(FilterTemplate2.java:144)
at 
org.apache.drill.exec.test.generated.FiltererGen87.filterBatchNoSV(FilterTemplate2.java:99)
at 
org.apache.drill.exec.test.generated.FiltererGen87.filterBatch(FilterTemplate2.java:72)
at 
org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.doWork(FilterRecordBatch.java:80)
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
...
{code}

This contention has been fixed in the latest version of 
[ISOChronology|https://github.com/JodaOrg/joda-time/commit/634066471f2941eddfcca3ed2a62c9d254cabccb].
 What I don't understand is why DRILL-3242 (or patches around that) would make 
this bug appear more frequently.
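
The trace above shows fragment threads blocked in `ISOChronology.getInstance`, waiting to lock a shared `java.util.HashMap`. The following is only a generic sketch of that contention pattern and of one common way to avoid it. It is not Joda-Time's source, and `InstanceCacheSketch` and its methods are made-up names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration of a per-key singleton cache, with and without a global lock. */
class InstanceCacheSketch {
  /** Contended variant: every caller takes the same monitor, even on a cache hit. */
  private static final Map<String, Object> lockedCache = new HashMap<>();

  static Object getInstanceLocked(String zoneId) {
    synchronized (lockedCache) {
      Object inst = lockedCache.get(zoneId);
      if (inst == null) {
        inst = new Object();
        lockedCache.put(zoneId, inst);
      }
      return inst;
    }
  }

  /** Alternative: reads on a ConcurrentHashMap do not block each other. */
  private static final ConcurrentHashMap<String, Object> concurrentCache =
      new ConcurrentHashMap<>();

  static Object getInstanceConcurrent(String zoneId) {
    Object inst = concurrentCache.get(zoneId);
    if (inst == null) {
      Object created = new Object();
      // putIfAbsent returns the existing value if another thread won the race.
      Object prior = concurrentCache.putIfAbsent(zoneId, created);
      inst = (prior != null) ? prior : created;
    }
    return inst;
  }
}
```

With many fragments calling the locked variant once per row, the threads serialize on one monitor, which matches the BLOCKED state in the dump.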


was (Author: sudheeshkatkam):
The only thing that is consistent about these regressions is that there are a 
lot of fragments in this state (waiting to get the ISOChronology instance) on a 
specific:
{code}
"29c6bb10-3447-65c9-1e7c-985afdcb83ea:frag:7:137" daemon prio=10 
tid=0x7f593290e000 nid=0x2132 waiting for monitor entry [0x7f58998f2000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.joda.time.chrono.ISOChronology.getInstance(ISOChronology.java:104)
- waiting to lock <0x00060d627e90> (a java.util.HashMap)
at org.joda.time.chrono.ISOChronology.getInstance(ISOChronology.java:86)
at org.joda.time.DateTimeUtils.getChronology(DateTimeUtils.java:283)
at 
org.joda.time.format.DateTimeFormatter.selectChronology(DateTimeFormatter.java:942)
at 
org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:851)
at org.joda.time.DateTime.parse(DateTime.java:144)
at 
org.apache.drill.exec.test.generated.FiltererGen87.doEval(FilterTemplate2.java:144)
at 
org.apache.drill.exec.test.generated.FiltererGen87.filterBatchNoSV(FilterTemplate2.java:99)
at 
org.apache.drill.exec.test.generated.FiltererGen87.filterBatch(FilterTemplate2.java:72)
at 
org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.doWork(FilterRecordBatch.java:80)
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
...
{code}

This contention has been fixed in the latest version of 
[ISOChronology|https://github.com/JodaOrg/joda-time/commit/634066471f2941eddfcca3ed2a62c9d254cabccb].
 What I don't understand is why DRILL-3242 (or patches around that) would make 
this bug appear more frequently.



[jira] [Comment Edited] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994858#comment-14994858
 ] 

Sudheesh Katkam edited comment on DRILL-4046 at 11/7/15 1:54 AM:
-

The only thing that is consistent about these regressions is that there are a 
lot of fragments in this state (waiting to get the ISOChronology instance) on a 
specific:
{code}
"29c6bb10-3447-65c9-1e7c-985afdcb83ea:frag:7:137" daemon prio=10 
tid=0x7f593290e000 nid=0x2132 waiting for monitor entry [0x7f58998f2000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.joda.time.chrono.ISOChronology.getInstance(ISOChronology.java:104)
- waiting to lock <0x00060d627e90> (a java.util.HashMap)
at org.joda.time.chrono.ISOChronology.getInstance(ISOChronology.java:86)
at org.joda.time.DateTimeUtils.getChronology(DateTimeUtils.java:283)
at 
org.joda.time.format.DateTimeFormatter.selectChronology(DateTimeFormatter.java:942)
at 
org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:851)
at org.joda.time.DateTime.parse(DateTime.java:144)
at 
org.apache.drill.exec.test.generated.FiltererGen87.doEval(FilterTemplate2.java:144)
at 
org.apache.drill.exec.test.generated.FiltererGen87.filterBatchNoSV(FilterTemplate2.java:99)
at 
org.apache.drill.exec.test.generated.FiltererGen87.filterBatch(FilterTemplate2.java:72)
at 
org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.doWork(FilterRecordBatch.java:80)
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
...
{code}

This contention has been fixed in the latest version of 
[ISOChronology|https://github.com/JodaOrg/joda-time/commit/634066471f2941eddfcca3ed2a62c9d254cabccb].
 What I don't understand is why DRILL-3242 (or patches around that) would make 
this bug appear more frequently.


was (Author: sudheeshkatkam):
The only thing that is consistent about these regressions is that there are a 
lot of fragments in this state (waiting to get the ISOChronology instance):
{code}
"29c6bb10-3447-65c9-1e7c-985afdcb83ea:frag:7:137" daemon prio=10 
tid=0x7f593290e000 nid=0x2132 waiting for monitor entry [0x7f58998f2000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.joda.time.chrono.ISOChronology.getInstance(ISOChronology.java:104)
- waiting to lock <0x00060d627e90> (a java.util.HashMap)
at org.joda.time.chrono.ISOChronology.getInstance(ISOChronology.java:86)
at org.joda.time.DateTimeUtils.getChronology(DateTimeUtils.java:283)
at 
org.joda.time.format.DateTimeFormatter.selectChronology(DateTimeFormatter.java:942)
at 
org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:851)
at org.joda.time.DateTime.parse(DateTime.java:144)
at 
org.apache.drill.exec.test.generated.FiltererGen87.doEval(FilterTemplate2.java:144)
at 
org.apache.drill.exec.test.generated.FiltererGen87.filterBatchNoSV(FilterTemplate2.java:99)
at 
org.apache.drill.exec.test.generated.FiltererGen87.filterBatch(FilterTemplate2.java:72)
at 
org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.doWork(FilterRecordBatch.java:80)
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
...
{code}

This has been fixed in the latest version of 
[ISOChronology|https://github.com/JodaOrg/joda-time/commit/634066471f2941eddfcca3ed2a62c9d254cabccb].
 What I don't understand is why DRILL-3242 (or patches around that) would make 
this bug appear more frequently.



[jira] [Created] (DRILL-4047) Select with options

2015-11-06 Thread Julien Le Dem (JIRA)
Julien Le Dem created DRILL-4047:


 Summary: Select with options
 Key: DRILL-4047
 URL: https://issues.apache.org/jira/browse/DRILL-4047
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Reporter: Julien Le Dem
Assignee: Julien Le Dem


Add a mechanism to pass parameters down to the StoragePlugin when writing a 
Select statement.
Some discussion here:
http://mail-archives.apache.org/mod_mbox/drill-dev/201511.mbox/%3ccao+vc4clzylvjevisfjqtcyxb-zsmfy4bqrm-jhbidwzgqf...@mail.gmail.com%3E





[jira] [Commented] (DRILL-4047) Select with options

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994841#comment-14994841
 ] 

ASF GitHub Bot commented on DRILL-4047:
---

Github user julienledem commented on a diff in the pull request:

https://github.com/apache/drill/pull/246#discussion_r44204833
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -221,9 +603,46 @@ public void dropView(String viewName) throws 
IOException {
   return viewSet;
 }
 
+private Set<String> rawTableNames() {
+  return newHashSet(
+  transform(tables.keySet(), new com.google.common.base.Function<TableInstance, String>() {
+@Override
+public String apply(TableInstance input) {
+  return input.sig.name;
+}
+  }));
+}
+
 @Override
 public Set<String> getTableNames() {
-  return Sets.union(tables.keySet(), getViews());
+  System.out.println("getTableNames");
--- End diff --

todo: cleanup


> Select with options
> ---
>
> Key: DRILL-4047
> URL: https://issues.apache.org/jira/browse/DRILL-4047
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>
> Add a mechanism to pass parameters down to the StoragePlugin when writing a 
> Select statement.
> Some discussion here:
> http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAO%2Bvc4AcGK3%2B3QYvQV1-xPPdpG3Tc%2BfG%3D0xDGEUPrhd6ktHv5Q%40mail.gmail.com%3E
> http://mail-archives.apache.org/mod_mbox/drill-dev/201511.mbox/%3ccao+vc4clzylvjevisfjqtcyxb-zsmfy4bqrm-jhbidwzgqf...@mail.gmail.com%3E





[jira] [Commented] (DRILL-4047) Select with options

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994834#comment-14994834
 ] 

ASF GitHub Bot commented on DRILL-4047:
---

Github user julienledem commented on a diff in the pull request:

https://github.com/apache/drill/pull/246#discussion_r44204776
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java ---
@@ -86,7 +86,8 @@ public FileSystemSchema(String name, SchemaConfig schemaConfig) throws IOExcepti
 
 void setPlus(SchemaPlus plusOfThis){
   for(WorkspaceSchema s : schemaMap.values()){
-plusOfThis.add(s.getName(), s);
+SchemaPlus schemaPlus = plusOfThis.add(s.getName(), s);
+//schemaPlus.add(arg0, arg1);
--- End diff --

todo: cleanup




[jira] [Commented] (DRILL-4047) Select with options

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994836#comment-14994836
 ] 

ASF GitHub Bot commented on DRILL-4047:
---

Github user julienledem commented on a diff in the pull request:

https://github.com/apache/drill/pull/246#discussion_r44204801
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FormatCreator.java ---
@@ -92,10 +118,39 @@
   logger.warn("Failure initializing storage config named '{}' of type '{}'.", e.getKey(), e.getValue().getClass().getName(), e1);
 }
   }
-
 }
+this.configuredPlugins = Collections.unmodifiableMap(plugins);
+  }
+
+  /**
+   * @param name the name of the formatplugin instance in the drill config
+   * @return The configured FormatPlugin for this name
+   */
+  FormatPlugin getFormatPluginByName(String name) {
+return configuredPlugins.get(name);
+  }
 
-return plugins;
+  /**
+   * @return all the format plugins from the Drill config
+   */
+  Collection getConfiguredFormatPlugins() {
+return configuredPlugins.values();
   }
 
+  /**
+   * Instantiate a new format plugin instance from the provided config object
+   * @param fpconfig the conf for the plugin
+   * @return the newly created instance of a FormatPlugin based on provided config
+   */
+  FormatPlugin newFormatPlugin(FormatPluginConfig fpconfig) {
+Constructor c = configConstructors.get(fpconfig.getClass());
+if (c == null) {
+  throw new RuntimeException("Unable to find constructor for storage config of type " + fpconfig.getClass().getName());
+}
+try {
+  return (FormatPlugin) c.newInstance(null, context, fsConf, storageConfig, fpconfig);
+} catch (InstantiationException | IllegalAccessException | IllegalArgumentException | InvocationTargetException e1) {
+  throw new RuntimeException("Failure initializing storage config of type " + fpconfig.getClass().getName(), e1);
--- End diff --

todo: exception




[jira] [Commented] (DRILL-4047) Select with options

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994839#comment-14994839
 ] 

ASF GitHub Bot commented on DRILL-4047:
---

Github user julienledem commented on a diff in the pull request:

https://github.com/apache/drill/pull/246#discussion_r44204830
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
@@ -221,9 +603,46 @@ public void dropView(String viewName) throws IOException {
   return viewSet;
 }
 
+private Set rawTableNames() {
+  return newHashSet(
+  transform(tables.keySet(), new com.google.common.base.Function() {
+@Override
+public String apply(TableInstance input) {
+  return input.sig.name;
+}
+  }));
+}
+
 @Override
 public Set getTableNames() {
-  return Sets.union(tables.keySet(), getViews());
+  System.out.println("getTableNames");
+  return Sets.union(rawTableNames(), getViews());
+}
+
+@Override
+public Set getFunctionNames() {
+  System.out.println("getFunctionNames");
+  return rawTableNames();
+}
+
+@Override
+public List getFunctions(String name) {
+  System.out.println("getFunctions(" + name + ")");
--- End diff --

todo: cleanup




[jira] [Commented] (DRILL-4047) Select with options

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994838#comment-14994838
 ] 

ASF GitHub Bot commented on DRILL-4047:
---

Github user julienledem commented on a diff in the pull request:

https://github.com/apache/drill/pull/246#discussion_r44204824
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
@@ -221,9 +603,46 @@ public void dropView(String viewName) throws IOException {
   return viewSet;
 }
 
+private Set rawTableNames() {
+  return newHashSet(
+  transform(tables.keySet(), new com.google.common.base.Function() {
+@Override
+public String apply(TableInstance input) {
+  return input.sig.name;
+}
+  }));
+}
+
 @Override
 public Set getTableNames() {
-  return Sets.union(tables.keySet(), getViews());
+  System.out.println("getTableNames");
+  return Sets.union(rawTableNames(), getViews());
+}
+
+@Override
+public Set getFunctionNames() {
+  System.out.println("getFunctionNames");
+  return rawTableNames();
+}
+
+@Override
+public List getFunctions(String name) {
+  System.out.println("getFunctions(" + name + ")");
+  List sigs = optionExtractor.getTableSignatures(name);
+  System.out.println(sigs);
+//  List sigs = Arrays.asList(
+//  new TableSignature(name, new TableParamDef("delimiter", String.class, true)),
+//  new TableSignature(name, new TableParamDef("delimiter", Integer.TYPE, true)),
+//  new TableSignature(name, new TableParamDef("foo", Integer.TYPE, true), new TableParamDef("bar", Integer.TYPE, true)),
+//  new TableSignature(name, new TableParamDef("foo", String.class, true))
+//  );
--- End diff --

todo:cleanup




[jira] [Commented] (DRILL-4047) Select with options

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994842#comment-14994842
 ] 

ASF GitHub Bot commented on DRILL-4047:
---

Github user julienledem commented on a diff in the pull request:

https://github.com/apache/drill/pull/246#discussion_r44204847
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
@@ -313,19 +734,29 @@ public String getTypeName() {
   return FileSystemConfig.NAME;
 }
 
+private DrillTable isReadable(FormatMatcher m,  FileSelection fileSelection) throws IOException {
+  return m.isReadable(fs, fileSelection, plugin, storageEngineName, schemaConfig.getUserName());
+}
+
 @Override
-public DrillTable create(String key) {
+public DrillTable create(TableInstance key) {
   try {
 
-FileSelection fileSelection = FileSelection.create(fs, config.getLocation(), key);
+FileSelection fileSelection = FileSelection.create(fs, config.getLocation(), key.sig.name);
 if (fileSelection == null) {
   return null;
 }
-
+if (key.sig.params.size() > 0) {
+  FormatPluginConfig fconfig = optionExtractor.eval(key);
+//  TextFormatPlugin.TextFormatConfig fconfig = new TextFormatPlugin.TextFormatConfig();
+//  fconfig.extensions = Arrays.asList();
+//  fconfig.fieldDelimiter = ((String)key.params.get(0)).charAt(0);
--- End diff --

todo: cleanup




[jira] [Commented] (DRILL-4047) Select with options

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994835#comment-14994835
 ] 

ASF GitHub Bot commented on DRILL-4047:
---

Github user julienledem commented on a diff in the pull request:

https://github.com/apache/drill/pull/246#discussion_r44204782
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java ---
@@ -86,7 +87,19 @@ public QueryContext(final UserSession session, final DrillbitContext drillbitCon
 executionControls = new ExecutionControls(queryOptions, drillbitContext.getEndpoint());
 plannerSettings = new PlannerSettings(queryOptions, getFunctionRegistry());
 plannerSettings.setNumEndPoints(drillbitContext.getBits().size());
+//boolean caseSensitive = false;
+//CalciteCatalogReader catalogReader =
+//new CalciteCatalogReader(
+//getRootSchema(),
+//caseSensitive,
+//getNewDefaultSchema(),
+//getTypeFactory());
 table = new DrillOperatorTable(getFunctionRegistry());
+//table = new ChainedSqlOperatorTable(asList(
+//  SqlStdOperatorTable.instance(),
+//  catalogReader,
+//  new DrillOperatorTable(getFunctionRegistry())
+//));
--- End diff --

todo: cleanup




[jira] [Commented] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994858#comment-14994858
 ] 

Sudheesh Katkam commented on DRILL-4046:


The only thing that is consistent about these regressions is that there are a 
lot of fragments in this state (waiting to get the ISOChronology instance):
{code}
"29c6bb10-3447-65c9-1e7c-985afdcb83ea:frag:7:137" daemon prio=10 tid=0x7f593290e000 nid=0x2132 waiting for monitor entry [0x7f58998f2000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at org.joda.time.chrono.ISOChronology.getInstance(ISOChronology.java:104)
- waiting to lock <0x00060d627e90> (a java.util.HashMap)
at org.joda.time.chrono.ISOChronology.getInstance(ISOChronology.java:86)
at org.joda.time.DateTimeUtils.getChronology(DateTimeUtils.java:283)
at org.joda.time.format.DateTimeFormatter.selectChronology(DateTimeFormatter.java:942)
at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:851)
at org.joda.time.DateTime.parse(DateTime.java:144)
at org.apache.drill.exec.test.generated.FiltererGen87.doEval(FilterTemplate2.java:144)
at org.apache.drill.exec.test.generated.FiltererGen87.filterBatchNoSV(FilterTemplate2.java:99)
at org.apache.drill.exec.test.generated.FiltererGen87.filterBatch(FilterTemplate2.java:72)
at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.doWork(FilterRecordBatch.java:80)
at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
...
{code}

This has been fixed in the latest version of 
[ISOChronology|https://github.com/JodaOrg/joda-time/commit/634066471f2941eddfcca3ed2a62c9d254cabccb].
 What I don't understand is why DRILL-3242 (or patches around that) would make 
this bug appear more frequently.
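
To illustrate the contention pattern in the trace above, here is a minimal Java sketch. It assumes a cache of per-zone instances guarded by one shared monitor and contrasts it with a lookup backed by a ConcurrentHashMap. The names ChronologyCacheSketch, Chrono, getInstanceLocked and getInstanceConcurrent are hypothetical; this is not Joda-Time or Drill code, and whether the linked Joda commit takes exactly this approach is not confirmed here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ChronologyCacheSketch {
    /** Hypothetical stand-in for a per-zone chronology object (not the Joda class). */
    public static final class Chrono {
        public final String zoneId;
        Chrono(String zoneId) { this.zoneId = zoneId; }
    }

    // Pattern suggested by the thread dump: every lookup, even a cache hit,
    // must take the monitor of one shared HashMap, so many fragments calling
    // it per record end up BLOCKED on that monitor.
    private static final Map<String, Chrono> LOCKED = new HashMap<String, Chrono>();

    public static Chrono getInstanceLocked(String zoneId) {
        synchronized (LOCKED) {
            Chrono c = LOCKED.get(zoneId);
            if (c == null) {
                c = new Chrono(zoneId);
                LOCKED.put(zoneId, c);
            }
            return c;
        }
    }

    // Alternative: reads do not block each other; only a first-time insert races,
    // and putIfAbsent makes every caller agree on a single winner.
    private static final ConcurrentHashMap<String, Chrono> CONCURRENT =
        new ConcurrentHashMap<String, Chrono>();

    public static Chrono getInstanceConcurrent(String zoneId) {
        Chrono c = CONCURRENT.get(zoneId);
        if (c == null) {
            Chrono created = new Chrono(zoneId);
            Chrono raced = CONCURRENT.putIfAbsent(zoneId, created);
            c = (raced != null) ? raced : created;
        }
        return c;
    }
}
```

Both variants return the same cached instance for a given zone; the difference is only in how much time concurrent callers spend waiting on the cache itself.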

> Performance regression in some tpch queries with 1.3rc0 build
> -
>
> Key: DRILL-4046
> URL: https://issues.apache.org/jira/browse/DRILL-4046
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Attachments: profiles.tar.gz
>
>
> ||commit/query||14||15||18||20||
> |[839f8da|https://github.com/apache/drill/commit/839f8dac2e2d0479a1552701a5274ebe8416fea6]|10,253|14,642|32,993|21,251|
> |[e7db9dc|https://github.com/apache/drill/commit/e7db9dcacbc39c4797de1aa29b119a7428451dea]|85,061|211,400|900,020|34,066|
> (Time in milliseconds; 900 second timeout)
> + These regressions are not consistent i.e. on multiple runs, some runs do 
> not vary from the baseline.
> + TPCH 18 did not regress without timing out (on runs until now).





[jira] [Comment Edited] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994858#comment-14994858
 ] 

Sudheesh Katkam edited comment on DRILL-4046 at 11/7/15 1:43 AM:
-

The only thing that is consistent about these regressions is that there are a 
lot of fragments in this state (waiting to get the ISOChronology instance):
{code}
"29c6bb10-3447-65c9-1e7c-985afdcb83ea:frag:7:137" daemon prio=10 tid=0x7f593290e000 nid=0x2132 waiting for monitor entry [0x7f58998f2000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at org.joda.time.chrono.ISOChronology.getInstance(ISOChronology.java:104)
- waiting to lock <0x00060d627e90> (a java.util.HashMap)
at org.joda.time.chrono.ISOChronology.getInstance(ISOChronology.java:86)
at org.joda.time.DateTimeUtils.getChronology(DateTimeUtils.java:283)
at org.joda.time.format.DateTimeFormatter.selectChronology(DateTimeFormatter.java:942)
at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:851)
at org.joda.time.DateTime.parse(DateTime.java:144)
at org.apache.drill.exec.test.generated.FiltererGen87.doEval(FilterTemplate2.java:144)
at org.apache.drill.exec.test.generated.FiltererGen87.filterBatchNoSV(FilterTemplate2.java:99)
at org.apache.drill.exec.test.generated.FiltererGen87.filterBatch(FilterTemplate2.java:72)
at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.doWork(FilterRecordBatch.java:80)
at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
...
{code}

This has been fixed in the latest version of 
[ISOChronology|https://github.com/JodaOrg/joda-time/commit/634066471f2941eddfcca3ed2a62c9d254cabccb].
 What I don't understand is why DRILL-3242 (or patches around that) would make 
this bug appear more frequently.





[jira] [Created] (DRILL-4048) Parquet reader corrupts dictionary encoded binary columns

2015-11-06 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-4048:


 Summary: Parquet reader corrupts dictionary encoded binary columns
 Key: DRILL-4048
 URL: https://issues.apache.org/jira/browse/DRILL-4048
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.3.0
Reporter: Rahul Challapalli
Priority: Blocker


git.commit.id.abbrev=04c01bd

The below query returns corrupted data (not even showing up here) for binary 
columns
{code}
select * from `lineitem_dic_enc.parquet` limit 1;
+-+++---+-+--+-++---+---+-+---+++-+--+
| l_orderkey  | l_partkey  | l_suppkey  | l_linenumber  | l_quantity  | l_extendedprice  | l_discount  | l_tax  | l_returnflag  | l_linestatus  | l_shipdate  | l_commitdate  | l_receiptdate  |   l_shipinstruct   | l_shipmode  |l_comment |
+-+++---+-+--+-++---+---+-+---+++-+--+
| 1   | 1552   | 93 | 1 | 17.0| 24710.35 | 0.04| 0.02   |  |  | 1996-03-13  | 1996-02-12| 1996-03-22 | DELIVER IN PE  | T   | egular courts above the  |
+-+++---+-+--+-++---+---+-+---+++-+--+
{code}

The same query from an older build (git.commit.id.abbrev=839f8da)
{code}
select * from `lineitem_dic_enc.parquet` limit 1;
+-+++---+-+--+-++---+---+-+---+++-+--+
| l_orderkey  | l_partkey  | l_suppkey  | l_linenumber  | l_quantity  | l_extendedprice  | l_discount  | l_tax  | l_returnflag  | l_linestatus  | l_shipdate  | l_commitdate  | l_receiptdate  |   l_shipinstruct   | l_shipmode  |l_comment |
+-+++---+-+--+-++---+---+-+---+++-+--+
| 1   | 1552   | 93 | 1 | 17.0| 24710.35 | 0.04| 0.02   | N | O | 1996-03-13  | 1996-02-12| 1996-03-22 | DELIVER IN PERSON  | TRUCK   | egular courts above the  |
+-+++---+-+--+-++---+---+-+---+++-+--+
{code}

Below is the output of the parquet-meta command for this dataset
{code}
creator: parquet-mr 

file schema: root 
---
l_orderkey:  REQUIRED INT32 R:0 D:0
l_partkey:   REQUIRED INT32 R:0 D:0
l_suppkey:   REQUIRED INT32 R:0 D:0
l_linenumber:REQUIRED INT32 R:0 D:0
l_quantity:  REQUIRED DOUBLE R:0 D:0
l_extendedprice: REQUIRED DOUBLE R:0 D:0
l_discount:  REQUIRED DOUBLE R:0 D:0
l_tax:   REQUIRED DOUBLE R:0 D:0
l_returnflag:REQUIRED BINARY O:UTF8 R:0 D:0
l_linestatus:REQUIRED BINARY O:UTF8 R:0 D:0
l_shipdate:  REQUIRED INT32 O:DATE R:0 D:0
l_commitdate:REQUIRED INT32 O:DATE R:0 D:0
l_receiptdate:   REQUIRED INT32 O:DATE R:0 D:0
l_shipinstruct:  REQUIRED BINARY O:UTF8 R:0 D:0
l_shipmode:  REQUIRED BINARY O:UTF8 R:0 D:0
l_comment:   REQUIRED BINARY O:UTF8 R:0 D:0

row group 1: RC:60175 TS:3049610 
---
l_orderkey:   INT32 SNAPPY DO:0 FPO:4 SZ:146159/165487/1.13 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
l_partkey:INT32 SNAPPY DO:0 FPO:146163 SZ:90867/90918/1.00 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
l_suppkey:INT32 SNAPPY DO:0 FPO:237030 SZ:53244/53230/1.00 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
l_linenumber: INT32 SNAPPY DO:0 FPO:290274 

[jira] [Updated] (DRILL-4047) Select with options

2015-11-06 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated DRILL-4047:
-
Description: 
Add a mechanism to pass parameters down to the StoragePlugin when writing a 
Select statement.
Some discussion here:
http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAO%2Bvc4AcGK3%2B3QYvQV1-xPPdpG3Tc%2BfG%3D0xDGEUPrhd6ktHv5Q%40mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/drill-dev/201511.mbox/%3ccao+vc4clzylvjevisfjqtcyxb-zsmfy4bqrm-jhbidwzgqf...@mail.gmail.com%3E

  was:
Add a mechanism to pass parameters down to the StoragePlugin when writing a 
Select statement.
Some discussion here:
http://mail-archives.apache.org/mod_mbox/drill-dev/201511.mbox/%3ccao+vc4clzylvjevisfjqtcyxb-zsmfy4bqrm-jhbidwzgqf...@mail.gmail.com%3E




[jira] [Updated] (DRILL-4048) Parquet reader corrupts dictionary encoded binary columns

2015-11-06 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli updated DRILL-4048:
-
Attachment: lineitem_dic_enc.parquet


[jira] [Updated] (DRILL-4048) Parquet reader corrupts dictionary encoded binary columns

2015-11-06 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-4048:
--
Assignee: Jason Altekruse


[jira] [Commented] (DRILL-3751) Query hang when zookeeper is stopped

2015-11-06 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994737#comment-14994737
 ] 

Jacques Nadeau commented on DRILL-3751:
---

I think the real problem here is that our retry strategy for zookeeper is 
indefinite. 
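
A bounded retry policy could look roughly like the following. This is a plain-Java sketch under stated assumptions, not Drill's or Curator's actual code: BoundedRetrySketch, callWithRetries, maxRetries and baseSleepMs are hypothetical names, and the exponential backoff is only one possible choice.

```java
import java.util.concurrent.Callable;

public class BoundedRetrySketch {
    /**
     * Runs op, retrying at most maxRetries times after the first failure.
     * Sleeps baseSleepMs, 2*baseSleepMs, 4*baseSleepMs, ... between attempts
     * and rethrows (wrapped) once the retry budget is exhausted, so the caller
     * fails instead of waiting forever.
     */
    public static <T> T callWithRetries(Callable<T> op, int maxRetries, long baseSleepMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    // Exponential backoff before the next attempt.
                    Thread.sleep(baseSleepMs << attempt);
                }
            }
        }
        throw new Exception("giving up after " + (maxRetries + 1) + " attempts", last);
    }
}
```

With a bound like this, a query whose coordinator connection is lost would surface an error to the client after a known worst-case delay rather than leaving the sqlline prompt hung.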

> Query hang when zookeeper is stopped
> 
>
> Key: DRILL-3751
> URL: https://issues.apache.org/jira/browse/DRILL-3751
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: 4 node cluster on CentOS
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Fix For: 1.4.0
>
>
> I see an indefinite hang at the sqlline prompt when I issue a long-running 
> query and then stop the zookeeper process while the query is still executing. 
> The sqlline prompt never returns; it hangs showing the stack trace below. I 
> am on master.
> Steps to reproduce the problem:
> 1. clush -g khurram service mapr-warden stop
> 2. clush -g khurram service mapr-warden start
> 3. Issue a long-running query from sqlline.
> 4. While the query is running, stop zookeeper using the script.
> To stop zookeeper 
> {code}
> [root@centos-01 bin]# ./zkServer.sh stop
> JMX enabled by default
> Using config: /opt/mapr/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
> Stopping zookeeper ... STOPPED
> {code}
> Issue below long running query from sqlline
> {code}
> ./sqlline -u "jdbc:drill:schema=dfs.tmp"
> 0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 800;
> ...
> | 7.40907649723E8  | g|
> | 1.12378007695E9  | d|
> 03:03:28.482 [CuratorFramework-0] ERROR org.apache.curator.ConnectionState - 
> Connection timed out for connection string (10.10.100.201:5181) and timeout 
> (5000) / elapsed (5013)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = 
> ConnectionLoss
>   at 
> org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198) 
> [curator-client-2.5.0.jar:na]
>   at 
> org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) 
> [curator-client-2.5.0.jar:na]
>   at 
> org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
>  [curator-client-2.5.0.jar:na]
>   at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:807)
>  [curator-framework-2.5.0.jar:na]
>   at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:793)
>  [curator-framework-2.5.0.jar:na]
>   at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57)
>  [curator-framework-2.5.0.jar:na]
>   at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275)
>  [curator-framework-2.5.0.jar:na]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> [na:1.7.0_45]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
>   at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> {code}
> Here is the stack for sqlline process
> {code}
> [root@centos-01 bin]# /usr/java/jdk1.7.0_45/bin/jstack 32136
> 2015-09-05 03:21:52
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.45-b08 mixed mode):
> "Attach Listener" daemon prio=10 tid=0x7f8328003800 nid=0x27f1 waiting on 
> condition [0x]
>java.lang.Thread.State: RUNNABLE
> "CuratorFramework-0-EventThread" daemon prio=10 tid=0x012fd800 
> nid=0x26e1 waiting on condition [0x7f8317c2e000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007e2117798> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491)
> "CuratorFramework-0-SendThread(centos-01.qa.lab:5181)" daemon prio=10 
> tid=0x01109800 nid=0x26e0 waiting on condition [0x7f8317b2d000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:86)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:937)
>   at 

[jira] [Commented] (DRILL-4047) Select with options

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994844#comment-14994844
 ] 

ASF GitHub Bot commented on DRILL-4047:
---

Github user julienledem commented on a diff in the pull request:

https://github.com/apache/drill/pull/246#discussion_r44204861
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestSelectWithOption.java ---
@@ -0,0 +1,85 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill;
+
+import static java.lang.String.format;
+import static org.apache.drill.TestBuilder.listOf;
+
+import java.io.File;
+import java.io.FileWriter;
+
+import org.junit.Test;
+
+public class TestSelectWithOption extends BaseTestQuery {
+
+//  @Test
+//  public void testBar() throws Exception {
+//test("select dfs.`${WORKING_PATH}/some/path`() from cp.`tpch/region.parquet`");
+//  }
+
+  @Test
+  public void testText() throws Exception {
+File input = new File("target/" + this.getClass().getName() + ".csv");
+String query = "select columns from table(dfs.`${WORKING_PATH}/" + input.getPath() +
+"`(type => 'TEXT', fieldDelimiter => '%s'))";
+String queryComma = format(query, ",");
+String queryPipe = format(query, "|");
+System.out.println(queryComma);
+System.out.println(queryPipe);
+TestBuilder builderComma = testBuilder()
+.sqlQuery(queryComma)
+.ordered()
+.baselineColumns("columns");
+TestBuilder builderPipe = testBuilder()
+.sqlQuery(queryPipe)
+.ordered()
+.baselineColumns("columns");
+try (FileWriter fw = new FileWriter(input)) {
+//  fw.append("a|b\n");
+  for (int i = 0; i < 3; i++) {
+fw.append("\"b\"|\"" + i + "\"\n");
+builderComma = builderComma.baselineValues(listOf("b\"|\"" + i));
+builderPipe = builderPipe.baselineValues(listOf("b", String.valueOf(i)));
+  }
+}
+
+test("select columns from dfs.`${WORKING_PATH}/" + input.getPath() + "`");
+
+builderComma.build().run();
+builderPipe.build().run();
+  }
+
+  @Test
+  public void testParquetFailure() throws Exception {
+File input = new File("target/" + this.getClass().getName() + ".csv");
+try (FileWriter fw = new FileWriter(input)) {
+//  fw.append("a|b\n");
+  for (int i = 0; i < 3; i++) {
+fw.append("\"b\"|\"" + i + "\"\n");
+  }
+}
+
+String query = "select columns from table(dfs.`${WORKING_PATH}/" + input.getPath() +
+"`(type => 'PARQUET'))";
+System.out.println(query);
+
+test(query);
+
+  }
--- End diff --

todo more granular tests


> Select with options
> ---
>
> Key: DRILL-4047
> URL: https://issues.apache.org/jira/browse/DRILL-4047
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>
> Add a mechanism to pass parameters down to the StoragePlugin when writing a 
> Select statement.
> Some discussion here:
> http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAO%2Bvc4AcGK3%2B3QYvQV1-xPPdpG3Tc%2BfG%3D0xDGEUPrhd6ktHv5Q%40mail.gmail.com%3E
> http://mail-archives.apache.org/mod_mbox/drill-dev/201511.mbox/%3ccao+vc4clzylvjevisfjqtcyxb-zsmfy4bqrm-jhbidwzgqf...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4047) Select with options

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994840#comment-14994840
 ] 

ASF GitHub Bot commented on DRILL-4047:
---

Github user julienledem commented on a diff in the pull request:

https://github.com/apache/drill/pull/246#discussion_r44204832
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -221,9 +603,46 @@ public void dropView(String viewName) throws IOException {
   return viewSet;
 }
 
+private Set rawTableNames() {
+  return newHashSet(
+  transform(tables.keySet(), new com.google.common.base.Function() {
+@Override
+public String apply(TableInstance input) {
+  return input.sig.name;
+}
+  }));
+}
+
 @Override
 public Set getTableNames() {
-  return Sets.union(tables.keySet(), getViews());
+  System.out.println("getTableNames");
+  return Sets.union(rawTableNames(), getViews());
+}
+
+@Override
+public Set getFunctionNames() {
+  System.out.println("getFunctionNames");
--- End diff --

todo: cleanup


> Select with options
> ---
>
> Key: DRILL-4047
> URL: https://issues.apache.org/jira/browse/DRILL-4047
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>
> Add a mechanism to pass parameters down to the StoragePlugin when writing a 
> Select statement.
> Some discussion here:
> http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAO%2Bvc4AcGK3%2B3QYvQV1-xPPdpG3Tc%2BfG%3D0xDGEUPrhd6ktHv5Q%40mail.gmail.com%3E
> http://mail-archives.apache.org/mod_mbox/drill-dev/201511.mbox/%3ccao+vc4clzylvjevisfjqtcyxb-zsmfy4bqrm-jhbidwzgqf...@mail.gmail.com%3E





[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994686#comment-14994686
 ] 

ASF GitHub Bot commented on DRILL-2288:
---

Github user hnfgns commented on a diff in the pull request:

https://github.com/apache/drill/pull/245#discussion_r44199082
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java ---
@@ -23,60 +23,214 @@
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 /**
- * A record batch contains a set of field values for a particular range of records. In the case of a record batch
- * composed of ValueVectors, ideally a batch fits within L2 cache (~256k per core). The set of value vectors do not
- * change unless the next() IterOutcome is a *_NEW_SCHEMA type.
- *
- * A key thing to know is that the Iterator provided by record batch must align with the rank positions of the field ids
- * provided utilizing getValueVectorId();
+ * A record batch contains a set of field values for a particular range of
--- End diff --

documentation :+1: 


> ScanBatch violates IterOutcome protocol for zero-row sources [was: missing 
> JDBC metadata (schema) for 0-row results...]
> ---
>
> Key: DRILL-2288
> URL: https://issues.apache.org/jira/browse/DRILL-2288
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Information Schema
>Reporter: Daniel Barclay (Drill)
>Assignee: Daniel Barclay (Drill)
> Fix For: 1.4.0
>
> Attachments: Drill2288NoResultSetMetadataWhenZeroRowsTest.java
>
>
> The ResultSetMetaData object from getMetadata() of a ResultSet is not set up 
> (getColumnCount() returns zero, and trying to access any other metadata 
> throws IndexOutOfBoundsException) for a result set with zero rows, at least 
> for one from DatabaseMetaData.getColumns(...).





[jira] [Commented] (DRILL-4047) Select with options

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994833#comment-14994833
 ] 

ASF GitHub Bot commented on DRILL-4047:
---

GitHub user julienledem opened a pull request:

https://github.com/apache/drill/pull/246

DRILL-4047: Select with options

This is still work in progress and depends on the following PRs in Calcite:
https://github.com/dremio/calcite/pull/1
https://github.com/apache/calcite/pull/166


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/julienledem/drill select_with_options

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/246.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #246


commit 3e152e087ae71f323d09d8361389cc7d1fa2159a
Author: Julien Le Dem 
Date:   2015-10-22T17:49:34Z

initial TableMacro implementation

commit a963bbc9df79f51114037a59ba1a46cccd7a23cf
Author: Julien Le Dem 
Date:   2015-11-06T23:48:28Z

FormatCreator refactor

commit 495cd55894a554330918f87b83e9ebd4e184e02a
Author: Julien Le Dem 
Date:   2015-11-07T01:24:23Z

DRILL-4047: generic format plugin impl




> Select with options
> ---
>
> Key: DRILL-4047
> URL: https://issues.apache.org/jira/browse/DRILL-4047
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>
> Add a mechanism to pass parameters down to the StoragePlugin when writing a 
> Select statement.
> Some discussion here:
> http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAO%2Bvc4AcGK3%2B3QYvQV1-xPPdpG3Tc%2BfG%3D0xDGEUPrhd6ktHv5Q%40mail.gmail.com%3E
> http://mail-archives.apache.org/mod_mbox/drill-dev/201511.mbox/%3ccao+vc4clzylvjevisfjqtcyxb-zsmfy4bqrm-jhbidwzgqf...@mail.gmail.com%3E





[jira] [Commented] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994878#comment-14994878
 ] 

Sudheesh Katkam commented on DRILL-4046:


Here are some references for the future: 
[joda#126|https://github.com/JodaOrg/joda-time/pull/126] and 
[joda#105|https://github.com/JodaOrg/joda-time/issues/105].

> Performance regression in some tpch queries with 1.3rc0 build
> -
>
> Key: DRILL-4046
> URL: https://issues.apache.org/jira/browse/DRILL-4046
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Attachments: profiles.tar.gz
>
>
> ||commit/query||14||15||18||20||
> |[839f8da|https://github.com/apache/drill/commit/839f8dac2e2d0479a1552701a5274ebe8416fea6]|10,253|14,642|32,993|21,251|
> |[e7db9dc|https://github.com/apache/drill/commit/e7db9dcacbc39c4797de1aa29b119a7428451dea]|85,061|211,400|900,020|34,066|
> (Time in milliseconds; 900 second timeout)
> + These regressions are not consistent i.e. on multiple runs, some runs do 
> not vary from the baseline.
> + TPCH 18 did not regress without timing out (on runs until now).





[jira] [Commented] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994943#comment-14994943
 ] 

Jacques Nadeau commented on DRILL-4046:
---

Yeah, it doesn't look like it would cause the problem.

Do you want to bump Joda to 2.4, which has the fix?

We should separately figure out why the filter in some of these queries is 
using joda instead of our custom casting.
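
As a rough illustration of what a cast that avoids a general-purpose date library can look like, here is a small self-contained sketch. It is not Drill's casting code: `FastDateParse`, `parseEpochDay`, and the strict `yyyy-MM-dd` input format are assumptions made for the example. It reads the digits directly and converts the civil date to days since 1970-01-01 with integer arithmetic only.

```java
/** Hypothetical example (not Drill's implementation): strict yyyy-MM-dd to epoch days. */
public class FastDateParse {

  /** Parses a strict "yyyy-MM-dd" string into days since 1970-01-01. */
  public static long parseEpochDay(String s) {
    if (s == null || s.length() != 10 || s.charAt(4) != '-' || s.charAt(7) != '-') {
      throw new IllegalArgumentException("expected yyyy-MM-dd: " + s);
    }
    int year = digits(s, 0, 4);
    int month = digits(s, 5, 7);
    int day = digits(s, 8, 10);
    // Coarse range check only; per-month lengths (e.g. Feb 30) are not validated here.
    if (month < 1 || month > 12 || day < 1 || day > 31) {
      throw new IllegalArgumentException("field out of range: " + s);
    }
    return daysFromCivil(year, month, day);
  }

  /** Reads the decimal digits in s[from, to) without allocating substrings. */
  private static int digits(String s, int from, int to) {
    int value = 0;
    for (int i = from; i < to; i++) {
      char c = s.charAt(i);
      if (c < '0' || c > '9') {
        throw new IllegalArgumentException("not a digit at index " + i + ": " + s);
      }
      value = value * 10 + (c - '0');
    }
    return value;
  }

  /** Proleptic Gregorian date to days since 1970-01-01, using 400-year eras starting in March. */
  private static long daysFromCivil(int y, int m, int d) {
    y -= (m <= 2) ? 1 : 0;                           // treat Jan/Feb as end of previous year
    long era = (y >= 0 ? y : y - 399) / 400;
    long yearOfEra = y - era * 400;                  // [0, 399]
    long dayOfYear = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    long dayOfEra = yearOfEra * 365 + yearOfEra / 4 - yearOfEra / 100 + dayOfYear;
    return era * 146097 + dayOfEra - 719468;
  }
}
```

A fixed-format parser like this trades flexibility for speed, which is why it matters that function resolution actually picks it over a formatter-based path.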

> Performance regression in some tpch queries with 1.3rc0 build
> -
>
> Key: DRILL-4046
> URL: https://issues.apache.org/jira/browse/DRILL-4046
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Attachments: profiles.tar.gz
>
>
> ||commit/query||14||15||18||20||
> |[839f8da|https://github.com/apache/drill/commit/839f8dac2e2d0479a1552701a5274ebe8416fea6]|10,253|14,642|32,993|21,251|
> |[e7db9dc|https://github.com/apache/drill/commit/e7db9dcacbc39c4797de1aa29b119a7428451dea]|85,061|211,400|900,020|34,066|
> (Time in milliseconds; 900 second timeout)
> + These regressions are not consistent i.e. on multiple runs, some runs do 
> not vary from the baseline.
> + TPCH 18 did not regress without timing out (on runs until now).





[jira] [Created] (DRILL-4049) Workmanager.StatusThread is not terminated when Drillbit is shutdown

2015-11-06 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-4049:
-

 Summary: Workmanager.StatusThread is not terminated when Drillbit 
is shutdown
 Key: DRILL-4049
 URL: https://issues.apache.org/jira/browse/DRILL-4049
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jacques Nadeau
Assignee: Jacques Nadeau


Causes excessive number of threads to be created in a testing scenario.
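
The general pattern at issue can be sketched as follows. This is not the WorkManager code or the attached patch; `StatusReporter`, `isRunning`, and the interval parameter are hypothetical names. The sketch shows a periodic background thread whose owner interrupts and joins it on close, so repeated start/stop cycles in tests do not accumulate threads.

```java
/** Hypothetical stand-in for a periodic status thread; not Drill's WorkManager code. */
public class StatusReporter implements AutoCloseable {

  private final Thread thread;

  public StatusReporter(final long intervalMillis) {
    thread = new Thread(new Runnable() {
      @Override
      public void run() {
        // Loop until interrupted; sleep() throws if the interrupt arrives while waiting.
        while (!Thread.currentThread().isInterrupted()) {
          try {
            Thread.sleep(intervalMillis);
            // A real implementation would send a status update here.
          } catch (InterruptedException e) {
            return; // shutdown requested
          }
        }
      }
    }, "status-reporter");
    thread.setDaemon(true);
    thread.start();
  }

  /** Reports whether the background thread is still alive. */
  public boolean isRunning() {
    return thread.isAlive();
  }

  /** Interrupts the thread and waits for it to exit so it does not outlive its owner. */
  @Override
  public void close() throws InterruptedException {
    thread.interrupt();
    thread.join();
  }
}
```

Marking the thread as a daemon only keeps it from blocking JVM exit; within a long-lived test JVM the explicit interrupt and join in `close()` is what prevents the thread count from growing.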





[jira] [Commented] (DRILL-3848) Increase timeout time on several tests that time out frequently.

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994581#comment-14994581
 ] 

ASF GitHub Bot commented on DRILL-3848:
---

Github user dsbos closed the pull request at:

https://github.com/apache/drill/pull/174


> Increase timeout time on several tests that time out frequently.
> 
>
> Key: DRILL-3848
> URL: https://issues.apache.org/jira/browse/DRILL-3848
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.2.0
>Reporter: Daniel Barclay (Drill)
>Assignee: Daniel Barclay (Drill)
> Fix For: 1.4.0
>
>
> Increase test timeout time a bit on: 
> - TestTpchDistributedConcurrent
> - TestExampleQueries
> - TestFunctionsQuery





[jira] [Commented] (DRILL-4048) Parquet reader corrupts dictionary encoded binary columns

2015-11-06 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994897#comment-14994897
 ] 

Jason Altekruse commented on DRILL-4048:


I got a chance to take a look, and I have a fix for the file you provided. I'm 
going to write another unit test to cover similar cases. The patch should be 
posted within the hour.

> Parquet reader corrupts dictionary encoded binary columns
> -
>
> Key: DRILL-4048
> URL: https://issues.apache.org/jira/browse/DRILL-4048
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0
>Reporter: Rahul Challapalli
>Assignee: Jason Altekruse
>Priority: Blocker
> Attachments: lineitem_dic_enc.parquet
>
>
> git.commit.id.abbrev=04c01bd
> The below query returns corrupted data (not even showing up here) for binary 
> columns
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
> +-+++---+-+--+-++---+---+-+---+++-+--+
> | l_orderkey  | l_partkey  | l_suppkey  | l_linenumber  | l_quantity  | 
> l_extendedprice  | l_discount  | l_tax  | l_returnflag  | l_linestatus  | 
> l_shipdate  | l_commitdate  | l_receiptdate  |   l_shipinstruct   | 
> l_shipmode  |l_comment |
> +-+++---+-+--+-++---+---+-+---+++-+--+
> | 1   | 1552   | 93 | 1 | 17.0| 
> 24710.35 | 0.04| 0.02   |  |  | 
> 1996-03-13  | 1996-02-12| 1996-03-22 | DELIVER IN PE  | T   | 
> egular courts above the  |
> +-+++---+-+--+-++---+---+-+---+++-+--+
> {code}
> The same query from an older build (git.commit.id.abbrev=839f8da)
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
> +-+++---+-+--+-++---+---+-+---+++-+--+
> | l_orderkey  | l_partkey  | l_suppkey  | l_linenumber  | l_quantity  | 
> l_extendedprice  | l_discount  | l_tax  | l_returnflag  | l_linestatus  | 
> l_shipdate  | l_commitdate  | l_receiptdate  |   l_shipinstruct   | 
> l_shipmode  |l_comment |
> +-+++---+-+--+-++---+---+-+---+++-+--+
> | 1   | 1552   | 93 | 1 | 17.0| 
> 24710.35 | 0.04| 0.02   | N | O | 
> 1996-03-13  | 1996-02-12| 1996-03-22 | DELIVER IN PERSON  | TRUCK 
>   | egular courts above the  |
> +-+++---+-+--+-++---+---+-+---+++-+--+
> {code}
> Below is the output of the parquet-meta command for this dataset
> {code}
> creator: parquet-mr 
> file schema: root 
> ---
> l_orderkey:  REQUIRED INT32 R:0 D:0
> l_partkey:   REQUIRED INT32 R:0 D:0
> l_suppkey:   REQUIRED INT32 R:0 D:0
> l_linenumber:REQUIRED INT32 R:0 D:0
> l_quantity:  REQUIRED DOUBLE R:0 D:0
> l_extendedprice: REQUIRED DOUBLE R:0 D:0
> l_discount:  REQUIRED DOUBLE R:0 D:0
> l_tax:   REQUIRED DOUBLE R:0 D:0
> l_returnflag:REQUIRED BINARY O:UTF8 R:0 D:0
> l_linestatus:REQUIRED BINARY O:UTF8 R:0 D:0
> l_shipdate:  REQUIRED INT32 O:DATE R:0 D:0
> l_commitdate:REQUIRED INT32 O:DATE R:0 D:0
> l_receiptdate:   REQUIRED INT32 O:DATE R:0 D:0
> l_shipinstruct:  REQUIRED BINARY O:UTF8 R:0 D:0
> l_shipmode:  REQUIRED BINARY O:UTF8 R:0 D:0
> l_comment:   REQUIRED BINARY O:UTF8 R:0 D:0
> row group 1: RC:60175 TS:3049610 
> 

[jira] [Updated] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-4046:
--
Attachment: DRILL-4046.patch

> Performance regression in some tpch queries with 1.3rc0 build
> -
>
> Key: DRILL-4046
> URL: https://issues.apache.org/jira/browse/DRILL-4046
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Attachments: DRILL-4046.patch, profiles.tar.gz
>
>
> ||commit/query||14||15||18||20||
> |[839f8da|https://github.com/apache/drill/commit/839f8dac2e2d0479a1552701a5274ebe8416fea6]|10,253|14,642|32,993|21,251|
> |[e7db9dc|https://github.com/apache/drill/commit/e7db9dcacbc39c4797de1aa29b119a7428451dea]|85,061|211,400|900,020|34,066|
> (Time in milliseconds; 900 second timeout)
> + These regressions are not consistent i.e. on multiple runs, some runs do 
> not vary from the baseline.
> + TPCH 18 did not regress without timing out (on runs until now).





[jira] [Updated] (DRILL-4025) Reduce getFileStatus() invocation for Parquet by 1

2015-11-06 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-4025:
--
Fix Version/s: 1.3.0

> Reduce getFileStatus() invocation for Parquet by 1
> --
>
> Key: DRILL-4025
> URL: https://issues.apache.org/jira/browse/DRILL-4025
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Mehant Baid
>Assignee: Mehant Baid
> Fix For: 1.3.0
>
> Attachments: DRILL-4025.patch
>
>
> Currently we invoke getFileStatus() to list all the files under a directory 
> even when we have the metadata cache file. The information is already present 
> in the cache so we don't need to perform this operation.





[jira] [Commented] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995026#comment-14995026
 ] 

Sudheesh Katkam commented on DRILL-4046:


LGTM.

> Performance regression in some tpch queries with 1.3rc0 build
> -
>
> Key: DRILL-4046
> URL: https://issues.apache.org/jira/browse/DRILL-4046
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Attachments: DRILL-4046.patch, profiles.tar.gz
>
>
> ||commit/query||14||15||18||20||
> |[839f8da|https://github.com/apache/drill/commit/839f8dac2e2d0479a1552701a5274ebe8416fea6]|10,253|14,642|32,993|21,251|
> |[e7db9dc|https://github.com/apache/drill/commit/e7db9dcacbc39c4797de1aa29b119a7428451dea]|85,061|211,400|900,020|34,066|
> (Time in milliseconds; 900 second timeout)
> + These regressions are not consistent i.e. on multiple runs, some runs do 
> not vary from the baseline.
> + TPCH 18 did not regress without timing out (on runs until now).





[jira] [Resolved] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-4046.
---
Resolution: Fixed

Fixed in dde1867

> Performance regression in some tpch queries with 1.3rc0 build
> -
>
> Key: DRILL-4046
> URL: https://issues.apache.org/jira/browse/DRILL-4046
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Attachments: DRILL-4046.patch, profiles.tar.gz
>
>
> ||commit/query||14||15||18||20||
> |[839f8da|https://github.com/apache/drill/commit/839f8dac2e2d0479a1552701a5274ebe8416fea6]|10,253|14,642|32,993|21,251|
> |[e7db9dc|https://github.com/apache/drill/commit/e7db9dcacbc39c4797de1aa29b119a7428451dea]|85,061|211,400|900,020|34,066|
> (Time in milliseconds; 900 second timeout)
> + These regressions are not consistent i.e. on multiple runs, some runs do 
> not vary from the baseline.
> + TPCH 18 did not regress without timing out (on runs until now).





[jira] [Updated] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-4046:
--
Fix Version/s: 1.3.0

> Performance regression in some tpch queries with 1.3rc0 build
> -
>
> Key: DRILL-4046
> URL: https://issues.apache.org/jira/browse/DRILL-4046
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.3.0
>
> Attachments: DRILL-4046.patch, profiles.tar.gz
>
>
> ||commit/query||14||15||18||20||
> |[839f8da|https://github.com/apache/drill/commit/839f8dac2e2d0479a1552701a5274ebe8416fea6]|10,253|14,642|32,993|21,251|
> |[e7db9dc|https://github.com/apache/drill/commit/e7db9dcacbc39c4797de1aa29b119a7428451dea]|85,061|211,400|900,020|34,066|
> (Time in milliseconds; 900 second timeout)
> + These regressions are not consistent i.e. on multiple runs, some runs do 
> not vary from the baseline.
> + TPCH 18 did not regress without timing out (on runs until now).





[jira] [Updated] (DRILL-4049) Workmanager.StatusThread is not terminated when Drillbit is shutdown

2015-11-06 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-4049:
--
Attachment: DRILL-4049.patch

> Workmanager.StatusThread is not terminated when Drillbit is shutdown
> 
>
> Key: DRILL-4049
> URL: https://issues.apache.org/jira/browse/DRILL-4049
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Attachments: DRILL-4049.patch
>
>
> Causes excessive number of threads to be created in a testing scenario.





[jira] [Resolved] (DRILL-4049) Workmanager.StatusThread is not terminated when Drillbit is shutdown

2015-11-06 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-4049.
---
Resolution: Fixed

Fixed in ffe0240

> Workmanager.StatusThread is not terminated when Drillbit is shutdown
> 
>
> Key: DRILL-4049
> URL: https://issues.apache.org/jira/browse/DRILL-4049
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Attachments: DRILL-4049.patch
>
>
> Causes excessive number of threads to be created in a testing scenario.





[jira] [Resolved] (DRILL-4048) Parquet reader corrupts dictionary encoded binary columns

2015-11-06 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-4048.
---
Resolution: Fixed

Fixed in a5a1aa6

> Parquet reader corrupts dictionary encoded binary columns
> -
>
> Key: DRILL-4048
> URL: https://issues.apache.org/jira/browse/DRILL-4048
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0
>Reporter: Rahul Challapalli
>Assignee: Jason Altekruse
>Priority: Blocker
> Attachments: lineitem_dic_enc.parquet
>
>
> git.commit.id.abbrev=04c01bd
> The below query returns corrupted data (not even showing up here) for binary 
> columns
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
> +-+++---+-+--+-++---+---+-+---+++-+--+
> | l_orderkey  | l_partkey  | l_suppkey  | l_linenumber  | l_quantity  | 
> l_extendedprice  | l_discount  | l_tax  | l_returnflag  | l_linestatus  | 
> l_shipdate  | l_commitdate  | l_receiptdate  |   l_shipinstruct   | 
> l_shipmode  |l_comment |
> +-+++---+-+--+-++---+---+-+---+++-+--+
> | 1   | 1552   | 93 | 1 | 17.0| 
> 24710.35 | 0.04| 0.02   |  |  | 
> 1996-03-13  | 1996-02-12| 1996-03-22 | DELIVER IN PE  | T   | 
> egular courts above the  |
> +-+++---+-+--+-++---+---+-+---+++-+--+
> {code}
> The same query from an older build (git.commit.id.abbrev=839f8da)
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
> +-+++---+-+--+-++---+---+-+---+++-+--+
> | l_orderkey  | l_partkey  | l_suppkey  | l_linenumber  | l_quantity  | 
> l_extendedprice  | l_discount  | l_tax  | l_returnflag  | l_linestatus  | 
> l_shipdate  | l_commitdate  | l_receiptdate  |   l_shipinstruct   | 
> l_shipmode  |l_comment |
> +-+++---+-+--+-++---+---+-+---+++-+--+
> | 1   | 1552   | 93 | 1 | 17.0| 
> 24710.35 | 0.04| 0.02   | N | O | 
> 1996-03-13  | 1996-02-12| 1996-03-22 | DELIVER IN PERSON  | TRUCK 
>   | egular courts above the  |
> +-+++---+-+--+-++---+---+-+---+++-+--+
> {code}
> Below is the output of the parquet-meta command for this dataset
> {code}
> creator: parquet-mr 
> file schema: root 
> ---
> l_orderkey:  REQUIRED INT32 R:0 D:0
> l_partkey:   REQUIRED INT32 R:0 D:0
> l_suppkey:   REQUIRED INT32 R:0 D:0
> l_linenumber:REQUIRED INT32 R:0 D:0
> l_quantity:  REQUIRED DOUBLE R:0 D:0
> l_extendedprice: REQUIRED DOUBLE R:0 D:0
> l_discount:  REQUIRED DOUBLE R:0 D:0
> l_tax:   REQUIRED DOUBLE R:0 D:0
> l_returnflag:REQUIRED BINARY O:UTF8 R:0 D:0
> l_linestatus:REQUIRED BINARY O:UTF8 R:0 D:0
> l_shipdate:  REQUIRED INT32 O:DATE R:0 D:0
> l_commitdate:REQUIRED INT32 O:DATE R:0 D:0
> l_receiptdate:   REQUIRED INT32 O:DATE R:0 D:0
> l_shipinstruct:  REQUIRED BINARY O:UTF8 R:0 D:0
> l_shipmode:  REQUIRED BINARY O:UTF8 R:0 D:0
> l_comment:   REQUIRED BINARY O:UTF8 R:0 D:0
> row group 1: RC:60175 TS:3049610 
> 

[jira] [Commented] (DRILL-4048) Parquet reader corrupts dictionary encoded binary columns

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994996#comment-14994996
 ] 

ASF GitHub Bot commented on DRILL-4048:
---

GitHub user jaltekruse opened a pull request:

https://github.com/apache/drill/pull/247

DRILL-4048: Fix reading required dictionary encoded varbinary data in…

… parquet files after recent update

Fix was small, this update is a little larger than necessary because I was
hoping to create a unit test by modifying the one I had added in the earlier
patch with the version upgrade. Unfortunately we don't have a good way to
generate Parquet files with required columns from unit tests right now. So I
just added a smaller subset of the binary file that was posted on the JIRA
issue. The refactoring of the earlier test was still useful for readability,
so I kept it in.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaltekruse/incubator-drill DRILL-4048

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/247.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #247


commit e344a1fdf08192d6f3d18b09c1e7c3bcc478f518
Author: Jason Altekruse 
Date:   2015-11-07T03:24:28Z

DRILL-4048: Fix reading required dictionary encoded varbinary data in 
parquet files after recent update

The fix was small; this update is a little larger than necessary because I
was hoping to create a unit test by modifying the one I had added in the
earlier patch with the version upgrade. Unfortunately, we don't have a good
way to generate Parquet files with required columns from unit tests right
now, so I just added a smaller subset of the binary file that was posted on
the JIRA issue. The refactoring of the earlier test was still useful for
readability, so I kept it in.




> Parquet reader corrupts dictionary encoded binary columns
> -
>
> Key: DRILL-4048
> URL: https://issues.apache.org/jira/browse/DRILL-4048
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0
>Reporter: Rahul Challapalli
>Assignee: Jason Altekruse
>Priority: Blocker
> Attachments: lineitem_dic_enc.parquet
>
>
> git.commit.id.abbrev=04c01bd
> The below query returns corrupted data (not even showing up here) for binary 
> columns
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
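> +-------------+------------+------------+---------------+-------------+------------------+-------------+--------+---------------+---------------+-------------+---------------+----------------+-----------------+-------------+--------------------------+
> | l_orderkey  | l_partkey  | l_suppkey  | l_linenumber  | l_quantity  | l_extendedprice  | l_discount  | l_tax  | l_returnflag  | l_linestatus  | l_shipdate  | l_commitdate  | l_receiptdate  | l_shipinstruct  | l_shipmode  | l_comment                |
> +-------------+------------+------------+---------------+-------------+------------------+-------------+--------+---------------+---------------+-------------+---------------+----------------+-----------------+-------------+--------------------------+
> | 1           | 1552       | 93         | 1             | 17.0        | 24710.35         | 0.04        | 0.02   |               |               | 1996-03-13  | 1996-02-12    | 1996-03-22     | DELIVER IN PE   | T           | egular courts above the  |
> +-------------+------------+------------+---------------+-------------+------------------+-------------+--------+---------------+---------------+-------------+---------------+----------------+-----------------+-------------+--------------------------+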
> {code}
> The same query from an older build (git.commit.id.abbrev=839f8da)
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
> +-+++---+-+--+-++---+---+-+---+++-+--+
> | l_orderkey  | l_partkey  | l_suppkey  | l_linenumber  | l_quantity  | 
> l_extendedprice  | l_discount  | l_tax  | l_returnflag  | l_linestatus  | 
> l_shipdate  | l_commitdate  | l_receiptdate  |   l_shipinstruct   | 
> l_shipmode  |l_comment |
> 

[jira] [Commented] (DRILL-4049) Workmanager.StatusThread is not terminated when Drillbit is shutdown

2015-11-06 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995028#comment-14995028
 ] 

Sudheesh Katkam commented on DRILL-4049:


LGTM.

> Workmanager.StatusThread is not terminated when Drillbit is shutdown
> 
>
> Key: DRILL-4049
> URL: https://issues.apache.org/jira/browse/DRILL-4049
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Attachments: DRILL-4049.patch
>
>
> Causes excessive number of threads to be created in a testing scenario.
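
For readers unfamiliar with this class of bug, the general pattern can be
sketched as follows. This is a minimal Python illustration, not Drill's Java
code: the class name StatusThread is borrowed from the issue title, while the
stop() method, the polling interval, and the start_and_shutdown() helper are
assumptions made up for the example.

```python
import threading


class StatusThread(threading.Thread):
    """Hypothetical periodic status reporter (not Drill's implementation)."""

    def __init__(self, interval_seconds=0.01):
        super().__init__()
        self._interval = interval_seconds
        self._stop_event = threading.Event()
        self.ticks = 0

    def run(self):
        # Event.wait() returns True once stop() has set the event,
        # which ends the loop and lets the thread exit.
        while not self._stop_event.wait(self._interval):
            self.ticks += 1

    def stop(self):
        self._stop_event.set()


def start_and_shutdown():
    """Start the status thread, shut it down, and report whether it survived."""
    worker = StatusThread()
    worker.start()
    worker.stop()   # omitting this call would leave the thread running
    worker.join()   # wait for the thread to actually finish
    return worker.is_alive()
```

If the shutdown path never signals the thread, each start/stop cycle in a test
suite leaves one more live thread behind, which matches the symptom described
in the issue.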



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4025) Reduce getFileStatus() invocation for Parquet by 1

2015-11-06 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-4025.
---
Resolution: Fixed

Merged in 1a24233

> Reduce getFileStatus() invocation for Parquet by 1
> --
>
> Key: DRILL-4025
> URL: https://issues.apache.org/jira/browse/DRILL-4025
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Mehant Baid
>Assignee: Mehant Baid
> Fix For: 1.3.0
>
> Attachments: DRILL-4025.patch
>
>
> Currently we invoke getFileStatus() to list all the files under a directory 
> even when we have the metadata cache file. The information is already present 
> in the cache so we don't need to perform this operation.





[jira] [Updated] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-4046:
---
Description: 
||commit/query||14||15||18||20||
|[839f8da|https://github.com/apache/drill/commit/839f8dac2e2d0479a1552701a5274ebe8416fea6]|10,253|14,642|32,993|21,251|
|[e7db9dc|https://github.com/apache/drill/commit/e7db9dcacbc39c4797de1aa29b119a7428451dea]|85,061|211,400|900,020|34,066|
(Time in milliseconds; 900 second timeout)

+ These regressions are not consistent i.e. on multiple runs, some runs do not 
vary from the baseline.
+ TPCH 18 did not regress without timing out (on runs until now).

  was:
||commit/query||14||15||18||20||
|[839f8da|https://github.com/apache/drill/commit/839f8dac2e2d0479a1552701a5274ebe8416fea6]|10,253|14,642|32,993|21,251|
|[e7db9dc|https://github.com/apache/drill/commit/e7db9dcacbc39c4797de1aa29b119a7428451dea]|85,061|211,400|900,020|34,066|
(Time in milliseconds; 900 second timeout).

+ These regressions are not consistent i.e. on multiple runs, some runs do not 
vary from the baseline.
+ TPCH 18 did not regress without timing out (on runs until now).


> Performance regression in some tpch queries with 1.3rc0 build
> -
>
> Key: DRILL-4046
> URL: https://issues.apache.org/jira/browse/DRILL-4046
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Attachments: profiles.tar.gz
>
>
> ||commit/query||14||15||18||20||
> |[839f8da|https://github.com/apache/drill/commit/839f8dac2e2d0479a1552701a5274ebe8416fea6]|10,253|14,642|32,993|21,251|
> |[e7db9dc|https://github.com/apache/drill/commit/e7db9dcacbc39c4797de1aa29b119a7428451dea]|85,061|211,400|900,020|34,066|
> (Time in milliseconds; 900 second timeout)
> + These regressions are not consistent i.e. on multiple runs, some runs do 
> not vary from the baseline.
> + TPCH 18 did not regress without timing out (on runs until now).
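
To put the table in perspective, the slowdown factor per query can be worked
out directly from the reported timings. The snippet below is only arithmetic
on the numbers quoted above; the variable and function names are made up for
the example.

```python
# Timings in milliseconds, copied from the table in the issue description.
baseline = {14: 10_253, 15: 14_642, 18: 32_993, 20: 21_251}     # commit 839f8da
regressed = {14: 85_061, 15: 211_400, 18: 900_020, 20: 34_066}  # commit e7db9dc


def slowdown(query):
    """Return how many times slower the regressed build ran the given query."""
    return regressed[query] / baseline[query]


for query in sorted(baseline):
    print(f"TPCH {query}: {slowdown(query):.1f}x")
```

This works out to roughly 8.3x, 14.4x, 27.3x and 1.6x for queries 14, 15, 18
and 20. Since query 18 hit the 900 second timeout, its factor is a lower bound
rather than a measured value.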





[jira] [Updated] (DRILL-4007) json reader treats empty list inconsistently,

2015-11-06 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu updated DRILL-4007:
-
Target Version/s: Future  (was: 1.3.0)

> json reader treats empty list inconsistently,
> -
>
> Key: DRILL-4007
> URL: https://issues.apache.org/jira/browse/DRILL-4007
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Attachments: a.json, b.json
>
>
> Depending on where the empty list shows up, the empty list could be treated 
> as empty-list or just a null. 
> Running the following query on the folder with files in the attachment can 
> reproduce the observation:
> {code}
> ++
> |   a|
> ++
> | null   |
> | ["b"]  |
> | [] |
> ++
> {code}
> Note that both first and third records come from
> {code}
> {"a":[]}
> {code}
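
For comparison, a standard JSON parser keeps an empty array and a null
distinct. The sketch below uses Python's built-in json module purely as a
reference point; it says nothing about how Drill's JSON reader is implemented.

```python
import json

# An empty JSON array parses to an empty list ...
record = json.loads('{"a": []}')
print(record["a"] == [])          # True
print(record["a"] is None)        # False

# ... while an explicit JSON null parses to None.
null_record = json.loads('{"a": null}')
print(null_record["a"] is None)   # True
```

By that reading, both records produced from the same input would be expected
to show up as an empty list.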





[jira] [Commented] (DRILL-4039) Query fails when non-ascii characters are used in string literals

2015-11-06 Thread Sergio Lob (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993766#comment-14993766
 ] 

Sergio Lob commented on DRILL-4039:
---

Julian, can you please look at the fix for this issue done in HIVE-12207?  
Perhaps that fix can be used here?  

> Query fails when non-ascii characters are used in string literals
> -
>
> Key: DRILL-4039
> URL: https://issues.apache.org/jira/browse/DRILL-4039
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.1.0
> Environment: Linux lnxx64r6 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 
> 10 15:42:40 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Sergio Lob
>
> The following query against DRILL returns this error:
> SYSTEM ERROR: CalciteException: Failed to encode  'НАСТРОЕние' in character 
> set 'ISO-8859-1'
>  cc39118a-cde6-4a6e-a1d6-4b6b7e847b8a on maprd
> Query is:
>     SELECT
>    T1.`F01INT`,
>    T1.`F02UCHAR_10`,
>    T1.`F03UVARCHAR_10`
>     FROM
>    DPRV64R6_TRDUNI01T T1
>     WHERE
>    (T1.`F03UVARCHAR_10` =  'НАСТРОЕние')
>     ORDER BY
>    T1.`F01INT`;
> This issue looks similar to jira HIVE-12207.
> Is there a fix or workaround for this?
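
The error message itself is easy to reproduce outside Drill: the Cyrillic
literal in the query has no representation in ISO-8859-1 (Latin-1). The Python
sketch below only demonstrates that encoding limitation; it is not a fix or a
workaround for Drill or Calcite, and the can_encode() helper is a name made up
for the example.

```python
literal = "НАСТРОЕние"


def can_encode(text, charset):
    """Return True if every character of text is representable in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(literal, "iso-8859-1"))  # False: Cyrillic is outside Latin-1
print(can_encode(literal, "utf-8"))       # True
```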





[jira] [Commented] (DRILL-3423) Add New HTTPD format plugin

2015-11-06 Thread Jim Scott (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993914#comment-14993914
 ] 

Jim Scott commented on DRILL-3423:
--

To start fresh on this topic: my understanding of the capabilities of this 
parser grew tenfold while building this implementation. I do feel that it is 
already built in such a way that it will deliver the most flexibility and power 
to the user. That being said, I'm open to discussing the whys and why-nots of 
this, because I think this is one of the most important file formats we can add 
to Drill.

On to the present...
I think we will be best served by using these examples with enough description 
so that we are being very specific and not speaking in generalities.

As of right now, with this logFormat: "%h %t \"%r\" %>s %b \"%{Referer}i\""
this query: select * from dfs.`jimslogfile.log`
with NO user configuration

Drill will yield these fields to the user:
TIME_STAMP:request_receive_time
TIME_DAY:request_receive_time_day
TIME_MONTHNAME:request_receive_time_monthname
TIME_MONTH:request_receive_time_month
TIME_WEEK:request_receive_time_weekofweekyear
TIME_YEAR:request_receive_time_weekyear
TIME_YEAR:request_receive_time_year
TIME_HOUR:request_receive_time_hour
TIME_MINUTE:request_receive_time_minute
TIME_SECOND:request_receive_time_second
TIME_MILLISECOND:request_receive_time_millisecond
TIME_ZONE:request_receive_time_timezone
TIME_EPOCH:request_receive_time_epoch
TIME_DAY:request_receive_time_day_utc
TIME_MONTHNAME:request_receive_time_monthname_utc
TIME_MONTH:request_receive_time_month_utc
TIME_WEEK:request_receive_time_weekofweekyear_utc
TIME_YEAR:request_receive_time_weekyear_utc
TIME_YEAR:request_receive_time_year_utc
TIME_HOUR:request_receive_time_hour_utc
TIME_MINUTE:request_receive_time_minute_utc
TIME_SECOND:request_receive_time_second_utc
TIME_MILLISECOND:request_receive_time_millisecond_utc
IP:connection_client_host
HTTP_FIRSTLINE:request_firstline
HTTP_METHOD:request_firstline_method
HTTP_URI:request_firstline_uri
HTTP_PROTOCOL:request_firstline_uri_protocol
HTTP_USERINFO:request_firstline_uri_userinfo
HTTP_HOST:request_firstline_uri_host
HTTP_PORT:request_firstline_uri_port
HTTP_PATH:request_firstline_uri_path
HTTP_QUERYSTRING:request_firstline_uri_query
STRING:request_firstline_uri_query:map
HTTP_REF:request_firstline_uri_ref
HTTP_PROTOCOL:request_firstline_protocol
HTTP_PROTOCOL_VERSION:request_firstline_protocol_version
HTTP_URI:request_referer
HTTP_PROTOCOL:request_referer_protocol
HTTP_USERINFO:request_referer_userinfo
HTTP_HOST:request_referer_host
HTTP_PORT:request_referer_port
HTTP_PATH:request_referer_path
HTTP_QUERYSTRING:request_referer_query
STRING:request_referer_query:map
HTTP_REF:request_referer_ref
STRING:request_status_last
BYTES:response_body_bytesclf

I believe the benefit of this is that the user will be able to easily refine 
and figure out what they are looking for, which will allow them to then 
optimize the parsing by adding specific fields to the configuration file. This 
could be copy & paste style if we change the plugin configuration to use _ 
instead of . as mentioned in my previous comment. I would be good with that, as 
it would certainly make it easier for the user and would reduce the likelihood 
of configuration mistakes.

Removing the first part of the field name (e.g. "HTTP_URI:") would clean up the 
names, but while that is cleaner, it doesn't simplify the user experience in my 
opinion. I also don't believe that allowing a user to map those fields to 
different names improves the user experience, and I would actually argue that 
it would detract from it by introducing the possibility of confusion or 
mistakes (we know users mess up configurations all the time and these are 
difficult for beginners to troubleshoot).

With respect to nesting the data in maps, I think the only time we would want 
to do that is when there is a wildcard they are trying to capture. The reason 
being, to me, when I think about parsing a log line in any application, I 
expect to get a flat, tabular type of result set. I wouldn't be expecting 
complex data structures to come back.
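
To make the "flat, tabular" expectation concrete, here is a rough sketch of a
single log line in the logFormat above turned into one flat row. It uses a
hand-written regular expression in Python, not the httpdlog-parser library or
the plugin. The sample line and the parse_line() helper are assumptions for
illustration, and the column names are borrowed from the field list above
without claiming the real parser assigns them the same values.

```python
import re

# One named group per directive in: %h %t "%r" %>s %b "%{Referer}i"
LOG_PATTERN = re.compile(
    r'(?P<connection_client_host>\S+) '
    r'\[(?P<request_receive_time>[^\]]+)\] '
    r'"(?P<request_firstline>[^"]*)" '
    r'(?P<request_status_last>\d{3}) '
    r'(?P<response_body_bytesclf>\S+) '
    r'"(?P<request_referer>[^"]*)"'
)

# Made-up sample line for the example.
SAMPLE = (
    '127.0.0.1 [06/Nov/2015:10:15:30 -0800] '
    '"GET /index.html?x=1 HTTP/1.1" 200 2326 "http://example.com/start"'
)


def parse_line(line):
    """Return one flat dict of column name -> string, or None if no match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    # Derived columns stay at the top level instead of nesting in a map.
    parts = row["request_firstline"].split(" ")
    if len(parts) == 3:
        row["request_firstline_method"] = parts[0]
        row["request_firstline_uri"] = parts[1]
        row["request_firstline_protocol"] = parts[2]
    return row


row = parse_line(SAMPLE)
for name, value in row.items():
    print(f"{name}: {value}")
```

Every derived value is just another column in the same row, which is the shape
argued for above. Only a wildcard capture such as the query-string parameters
would need a map.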


> Add New HTTPD format plugin
> ---
>
> Key: DRILL-3423
> URL: https://issues.apache.org/jira/browse/DRILL-3423
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Reporter: Jacques Nadeau
>Assignee: Jim Scott
> Fix For: 1.4.0
>
>
> Add an HTTPD logparser based format plugin.  The author has been kind enough 
> to move the logparser project to be released under the Apache License.  Can 
> find it here:
> <dependency>
>   <groupId>nl.basjes.parse.httpdlog</groupId>
>   <artifactId>httpdlog-parser</artifactId>
>   <version>2.0</version>
> </dependency>
>  





[jira] [Commented] (DRILL-4026) CTAS Auto Partition on a wide varchar column is giving an IllegalReferenceCountException

2015-11-06 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993943#comment-14993943
 ] 

Jacques Nadeau commented on DRILL-4026:
---

Is this a regression?

> CTAS Auto Partition on a wide varchar column is giving an 
> IllegalReferenceCountException
> 
>
> Key: DRILL-4026
> URL: https://issues.apache.org/jira/browse/DRILL-4026
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - Writer
>Reporter: Rahul Challapalli
> Attachments: abc.tbl
>
>
> git.commit.id.abbrev=bb69f22
> The below query fails
> {code}
> create table vc_part partition by (a) as select cast(columns[0] as 
> varchar(6000)) a, columns[1] b from dfs.`/drill/testdata/abc.tbl`;
> Error: SYSTEM ERROR: IllegalReferenceCountException: refCnt: 0
> Fragment 0:0
> [Error Id: 8bbfcadb-07bb-468c-a772-24c85cecbcf6 on qa-node191.qa.lab:31010]
>   (io.netty.util.IllegalReferenceCountException) refCnt: 0
> io.netty.buffer.AbstractByteBuf.ensureAccessible():1178
> io.netty.buffer.DrillBuf.checkIndexD():184
> io.netty.buffer.DrillBuf.checkBytes():205
> org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare():101
> org.apache.drill.exec.test.generated.ProjectorGen2.doEval():49
> org.apache.drill.exec.test.generated.ProjectorGen2.projectRecords():62
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():173
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():130
> org.apache.drill.exec.record.AbstractRecordBatch.next():156
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():113
> org.apache.drill.exec.record.AbstractRecordBatch.next():103
> org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():91
> org.apache.drill.exec.record.AbstractRecordBatch.next():156
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():113
> org.apache.drill.exec.record.AbstractRecordBatch.next():103
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():130
> org.apache.drill.exec.record.AbstractRecordBatch.next():156
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():119
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745 (state=,code=0)
> {code}
> The data set contains a wide string (5000 chars) as the first column.





[jira] [Created] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-4046:
-

 Summary: Performance regression in some tpch queries with 1.3rc0 
build
 Key: DRILL-4046
 URL: https://issues.apache.org/jira/browse/DRILL-4046
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jacques Nadeau
Assignee: Jacques Nadeau








[jira] [Updated] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-4046:
---
Description: 
||commit/query||14||15||18||20||
|839f8da|10,253|14,642|32,993|21,251|
|e7db9dc|85,061|211,400|900,020|34,066|
(Time in milliseconds; 900 second timeout).

+ These regressions are not consistent i.e. on multiple runs, some runs do not 
vary from the baseline.
+ TPCH 18 did not regress without timing out (on runs until now).

> Performance regression in some tpch queries with 1.3rc0 build
> -
>
> Key: DRILL-4046
> URL: https://issues.apache.org/jira/browse/DRILL-4046
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
>
> ||commit/query||14||15||18||20||
> |839f8da|10,253|14,642|32,993|21,251|
> |e7db9dc|85,061|211,400|900,020|34,066|
> (Time in milliseconds; 900 second timeout).
> + These regressions are not consistent i.e. on multiple runs, some runs do 
> not vary from the baseline.
> + TPCH 18 did not regress without timing out (on runs until now).





[jira] [Updated] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-4046:
---
Attachment: profiles.tar.gz

> Performance regression in some tpch queries with 1.3rc0 build
> -
>
> Key: DRILL-4046
> URL: https://issues.apache.org/jira/browse/DRILL-4046
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Attachments: profiles.tar.gz
>
>
> ||commit/query||14||15||18||20||
> |839f8da|10,253|14,642|32,993|21,251|
> |e7db9dc|85,061|211,400|900,020|34,066|
> (Time in milliseconds; 900 second timeout).
> + These regressions are not consistent i.e. on multiple runs, some runs do 
> not vary from the baseline.
> + TPCH 18 did not regress without timing out (on runs until now).





[jira] [Commented] (DRILL-4041) Parquet library update causing random "Buffer has negative reference count"

2015-11-06 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994138#comment-14994138
 ] 

Rahul Challapalli commented on DRILL-4041:
--

[~jnadeau] We have seen a variation of this error (which I posted in one of the 
above comments) with the below message on json & hbase.

{code}
oadd.io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
{code}

> Parquet library update causing random "Buffer has negative reference count"
> ---
>
> Key: DRILL-4041
> URL: https://issues.apache.org/jira/browse/DRILL-4041
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
>Priority: Critical
>
> git commit # 39582bd60c9e9b16aba4f099d434e927e7e5
> After the parquet library update commit, we started seeing the below error 
> randomly causing failures in the  Extended Functional Suite.
> {code}
> Failed with exception
> java.lang.IllegalArgumentException: Buffer has negative reference count.
>   at 
> oadd.com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
>   at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:250)
>   at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:259)
>   at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:259)
>   at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:259)
>   at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:239)
>   at 
> oadd.org.apache.drill.exec.vector.BaseDataValueVector.clear(BaseDataValueVector.java:39)
>   at 
> oadd.org.apache.drill.exec.vector.NullableIntVector.clear(NullableIntVector.java:150)
>   at 
> oadd.org.apache.drill.exec.record.SimpleVectorWrapper.clear(SimpleVectorWrapper.java:84)
>   at 
> oadd.org.apache.drill.exec.record.VectorContainer.zeroVectors(VectorContainer.java:312)
>   at 
> oadd.org.apache.drill.exec.record.VectorContainer.clear(VectorContainer.java:296)
>   at 
> oadd.org.apache.drill.exec.record.RecordBatchLoader.clear(RecordBatchLoader.java:183)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.cleanup(DrillResultSetImpl.java:139)
>   at org.apache.drill.jdbc.impl.DrillCursor.close(DrillCursor.java:333)
>   at 
> oadd.net.hydromatic.avatica.AvaticaResultSet.close(AvaticaResultSet.java:110)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.close(DrillResultSetImpl.java:169)
>   at 
> org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:233)
>   at 
> org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:89)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> {code} 
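
As background on the error message, the sketch below shows how a
reference-counted buffer can end up with a negative count when it is released
once more than it was retained. It is a generic Python illustration, not
Drill's DrillBuf or Netty code, and it makes no claim about the root cause of
this issue. RefCountedBuffer and double_release() are hypothetical names.

```python
class RefCountedBuffer:
    """Hypothetical reference-counted buffer (not Drill's DrillBuf)."""

    def __init__(self):
        self.ref_count = 1  # the creator holds the first reference

    def retain(self):
        self.ref_count += 1

    def release(self):
        self.ref_count -= 1
        if self.ref_count < 0:
            raise ValueError("Buffer has negative reference count.")
        return self.ref_count == 0  # True once the memory could be freed


def double_release():
    """Release the same buffer twice and return the resulting error text."""
    buf = RefCountedBuffer()
    buf.release()        # legitimate release: count goes 1 -> 0
    try:
        buf.release()    # extra release by a second owner: count goes 0 -> -1
    except ValueError as err:
        return str(err)
    return None
```

When two owners each believe they hold the last reference, whether the extra
release happens can depend on timing, which would be consistent with an error
that shows up only randomly under concurrent load.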





[jira] [Commented] (DRILL-4041) Parquet library update causing random "Buffer has negative reference count"

2015-11-06 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994146#comment-14994146
 ] 

Rahul Challapalli commented on DRILL-4041:
--

Just to be more explicit, below is the environment in which we are running our 
extended functional tests:

{code}
No of concurrent queries : 10
All the queries are directly submitted to the same drillbit instead of using 
Zookeeper which randomizes the foreman node selection
{code}

Also, I will be scheduling a functional run with a concurrency of 20 to see if 
the frequency of the error increases. I will post my findings once I have them.

> Parquet library update causing random "Buffer has negative reference count"
> ---
>
> Key: DRILL-4041
> URL: https://issues.apache.org/jira/browse/DRILL-4041
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
>Priority: Critical
>
> git commit # 39582bd60c9e9b16aba4f099d434e927e7e5
> After the parquet library update commit, we started seeing the below error 
> randomly causing failures in the  Extended Functional Suite.
> {code}
> Failed with exception
> java.lang.IllegalArgumentException: Buffer has negative reference count.
>   at 
> oadd.com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
>   at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:250)
>   at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:259)
>   at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:259)
>   at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:259)
>   at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:239)
>   at 
> oadd.org.apache.drill.exec.vector.BaseDataValueVector.clear(BaseDataValueVector.java:39)
>   at 
> oadd.org.apache.drill.exec.vector.NullableIntVector.clear(NullableIntVector.java:150)
>   at 
> oadd.org.apache.drill.exec.record.SimpleVectorWrapper.clear(SimpleVectorWrapper.java:84)
>   at 
> oadd.org.apache.drill.exec.record.VectorContainer.zeroVectors(VectorContainer.java:312)
>   at 
> oadd.org.apache.drill.exec.record.VectorContainer.clear(VectorContainer.java:296)
>   at 
> oadd.org.apache.drill.exec.record.RecordBatchLoader.clear(RecordBatchLoader.java:183)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.cleanup(DrillResultSetImpl.java:139)
>   at org.apache.drill.jdbc.impl.DrillCursor.close(DrillCursor.java:333)
>   at 
> oadd.net.hydromatic.avatica.AvaticaResultSet.close(AvaticaResultSet.java:110)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.close(DrillResultSetImpl.java:169)
>   at 
> org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:233)
>   at 
> org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:89)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> {code} 





[jira] [Commented] (DRILL-4044) NPE in partition pruning test on (JDK8 + Drill 1.3)

2015-11-06 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994056#comment-14994056
 ] 

Khurram Faraaz commented on DRILL-4044:
---

No, the two stack traces are different.

> NPE in partition pruning test on (JDK8 + Drill 1.3)
> ---
>
> Key: DRILL-4044
> URL: https://issues.apache.org/jira/browse/DRILL-4044
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.3.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>
> NPE reported in drillbit.log for parquet partition pruning test with Drill 
> 1.3 and JDK 8
> Failing query is from test file 
> Functional/partition_pruning/hive/parquet/dynamic_int_partition/data/parquetSelectPartOrMultipleWithDirIN.q
> {code}
> select l_orderkey, l_partkey, l_quantity, cast(l_shipdate as date) 
> l_shipdate, l_shipinstruct, `year` from 
> hive.dynamic_partitions.lineitem_parquet_partitioned_hive where (`year` IN 
> (1993) and l_orderkey>29600) or (`year` IN (1994) and l_orderkey>29700);
> {code}
> From test output file - 
> dynamicPartitionDirectoryHive-IntPartitionData_parquetSelectPartOrMultipleWithDirIN.output_Fri_Nov_06_00:44:16_UTC_2015
> {code}
> 0   SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: c6f424ce-10b9-48e1-8783-b0dd281b6fc3 on centos-01.qa.lab:31010]
> {code}
> From the drillbit.log
> {code}
> 2015-11-06 00:44:17,222 [29c4081f-628a-fda7-05c5-a70aa9aa148b:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Pruned 7 partitions down to 2
> 2015-11-06 00:44:17,256 [29c4081f-628a-fda7-05c5-a70aa9aa148b:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No partitions were eligible for pruning
> 2015-11-06 00:44:17,323 [29c4081f-628a-fda7-05c5-a70aa9aa148b:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29c4081f-628a-fda7-05c5-a70aa9aa148b:0:0: State change requested 
> AWAITING_ALLOCATION --> FAILED
> 2015-11-06 00:44:17,323 [29c4081f-628a-fda7-05c5-a70aa9aa148b:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29c4081f-628a-fda7-05c5-a70aa9aa148b:0:0: State change requested FAILED --> 
> FINISHED
> 2015-11-06 00:44:17,324 [29c4081f-628a-fda7-05c5-a70aa9aa148b:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: c6f424ce-10b9-48e1-8783-b0dd281b6fc3 on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> NullPointerException
> Fragment 0:0
> [Error Id: c6f424ce-10b9-48e1-8783-b0dd281b6fc3 on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:178)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_65]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_65]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
> {code}





[jira] [Updated] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-4046:
---
Description: 
||commit/query||14||15||18||20||
|[839f8da|https://github.com/apache/drill/commit/839f8dac2e2d0479a1552701a5274ebe8416fea6]|10,253|14,642|32,993|21,251|
|[e7db9dc|https://github.com/apache/drill/commit/e7db9dcacbc39c4797de1aa29b119a7428451dea]|85,061|211,400|900,020|34,066|
(Time in milliseconds; 900 second timeout).

+ These regressions are not consistent i.e. on multiple runs, some runs do not 
vary from the baseline.
+ TPCH 18 did not regress without timing out (on runs until now).

  was:
||commit/query||14||15||18||20||
|839f8da|10,253|14,642|32,993|21,251|
|e7db9dc|85,061|211,400|900,020|34,066|
(Time in milliseconds; 900 second timeout).

+ These regressions are not consistent i.e. on multiple runs, some runs do not 
vary from the baseline.
+ TPCH 18 did not regress without timing out (on runs until now).


> Performance regression in some tpch queries with 1.3rc0 build
> -
>
> Key: DRILL-4046
> URL: https://issues.apache.org/jira/browse/DRILL-4046
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Attachments: profiles.tar.gz
>
>
> ||commit/query||14||15||18||20||
> |[839f8da|https://github.com/apache/drill/commit/839f8dac2e2d0479a1552701a5274ebe8416fea6]|10,253|14,642|32,993|21,251|
> |[e7db9dc|https://github.com/apache/drill/commit/e7db9dcacbc39c4797de1aa29b119a7428451dea]|85,061|211,400|900,020|34,066|
> (Time in milliseconds; 900 second timeout).
> + These regressions are not consistent i.e. on multiple runs, some runs do 
> not vary from the baseline.
> + TPCH 18 did not regress without timing out (on runs until now).





[jira] [Commented] (DRILL-3480) Some tpcds queries fail with with timeout errors

2015-11-06 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994346#comment-14994346
 ] 

Jacques Nadeau commented on DRILL-3480:
---

Can you confirm whether there are hanging queries in the Drill UI after the set 
of tests is over when we see this failure? Also, are there any inappropriate 
threads in jstack for the nodes once things complete (e.g. fragment threads, 
etc.)?
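
As a rough illustration of the thread check asked about above, here is a minimal sketch that lists live threads in the current JVM whose names contain a given fragment. This is only an analogy to reading jstack output for a Drillbit process: the class name `LingeringThreads` and the name fragment passed in are hypothetical, and the actual fragment thread naming in Drill is not assumed here.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Drill: inspects the threads of the JVM it
// runs in, similar in spirit to scanning a jstack dump for leftover threads.
final class LingeringThreads {
    private LingeringThreads() {}

    // Returns the sorted names of all live threads whose name contains the
    // given fragment (the fragment is supplied by the caller, an assumption).
    static List<String> namesContaining(String fragment) {
        return Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .filter(name -> name.contains(fragment))
                .sorted()
                .collect(Collectors.toList());
    }
}
```

An empty list once the test run has finished would suggest that no threads with that name pattern were left behind in that JVM.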

> Some tpcds queries fail with with timeout errors
> 
>
> Key: DRILL-3480
> URL: https://issues.apache.org/jira/browse/DRILL-3480
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Krystal
>Assignee: Hanifi Gunes
>Priority: Critical
> Fix For: 1.4.0
>
>
> Commit Id 9a85b2c
> Some failed queries contained the following errors:
> {code}
> Failed while running cleanup query. Not returning connection to pool.
> java.lang.InterruptedException: sleep interrupted
>   at java.lang.Thread.sleep(Native Method)
>   at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:100)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Channel closed /10.10.104.85:59334 <--> /10.10.104.85:31010.
> {code}
> Others failed with error:
> {code}
> org.apache.drill.common.exceptions.UserRemoteException: CONNECTION ERROR: Exceeded timeout (4) while waiting send intermediate work fragments to remote nodes. Sent 8 and only heard response back from 4 nodes.
> [Error Id: b85205b5-3134-4f90-aca8-7d67af04f3ed]
>   at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
>   at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:111)
>   at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
>   at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32)
>   at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61)
>   at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233)
>   at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:205)
>   at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>   at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> 

[jira] [Updated] (DRILL-4042) Unable to run sqlline in embedded mode on Windows

2015-11-06 Thread Patrick Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wong updated DRILL-4042:

Attachment: DRILL-4042.1.patch.txt

DRILL-4042.1.patch.txt - use newer version of hadoop-winutils

> Unable to run sqlline in embedded mode on Windows
> -
>
> Key: DRILL-4042
> URL: https://issues.apache.org/jira/browse/DRILL-4042
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: 1.3.0
>Reporter: Aditya Kishore
>Assignee: Patrick Wong
>Priority: Blocker
> Attachments: DRILL-4042.1.patch.txt
>
>
> Hadoop binaries ({{hadoop.dll}}, {{winutils.exe}}) bundled with Drill are out 
> of date and need to be rebuilt with Hadoop 2.7.
> Running sqlline in embedded mode hangs after any command.
> {noformat}
> $ ./bin/sqlline -u jdbc:drill:zk=local
> Nov 05, 2015 3:23:19 PM org.glassfish.jersey.server.ApplicationHandler initialize
> INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
> apache drill 1.3.0-SNAPSHOT
> "drill baby drill"
> 0: jdbc:drill:zk=local> use dfs;
> Exception in thread "drill-executor-2" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Ljava/lang/String;JJJI)Ljava/io/FileDescriptor;
> at org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Native Method)
> at org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileOutputStreamWithMode(NativeIO.java:559)
> at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:219)
> at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
> at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:305)
> at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:294)
> at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:326)
> at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:393)
> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:776)
> at org.apache.drill.exec.store.dfs.DrillFileSystem.create(DrillFileSystem.java:159)
> at org.apache.drill.exec.store.sys.local.FilePStore.put(FilePStore.java:145)
> at org.apache.drill.exec.work.foreman.QueryManager.writeFinalProfile(QueryManager.java:307)
> at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:749)
> at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:841)
> at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:786)
> at org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73)
> at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:788)
> at org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:894)
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}





[jira] [Commented] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994421#comment-14994421
 ] 

Sudheesh Katkam commented on DRILL-4046:


One more thing: multiple runs against 
[b327f49|https://github.com/jacques-n/drill/commit/b327f49dc5d3603cc3b31e0ce2a50f2367f7f16f]
 *did not* show any regressions, and there are no major differences between 
e7db9dc and b327f49.

> Performance regression in some tpch queries with 1.3rc0 build
> -
>
> Key: DRILL-4046
> URL: https://issues.apache.org/jira/browse/DRILL-4046
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Attachments: profiles.tar.gz
>
>
> ||commit/query||14||15||18||20||
> |[839f8da|https://github.com/apache/drill/commit/839f8dac2e2d0479a1552701a5274ebe8416fea6]|10,253|14,642|32,993|21,251|
> |[e7db9dc|https://github.com/apache/drill/commit/e7db9dcacbc39c4797de1aa29b119a7428451dea]|85,061|211,400|900,020|34,066|
> (Time in milliseconds; 900 second timeout)
> + These regressions are not consistent i.e. on multiple runs, some runs do 
> not vary from the baseline.
> + TPCH 18 did not regress without timing out (on runs until now).





[jira] [Assigned] (DRILL-4042) Unable to run sqlline in embedded mode on Windows

2015-11-06 Thread Patrick Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wong reassigned DRILL-4042:
---

Assignee: Zelaine Fong  (was: Patrick Wong)

Hello Zelaine,

This patch needs to be reviewed and committed by a Drill committer.

> Unable to run sqlline in embedded mode on Windows
> -
>
> Key: DRILL-4042
> URL: https://issues.apache.org/jira/browse/DRILL-4042
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: 1.3.0
>Reporter: Aditya Kishore
>Assignee: Zelaine Fong
>Priority: Blocker
> Attachments: DRILL-4042.1.patch.txt
>
>





[jira] [Commented] (DRILL-4041) Parquet library update causing random "Buffer has negative reference count"

2015-11-06 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994464#comment-14994464
 ] 

Rahul Challapalli commented on DRILL-4041:
--

Test 1: Ran the functional tests a couple of times with a concurrency of 20, 
and this issue did not show up.
Test 2: Ran the functional tests a single time with a concurrency of 10, and 
this issue did not show up either.

> Parquet library update causing random "Buffer has negative reference count"
> ---
>
> Key: DRILL-4041
> URL: https://issues.apache.org/jira/browse/DRILL-4041
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
>Priority: Critical
>
> git commit # 39582bd60c9e9b16aba4f099d434e927e7e5
> After the parquet library update commit, we started seeing the error below, 
> which randomly causes failures in the Extended Functional Suite.
> {code}
> Failed with exception
> java.lang.IllegalArgumentException: Buffer has negative reference count.
>   at oadd.com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
>   at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:250)
>   at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:259)
>   at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:259)
>   at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:259)
>   at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:239)
>   at oadd.org.apache.drill.exec.vector.BaseDataValueVector.clear(BaseDataValueVector.java:39)
>   at oadd.org.apache.drill.exec.vector.NullableIntVector.clear(NullableIntVector.java:150)
>   at oadd.org.apache.drill.exec.record.SimpleVectorWrapper.clear(SimpleVectorWrapper.java:84)
>   at oadd.org.apache.drill.exec.record.VectorContainer.zeroVectors(VectorContainer.java:312)
>   at oadd.org.apache.drill.exec.record.VectorContainer.clear(VectorContainer.java:296)
>   at oadd.org.apache.drill.exec.record.RecordBatchLoader.clear(RecordBatchLoader.java:183)
>   at org.apache.drill.jdbc.impl.DrillResultSetImpl.cleanup(DrillResultSetImpl.java:139)
>   at org.apache.drill.jdbc.impl.DrillCursor.close(DrillCursor.java:333)
>   at oadd.net.hydromatic.avatica.AvaticaResultSet.close(AvaticaResultSet.java:110)
>   at org.apache.drill.jdbc.impl.DrillResultSetImpl.close(DrillResultSetImpl.java:169)
>   at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:233)
>   at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:89)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> {code} 
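
For readers unfamiliar with this class of error, the following is a simplified sketch of how a reference-counted buffer can end up reporting a negative count when it is released more often than it was retained. It is not the DrillBuf implementation; the class `RefCountedBuffer` and its methods are hypothetical and only reuse the message text from the trace above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified reference-counted buffer (not Drill's DrillBuf).
final class RefCountedBuffer {
    // A freshly allocated buffer starts with one owner.
    private final AtomicInteger refCnt = new AtomicInteger(1);

    // Registers one more owner of the buffer.
    void retain() {
        refCnt.incrementAndGet();
    }

    // Drops one owner. Returns true when the last reference was released.
    // Releasing more often than retained drives the count below zero,
    // which is reported as an IllegalArgumentException.
    boolean release() {
        int remaining = refCnt.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalArgumentException("Buffer has negative reference count.");
        }
        return remaining == 0;
    }

    int refCnt() {
        return refCnt.get();
    }
}
```

In this sketch, a second `release()` on a buffer that was never retained again triggers the exception, which is the double-release pattern such an error usually points to.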





[jira] [Commented] (DRILL-2489) Accessing Connection, Statement, PreparedStatement after they are closed should throw a SQLException

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994542#comment-14994542
 ] 

ASF GitHub Bot commented on DRILL-2489:
---

Github user hnfgns commented on the pull request:

https://github.com/apache/drill/pull/171#issuecomment-154555831
  
+1 for the entire batch of 3 commits.


> Accessing Connection, Statement, PreparedStatement after they are closed 
> should throw a SQLException
> 
>
> Key: DRILL-2489
> URL: https://issues.apache.org/jira/browse/DRILL-2489
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Rahul Challapalli
>Assignee: Daniel Barclay (Drill)
> Fix For: 1.4.0
>
>
> git.commit.id.abbrev=7b4c887
> According to the JDBC spec, we should throw a SQLException when we access 
> methods on a closed Connection, Statement, or PreparedStatement. Drill 
> currently does not do this. 
> I can raise multiple JIRAs if the developer wishes to work on them 
> independently.
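
The behavior requested above can be sketched with a small guard method. This is a generic illustration, not the Drill JDBC driver code: `ClosableHandle` and `getFetchSize()` here are hypothetical stand-ins for a Connection, Statement, or PreparedStatement and one of its methods.

```java
import java.sql.SQLException;

// Hypothetical stand-in for a JDBC object (not a Drill class).
final class ClosableHandle implements AutoCloseable {
    private boolean closed;

    // Guard called at the start of every method that requires an open handle.
    private void checkOpen() throws SQLException {
        if (closed) {
            throw new SQLException("Operation not allowed on a closed handle");
        }
    }

    // Example of a regular method: must fail once the handle is closed.
    int getFetchSize() throws SQLException {
        checkOpen();
        return 0;
    }

    // isClosed() stays callable after close().
    boolean isClosed() {
        return closed;
    }

    // close() is a no-op when the handle is already closed.
    @Override
    public void close() {
        closed = true;
    }
}
```

Only `close()` and `isClosed()` skip the guard, since JDBC allows both to be called on an object that is already closed.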





[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994553#comment-14994553
 ] 

ASF GitHub Bot commented on DRILL-2288:
---

GitHub user dsbos opened a pull request:

https://github.com/apache/drill/pull/245

DRILL-2288: Fix ScanBatch violation of IterOutcome protocol and downstream 
chain of bugs


Increments:

2288:  Pt. 1 Core:  Added unit test.  
[Drill2288GetColumnsMetadataWhenNoRowsTest, empty.json]

2288:  Pt. 1 Core:  Changed HBase test table #1's # of regions from 1 to 2. 
 [HBaseTestsSuite]

Also added TODO(DRILL-3954) comment about # of regions.

2288:  Pt. 2 Core:  Documented IterOutcome much more clearly.  [RecordBatch]

Also edited some related Javadoc.

2288:  Pt. 2 Hyg.:  Edited doc., added @Override, etc.  
[AbstractRecordBatch, RecordBatch]

Purged unused SetupOutcome.
Added @Override.
Edited comments.
Fix some comments to doc. comments.

2288:  Pt. 3 Core:  Added validation of IterOutcome sequence.  
[IteratorValidatorBatchIterator]

Also:
Renamed internal members for clarity.
Added comments.

2288:  Pt. 4 Core:  Fixed a NONE -> OK_NEW_SCHEMA in ScanBatch.next().  
[ScanBatch]

(With nearby comments.)

2288:  Pt. 4 Hyg.:  Edited comments, reordered, whitespace.  [ScanBatch]

Reordered
Added comments.
Aligned.

2288:  Pt. 4 Core+:  Fixed UnionAllRecordBatch to receive IterOutcome 
sequence right.  (3659)  [UnionAllRecordBatch]

2288:  Pt. 5 Core:  Fixed ScanBatch.Mutator.isNewSchema() to stop spurious 
"new schema" reports (fix short-circuit OR, to call resetting method right).  
[ScanBatch]

2288:  Pt. 5 Hyg.:  Renamed, edited comments, reordered.  [ScanBatch, 
SchemaChangeCallBack, AbstractSingleRecordBatch]

Renamed getSchemaChange -> getSchemaChangedAndReset.
Renamed schemaChange -> schemaChanged.
Added doc. comments.
Aligned.

2288:  Pt. 6 Core:  Avoided dummy Null.IntVec. column in JsonReader when 
not needed (MapWriter.isEmptyMap()).  [JsonReader, 3 vector files]

2288:  Pt. 6 Hyg.:  Edited comments, message.  Fixed message formatting.  
[RecordReader, JSONFormatPlugin, JSONRecordReader, AbstractMapVector, 
JsonReader]

Fixed message formatting.
Edited comments.
Edited message.
Fixed spurious line break.

2288:  Pt. 7 Core:  Added column families in HBaseRecordReader* to avoid 
dummy Null.IntVec. clash.  [HBaseRecordReader]

2288:  Pt. 8 Core.1:  Cleared recordCount in 
OrderedPartitionRecordBatch.innerNext().  [OrderedPartitionRecordBatch]

2288:  Pt. 8 Core.2:  Cleared recordCount in ProjectRecordBatch.innerNext.  
[ProjectRecordBatch]

2288:  Pt. 8 Core.3:  Cleared recordCount in TopNBatch.innerNext.  
[TopNBatch]

2288:  Pt. 9 Core:  Had UnorderedReceiverBatch reset RecordBatchLoader's 
record count.  [UnorderedReceiverBatch, RecordBatchLoader]

2288:  Pt. 9 Hyg.:  Added comments.  [RecordBatchLoader]

2288:  Pt. 10 Core:  Worked around mismatched map child vectors in 
MapVector.getObject().  [MapVector]

2288:  Pt. 11 Core:  Added OK_NEW_SCHEMA schema comparison for HashAgg.  
[HashAggTemplate]

2288:  Pt. 12 Core:  Fixed memory leak in BaseTestQuery's printing.

Fixed bad skipping of RecordBatchLoader.clear(...) and
QueryDataBatch.load(...) for zero-row batches in printResult(...).

Also, dropped suppression of call to
VectorUtil.showVectorAccessibleContent(...) (so zero-row batches are
as visible as others).

2288:  Pt. 13 Core:  Fixed test that used unhandled periods in column alias 
identifiers.

2288:  Misc.:  Added # of rows to showVectorAccessibleContent's output.  
[VectorUtil]

2288:  Misc.:  Added simple/partial toString() [VectorContainer, 
AbstractRecordReader, JSONRecordReader, BaseValueVector, FieldSelection, 
AbstractBaseWriter]

2288:  Misc. Hyg.:  Added doc. comments to VectorContainer.  
[VectorContainer]

2288:  Misc. Hyg.:  Edited comment.  [DrillStringUtils]

2288:  Misc. Hyg.:  Clarified message for unhandled identifier containing 
period.

2288:  Pt. 3 Core Upd.:  Added schema comparison result to logging.  
[IteratorValidatorBatchIterator]

2288:  Pt. 7 Core Upd.:  Handled HBase columns too re NullableIntVectors.  
[HBaseRecordReader, TestTableGenerator, TestHBaseFilterPushDown]

Created map-child vectors for requested columns.
Added unit test method testDummyColumnsAreAvoided, adding new row to test 
table,
updated some row counts.

2288:  Pt. 7 Hyg. Upd.:  Edited comment.  [HBaseRecordReader]

2288:  Pt. 11 Core Upd.:  REVERTED all of bad OK_NEW_SCHEMA schema 
comparison for HashAgg.  [HashAggTemplate]

This 

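The "Pt. 5 Core" increment in the list above mentions a short-circuit OR that kept a resetting method from being called. The sketch below shows that general pitfall in isolation. It is not the ScanBatch code: `ChangeFlag`, `SchemaCheck`, and both method names are hypothetical.

```java
// Hypothetical flag with a read-and-clear accessor (not a Drill class).
final class ChangeFlag {
    private boolean changed;

    void mark() {
        changed = true;
    }

    // Returns the current value and clears the flag.
    boolean getAndReset() {
        boolean was = changed;
        changed = false;
        return was;
    }
}

final class SchemaCheck {
    private SchemaCheck() {}

    // Pitfall: when localChange is true, || skips getAndReset(), so the flag
    // stays set and a later call reports a change that was already handled.
    static boolean shortCircuit(boolean localChange, ChangeFlag flag) {
        return localChange || flag.getAndReset();
    }

    // One possible fix: always consume the flag before combining the results.
    static boolean alwaysReset(boolean localChange, ChangeFlag flag) {
        boolean callbackChange = flag.getAndReset();
        return localChange || callbackChange;
    }
}
```

With `shortCircuit`, a marked flag survives a call where `localChange` is true and surfaces again on the next call; with `alwaysReset`, it is consumed exactly once.
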
[jira] [Commented] (DRILL-4042) Unable to run sqlline in embedded mode on Windows

2015-11-06 Thread Aditya Kishore (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994564#comment-14994564
 ] 

Aditya Kishore commented on DRILL-4042:
---

I'll take up the task of reviewing, testing and committing it.

> Unable to run sqlline in embedded mode on Windows
> -
>
> Key: DRILL-4042
> URL: https://issues.apache.org/jira/browse/DRILL-4042
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: 1.3.0
>Reporter: Aditya Kishore
>Assignee: Zelaine Fong
>Priority: Blocker
> Attachments: DRILL-4042.1.patch.txt
>
>





[jira] [Assigned] (DRILL-4042) Unable to run sqlline in embedded mode on Windows

2015-11-06 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-4042:
---

Assignee: Aditya Kishore  (was: Zelaine Fong)

Assigning to Aditya to review/commit.  Thanks, Aditya.

> Unable to run sqlline in embedded mode on Windows
> -
>
> Key: DRILL-4042
> URL: https://issues.apache.org/jira/browse/DRILL-4042
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: 1.3.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Blocker
> Attachments: DRILL-4042.1.patch.txt
>
>





[jira] [Commented] (DRILL-4046) Performance regression in some tpch queries with 1.3rc0 build

2015-11-06 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994577#comment-14994577
 ] 

Jacques Nadeau commented on DRILL-4046:
---

[~sudheeshkatkam], can you take a look at what you see with this commit:

https://github.com/jacques-n/drill/tree/perf_regression

I've provided a system property which allows enabling/disabling the rpc 
offload. Default is OFF. To enable,
{code}
-Ddrill.enable_rpc_offload=true
{code}
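
For context, a JVM flag of this form is typically read as a boolean system property. The following is a minimal sketch of that pattern, assuming the property name given above; the class `RpcOffloadFlag` and its method are hypothetical and are not taken from the perf_regression branch.

```java
// Hypothetical sketch of reading a boolean JVM system property.
// Only the property name comes from the comment above; the rest is illustrative.
final class RpcOffloadFlag {
    static final String PROPERTY = "drill.enable_rpc_offload";

    private RpcOffloadFlag() {}

    // Returns true only when the property is set to "true" (case-insensitive).
    // An unset property yields false, matching "Default is OFF".
    static boolean isEnabled() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "false"));
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + " = " + isEnabled());
    }
}
```

Reading the property on each call, rather than caching it in a static final field, keeps the sketch easy to exercise in tests, at the cost of a property lookup per call.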

> Performance regression in some tpch queries with 1.3rc0 build
> -
>
> Key: DRILL-4046
> URL: https://issues.apache.org/jira/browse/DRILL-4046
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Attachments: profiles.tar.gz
>
>
> ||commit/query||14||15||18||20||
> |[839f8da|https://github.com/apache/drill/commit/839f8dac2e2d0479a1552701a5274ebe8416fea6]|10,253|14,642|32,993|21,251|
> |[e7db9dc|https://github.com/apache/drill/commit/e7db9dcacbc39c4797de1aa29b119a7428451dea]|85,061|211,400|900,020|34,066|
> (Time in milliseconds; 900 second timeout)
> + These regressions are not consistent i.e. on multiple runs, some runs do 
> not vary from the baseline.
> + TPCH 18 did not regress without timing out (on runs until now).


