[jira] [Closed] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2019-01-30 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-6517.
-

> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> A TPC-DS query was canceled after 2 hrs and 47 mins, and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log.
> Steps to reproduce the problem are below; the query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:662)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:73)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:79)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:242)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:218)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  

[jira] [Comment Edited] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2019-01-30 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756773#comment-16756773
 ] 

Robert Hou edited comment on DRILL-6517 at 1/31/19 1:38 AM:


I am unable to reproduce this problem with sf1. I ran the query for 2 hours and 
12 hours, and then successfully canceled the query. I spoke with Khurram and 
added "-ea" to DRILL_JAVA_OPTS. I also added "alter system set 
`drill.exec.hashjoin.fallback.enabled` = true;" because the query was running 
out of memory. I am able to cancel the query. I am running Drill 1.14, commit 
35a1ae23c9b280b9e73cb0f6f01808c996515454. The commit message is "NPE for nested 
EAND scenario.".  The query can be canceled with Drill 1.15.


was (Author: rhou):
I am unable to reproduce this problem with sf1. I ran the query for 2 hours and 
12 hours, and then successfully canceled the query. I spoke with Khurram and 
added "-ea" to DRILL_JAVA_OPTS. I also added "alter system set 
`drill.exec.hashjoin.fallback.enabled` = true;" because the query was running 
out of memory. I am able to cancel the query. I am running Drill 1.14, commit 
35a1ae23c9b280b9e73cb0f6f01808c996515454. The commit message is "NPE for nested 
EAND scenario.".  The query can be canceled with Drill 1.15.

> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> A TPC-DS query was canceled after 2 hrs and 47 mins, and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log.
> Steps to reproduce the problem are below; the query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> 

[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2019-01-30 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756773#comment-16756773
 ] 

Robert Hou commented on DRILL-6517:
---

I am unable to reproduce this problem with sf1. I ran the query for 2 hours and 
12 hours, and then successfully canceled the query. I spoke with Khurram and 
added "-ea" to DRILL_JAVA_OPTS. I also added "alter system set 
`drill.exec.hashjoin.fallback.enabled` = true;" because the query was running 
out of memory. I am able to cancel the query. I am running Drill 1.14, commit 
35a1ae23c9b280b9e73cb0f6f01808c996515454. The commit message is "NPE for nested 
EAND scenario.".

> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> A TPC-DS query was canceled after 2 hrs and 47 mins, and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log.
> Steps to reproduce the problem are below; the query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:662)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:73)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:79)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> 

[jira] [Comment Edited] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2019-01-30 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756773#comment-16756773
 ] 

Robert Hou edited comment on DRILL-6517 at 1/31/19 1:38 AM:


I am unable to reproduce this problem with sf1. I ran the query for 2 hours and 
12 hours, and then successfully canceled the query. I spoke with Khurram and 
added "-ea" to DRILL_JAVA_OPTS. I also added "alter system set 
`drill.exec.hashjoin.fallback.enabled` = true;" because the query was running 
out of memory. I am able to cancel the query. I am running Drill 1.14, commit 
35a1ae23c9b280b9e73cb0f6f01808c996515454. The commit message is "NPE for nested 
EAND scenario.".  The query can be canceled with Drill 1.15.


was (Author: rhou):
I am unable to reproduce this problem with sf1. I ran the query for 2 hours and 
12 hours, and then successfully canceled the query. I spoke with Khurram and 
added "-ea" to DRILL_JAVA_OPTS. I also added "alter system set 
`drill.exec.hashjoin.fallback.enabled` = true;" because the query was running 
out of memory. I am able to cancel the query. I am running Drill 1.14, commit 
35a1ae23c9b280b9e73cb0f6f01808c996515454. The commit message is "NPE for nested 
EAND scenario.".

> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> A TPC-DS query was canceled after 2 hrs and 47 mins, and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log.
> Steps to reproduce the problem are below; the query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690)
>  

[jira] [Commented] (DRILL-6911) Documentation issue - Hadoop core-site.xml is not supported by Drill to read S3 credentials

2019-01-30 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756663#comment-16756663
 ] 

Bridget Bevens commented on DRILL-6911:
---

Thanks, [~denysord88]. I removed the note. 

Best,
Bridget

> Documentation issue - Hadoop core-site.xml is not supported by Drill to read 
> S3 credentials
> ---
>
> Key: DRILL-6911
> URL: https://issues.apache.org/jira/browse/DRILL-6911
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Denys Ordynskiy
>Assignee: Bridget Bevens
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.16.0
>
>
> In the Drill S3 documentation, https://drill.apache.org/docs/s3-storage-plugin/, 
> the section "Providing AWS Credentials" describes 3 ways to set up AWS S3 
> credentials in Drill:
> - storage plugin;
> - Drill-specific core-site.xml;
> - existing S3 configuration for Hadoop.
> The third item is not supported by Drill. The Hadoop core-site.xml config file 
> may contain S3 credentials, but Drill doesn't read any S3 parameters directly 
> from the Hadoop config file.
> The third item 
> {code:java}
> In a Hadoop environment, you can use the existing S3 configuration for 
> Hadoop. The AWS credentials should already be defined. All you need to do is 
> configure the S3 storage plugin.
> {code}
> should be removed from the document 
> https://drill.apache.org/docs/s3-storage-plugin/
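
As an aside, a minimal sketch of the second (supported) option, the Drill-specific core-site.xml; the property names are the standard Hadoop S3A credential keys, and the values are placeholders:

{code:xml}
<configuration>
  <!-- Placeholders: substitute your own AWS credentials. -->
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
{code}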



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6911) Documentation issue - Hadoop core-site.xml is not supported by Drill to read S3 credentials

2019-01-30 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-6911:
--
Labels: doc-complete  (was: doc-impacting)

> Documentation issue - Hadoop core-site.xml is not supported by Drill to read 
> S3 credentials
> ---
>
> Key: DRILL-6911
> URL: https://issues.apache.org/jira/browse/DRILL-6911
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Denys Ordynskiy
>Assignee: Bridget Bevens
>Priority: Major
>  Labels: doc-complete
> Fix For: 1.16.0
>
>
> In the Drill S3 documentation, https://drill.apache.org/docs/s3-storage-plugin/, 
> the section "Providing AWS Credentials" describes 3 ways to set up AWS S3 
> credentials in Drill:
> - storage plugin;
> - Drill-specific core-site.xml;
> - existing S3 configuration for Hadoop.
> The third item is not supported by Drill. The Hadoop core-site.xml config file 
> may contain S3 credentials, but Drill doesn't read any S3 parameters directly 
> from the Hadoop config file.
> The third item 
> {code:java}
> In a Hadoop environment, you can use the existing S3 configuration for 
> Hadoop. The AWS credentials should already be defined. All you need to do is 
> configure the S3 storage plugin.
> {code}
> should be removed from the document 
> https://drill.apache.org/docs/s3-storage-plugin/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7018) Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 37

2019-01-30 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-7018:
-
Remaining Estimate: 0h  (was: 24h)
 Original Estimate: 0h  (was: 24h)
  Reviewer: Boaz Ben-Zvi

> Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet 
> File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 
> 0, writerIndex: 372 (expected: 0 <= readerIndex <= writerIndex <= 
> capacity(256))
> 
>
> Key: DRILL-7018
> URL: https://issues.apache.org/jira/browse/DRILL-7018
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.14.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> alter system set `store.parquet.reader.int96_as_timestamp`= true
> run a query which projects a column of type Parquet INT96 timestamp with 31 
> nulls
> The following exception will be thrown:
> java.lang.IndexOutOfBoundsException: readerIndex: 0, writerIndex: 372 
> (expected: 0 <= readerIndex <= writerIndex <= capacity(256))
>  
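
A minimal sketch of the repro quoted above; the column name and file path are hypothetical placeholders:

{noformat}
alter system set `store.parquet.reader.int96_as_timestamp` = true;
-- ts_col is an INT96 timestamp column containing nulls; the path is a placeholder
select ts_col from dfs.`/path/to/int96_timestamps.parquet`;
{noformat}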



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-30 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6997:
---
Labels: ready-to-commit  (was: )

> Semijoin is changing the join ordering for some tpcds queries.
> --
>
> Key: DRILL-6997
> URL: https://issues.apache.org/jira/browse/DRILL-6997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
> Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, 
> 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill
>
>
> TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
> disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
> issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
> DRILL-6798: Planner changes to support semi-join.
> {code:java}
> with ws_wh as
>  (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
>  from web_sales ws1,web_sales ws2
>  where ws1.ws_order_number = ws2.ws_order_number
>  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>  [_LIMITA] select [_LIMITB]
>  count(distinct ws_order_number) as "order count"
>  ,sum(ws_ext_ship_cost) as "total shipping cost"
>  ,sum(ws_net_profit) as "total net profit"
>  from
>  web_sales ws1
>  ,date_dim
>  ,customer_address
>  ,web_site
>  where
>  d_date between '[YEAR]-[MONTH]-01' and
>  (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
>  and ws1.ws_ship_date_sk = d_date_sk
>  and ws1.ws_ship_addr_sk = ca_address_sk
>  and ca_state = '[STATE]'
>  and ws1.ws_web_site_sk = web_site_sk
>  and web_company_name = 'pri'
>  and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  and ws1.ws_order_number in (select wr_order_number
>  from web_returns,ws_wh
>  where wr_order_number = ws_wh.ws_order_number)
>  order by count(distinct ws_order_number)
>  [_LIMITC];
> {code}
>  I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
> semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join 
> disabled. Both are executed with commit id 
> 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100.
> The plan with semi-join enabled has moved the first hash join:
> and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  It used to be on the build side of the first HJ on the left hand side 
> (04-05). It is now on the build side of the fourth HJ on the left hand side 
> (01-13).
> The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
> that takes 10 seconds to execute. But all the fragments take about the same 
> amount of time.
> The plan with semi-join enabled has two HJ that process 1B rows while the 
> plan with semi-joins disabled has one HJ that processes 1B rows.
> The plan with semi-join enabled has several senders and receivers that wait 
> more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
> 17-00). When disabled, there is no operator waiting more than 10 seconds.
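
For reference, a minimal sketch of how the two profiles were presumably produced, assuming the {{planner.enable_semijoin}} session option added by DRILL-6798 (the option name is an assumption, not stated in this report):

{noformat}
-- profile 240abc6d-b816-5320-93b1-2a07d850e734: semi-join enabled
alter session set `planner.enable_semijoin` = true;
-- profile 240aa5f8-24c4-e678-8d42-0fc06e5d2465: semi-join disabled
alter session set `planner.enable_semijoin` = false;
{noformat}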



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7020) big varchar doesn't work with extractHeader=true

2019-01-30 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756537#comment-16756537
 ] 

Paul Rogers commented on DRILL-7020:


The size limitation is hard-coded into the "compliant" text reader, as you 
noted. I'm not sure the limit is necessary. Drill uses a 4-byte offset vector 
to track VARCHAR values within a VARCHAR vector. It might be as easy as removing 
the size check.
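
To illustrate the offset-vector point, here is a minimal, self-contained Java sketch (illustrative only; these are not Drill's actual vector classes) showing that 4-byte (int) offsets can delimit values far larger than 65536 bytes:

{code:java}
import java.nio.charset.StandardCharsets;

public class OffsetVectorDemo {
  public static void main(String[] args) {
    // Build one value longer than the 65536-character limit in the check.
    StringBuilder big = new StringBuilder();
    for (int i = 0; i < 70_000; i++) {
      big.append('y');
    }
    byte[] data = ("w" + big).getBytes(StandardCharsets.UTF_8);

    // A 4-byte offset vector: offsets[i]..offsets[i+1] delimit value i.
    // Java ints address up to Integer.MAX_VALUE bytes, so nothing here
    // inherently caps a single value at 65536 characters.
    int[] offsets = {0, 1, 1 + big.length()};
    for (int i = 0; i < offsets.length - 1; i++) {
      String value = new String(data, offsets[i], offsets[i + 1] - offsets[i],
          StandardCharsets.UTF_8);
      System.out.println("value " + i + " length = " + value.length());
    }
  }
}
{code}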

> big varchar doesn't work with extractHeader=true
> 
>
> Key: DRILL-7020
> URL: https://issues.apache.org/jira/browse/DRILL-7020
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.15.0
>Reporter: benj
>Priority: Major
>
> with a TEST file of csv type like
> {code:java}
> col1,col2
> w,x
> ...y...,z
> {code}
> where ...y... is a string of more than 65536 characters (let's say 66000, for 
> example)
> A SELECT with +*extractHeader=false*+ is OK
> {code:java}
> SELECT * FROM TABLE(tmp.`TEST`(type => 'text', fieldDelimiter => ',', 
> extractHeader => false));
> +---------+------+
> |  col1   | col2 |
> +---------+------+
> | w       | x    |
> | ...y... | z    |
> +---------+------+
> {code}
> But SELECT with +*extractHeader=true*+ gives an error
> {code:java}
> SELECT * FROM TABLE(tmp.`TEST`(type => 'text', fieldDelimiter => ',', 
> extractHeader => true));
> Error: UNSUPPORTED_OPERATION ERROR: Trying to write something big in a column
> columnIndex 1
> Limit 65536
> Fragment 0:0
> {code}
> Note that it is possible to use extractHeader=false with skipFirstLine=true, 
> but in this case it's not possible to automatically get column names.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-6726) Drill fails to query views created before DRILL-6492 when impersonation is enabled

2019-01-30 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-6726.
-

> Drill fails to query views created before DRILL-6492 when impersonation is 
> enabled
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: student
>
>
> Drill 1.14 changed schemas to be case-insensitive (DRILL-6492). If an 
> existing view was created before DRILL-6492 was committed, and this view 
> references a file path that includes a schema with upper case letters, the 
> view needs to be rebuilt.  There may be variations on this issue that I have 
> not seen.
> To reproduce this problem, create a dfs workspace like this:
> {noformat}
> "drillTestDirP1": {
>   "location": "/drill/testdata/p1tests",
>   "writable": true,
>   "defaultInputFormat": "parquet",
>   "allowAccessOutsideWorkspace": false
> },
> {noformat}
> Use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
> command:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute 
> this query:
> {noformat}
> select * from student_test_v;
> {noformat}
> Drill will return an exception:
> {noformat}
> Error: VALIDATION ERROR: Failure while attempting to expand view. Requested 
> schema drillTestDirP1 not available in schema dfs.
> View Context dfs, drillTestDirP1
> View SQL SELECT *
> FROM `dfs.drillTestDirP1`.`student`
> [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> I have attached the student parquet file I used.
> This is what the .view.drill file looks like:
> {noformat}
> {
>   "name" : "student_test_v",
>   "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`",
>   "fields" : [ {
> "name" : "**",
> "type" : "DYNAMIC_STAR",
> "isNullable" : true
>   } ],
>   "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ]
> }
> {noformat}
> This means that users may not be able to access views that they have created 
> using previous versions of Drill.  We should maintain backwards 
> compatibility where possible.
> As a workaround, these views can be re-created, as shown in the sketch below.  
> It would be helpful to users if the error message explained that these views 
> need to be re-created.
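
For reference, the sketch below shows what re-creating the view from the description would look like, re-running the same DDL on the newer build:

{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as
select * from `dfs.drillTestDirP1`.student;
{noformat}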



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6726) Drill fails to query views created before DRILL-6492 when impersonation is enabled

2019-01-30 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756501#comment-16756501
 ] 

Robert Hou commented on DRILL-6726:
---

I have encountered another problem related to this one.  If I run Drill 1.15, 
and then I run Drill 1.14, Drill 1.14 cannot access schemas with mixed-case 
names (names that have upper case letters).  It can access a schema whose name 
uses lower case letters.  For example, if the schema used to be called 
"drillTestDir", Drill 1.14 must use "drilltestdir" in order to use it.  This 
means that scripts that use "drillTestDir" can break.

This may not be a major issue now, but sometimes users can try a new version of 
Drill, and if they run into problems, they can revert to the older version of 
Drill.  We know one user who tried Drill 1.14 and encountered some problems and 
went back to Drill 1.13.  We should keep this in mind in future releases.

> Drill fails to query views created before DRILL-6492 when impersonation is 
> enabled
> --
>
> Key: DRILL-6726
> URL: https://issues.apache.org/jira/browse/DRILL-6726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Robert Hou
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: student
>
>
> Drill 1.14 changed schemas to be case-insensitive (DRILL-6492). If an 
> existing view was created before DRILL-6492 was committed, and this view 
> references a file path that includes a schema with upper case letters, the 
> view needs to be rebuilt.  There may be variations on this issue that I have 
> not seen.
> To reproduce this problem, create a dfs workspace like this:
> {noformat}
> "drillTestDirP1": {
>   "location": "/drill/testdata/p1tests",
>   "writable": true,
>   "defaultInputFormat": "parquet",
>   "allowAccessOutsideWorkspace": false
> },
> {noformat}
> Use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute this 
> command:
> {noformat}
> create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * 
> from `dfs.drillTestDirP1`.student;
> {noformat}
> Then use Drill commit ddb35ce71837376c7caef28c25327ba556bb32f2 and execute 
> this query:
> {noformat}
> select * from student_test_v;
> {noformat}
> Drill will return an exception:
> {noformat}
> Error: VALIDATION ERROR: Failure while attempting to expand view. Requested 
> schema drillTestDirP1 not available in schema dfs.
> View Context dfs, drillTestDirP1
> View SQL SELECT *
> FROM `dfs.drillTestDirP1`.`student`
> [Error Id: 3f4594ee-b503-40db-8845-474b0ecb5feb on qa-node211.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> I have attached the student parquet file I used.
> This is what the .view.drill file looks like:
> {noformat}
> {
>   "name" : "student_test_v",
>   "sql" : "SELECT *\nFROM `dfs.drillTestDirP1`.`student`",
>   "fields" : [ {
> "name" : "**",
> "type" : "DYNAMIC_STAR",
> "isNullable" : true
>   } ],
>   "workspaceSchemaPath" : [ "dfs", "drillTestDirP1" ]
> }
> {noformat}
> This means that users may not be able to access views that they have created 
> using previous versions of Drill.  We should maintain backwards 
> compatibility where possible.
> As a workaround, these views can be re-created.  It would be helpful to users 
> if the error message explained that these views need to be re-created.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-30 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6997:
---
Labels:   (was: ready-to-commit)

> Semijoin is changing the join ordering for some tpcds queries.
> --
>
> Key: DRILL-6997
> URL: https://issues.apache.org/jira/browse/DRILL-6997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, 
> 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill
>
>
> TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
> disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
> issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
> DRILL-6798: Planner changes to support semi-join.
> {code:java}
> with ws_wh as
>  (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
>  from web_sales ws1,web_sales ws2
>  where ws1.ws_order_number = ws2.ws_order_number
>  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>  [_LIMITA] select [_LIMITB]
>  count(distinct ws_order_number) as "order count"
>  ,sum(ws_ext_ship_cost) as "total shipping cost"
>  ,sum(ws_net_profit) as "total net profit"
>  from
>  web_sales ws1
>  ,date_dim
>  ,customer_address
>  ,web_site
>  where
>  d_date between '[YEAR]-[MONTH]-01' and
>  (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
>  and ws1.ws_ship_date_sk = d_date_sk
>  and ws1.ws_ship_addr_sk = ca_address_sk
>  and ca_state = '[STATE]'
>  and ws1.ws_web_site_sk = web_site_sk
>  and web_company_name = 'pri'
>  and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  and ws1.ws_order_number in (select wr_order_number
>  from web_returns,ws_wh
>  where wr_order_number = ws_wh.ws_order_number)
>  order by count(distinct ws_order_number)
>  [_LIMITC];
> {code}
>  I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
> semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join 
> disabled. Both are executed with commit id 
> 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100.
> The plan with semi-join enabled has moved the first hash join:
> and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  It used to be on the build side of the first HJ on the left hand side 
> (04-05). It is now on the build side of the fourth HJ on the left hand side 
> (01-13).
> The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
> that takes 10 seconds to execute. But all the fragments take about the same 
> amount of time.
> The plan with semi-join enabled has two HJ that process 1B rows while the 
> plan with semi-joins disabled has one HJ that processes 1B rows.
> The plan with semi-join enabled has several senders and receivers that wait 
> more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
> 17-00). When disabled, there is no operator waiting more than 10 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2019-01-30 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-6709.
-

> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2019-01-30 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756486#comment-16756486
 ] 

Robert Hou commented on DRILL-6709:
---

I have verified this.

> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7020) big varchar doesn't work with extractHeader=true

2019-01-30 Thread benj (JIRA)
benj created DRILL-7020:
---

 Summary: big varchar doesn't work with extractHeader=true
 Key: DRILL-7020
 URL: https://issues.apache.org/jira/browse/DRILL-7020
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Text & CSV
Affects Versions: 1.15.0
Reporter: benj


with a TEST file of csv type like
{code:java}
col1,col2
w,x
...y...,z
{code}
where ...y... is a string of more than 65536 characters (let's say 66000, for example)

A SELECT with +*extractHeader=false*+ is OK
{code:java}
SELECT * FROM TABLE(tmp.`TEST`(type => 'text', fieldDelimiter => ',', 
extractHeader => false));
+---------+------+
|  col1   | col2 |
+---------+------+
| w       | x    |
| ...y... | z    |
+---------+------+
{code}
But SELECT with +*extractHeader=true*+ gives an error
{code:java}
SELECT * FROM TABLE(tmp.`TEST`(type => 'text', fieldDelimiter => ',', 
extractHeader => true));
Error: UNSUPPORTED_OPERATION ERROR: Trying to write something big in a column
columnIndex 1
Limit 65536
Fragment 0:0
{code}

Note that it is possible to use extractHeader=false with skipFirstLine=true, but 
in this case it's not possible to automatically get column names.
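
A minimal bash sketch to generate such a TEST file (an assumption about the test setup, not taken from this report):

{code:bash}
# Two-column CSV whose second data row has a 66000-character first field.
{
  echo 'col1,col2'
  echo 'w,x'
  printf 'y%.0s' {1..66000}   # 66000 'y' characters, no trailing newline
  echo ',z'
} > TEST
{code}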



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7019) Add check for redundant imports

2019-01-30 Thread Volodymyr Vysotskyi (JIRA)
Volodymyr Vysotskyi created DRILL-7019:
--

 Summary: Add check for redundant imports
 Key: DRILL-7019
 URL: https://issues.apache.org/jira/browse/DRILL-7019
 Project: Apache Drill
  Issue Type: Task
  Components: Tools, Build & Test
Affects Versions: 1.15.0
Reporter: Volodymyr Vysotskyi
Assignee: Volodymyr Vysotskyi
 Fix For: 1.16.0


Currently, only the {{UnusedImports}} check is used, which does not prevent 
duplicate imports or imports from the same package.

The goal of this Jira is to add the {{RedundantImport}} check and fix the 
resulting checkstyle errors.
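
A sketch of the proposed configuration change, assuming the standard Checkstyle module names (the surrounding structure is illustrative):

{code:xml}
<module name="Checker">
  <module name="TreeWalker">
    <!-- existing check: flags imports that are never used -->
    <module name="UnusedImports"/>
    <!-- proposed check: flags duplicate imports, imports from java.lang,
         and imports from the current package -->
    <module name="RedundantImport"/>
  </module>
</module>
{code}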



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7018) Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 37

2019-01-30 Thread salim achouche (JIRA)
salim achouche created DRILL-7018:
-

 Summary: Drill Query (when 
store.parquet.reader.int96_as_timestamp=true) on Parquet File fails with Error: 
SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 372 
(expected: 0 <= readerIndex <= writerIndex <= capacity(256))
 Key: DRILL-7018
 URL: https://issues.apache.org/jira/browse/DRILL-7018
 Project: Apache Drill
  Issue Type: Improvement
Reporter: salim achouche






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-7018) Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 3

2019-01-30 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche reassigned DRILL-7018:
-

  Assignee: salim achouche
 Affects Version/s: 1.14.0
Remaining Estimate: 24h
 Original Estimate: 24h
 Fix Version/s: 1.16.0
   Description: 
alter system set `store.parquet.reader.int96_as_timestamp`= true

run a query which projects a column of type Parquet INT96 timestamp with 31 nulls

The following exception will be thrown:

java.lang.IndexOutOfBoundsException: readerIndex: 0, writerIndex: 372 
(expected: 0 <= readerIndex <= writerIndex <= capacity(256))

 
   Component/s: Storage - Parquet

> Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet 
> File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 
> 0, writerIndex: 372 (expected: 0 <= readerIndex <= writerIndex <= 
> capacity(256))
> 
>
> Key: DRILL-7018
> URL: https://issues.apache.org/jira/browse/DRILL-7018
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.14.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> alter system set `store.parquet.reader.int96_as_timestamp`= true
> run a query which projects a column of type Parquet INT96 timestamp with 31 
> nulls
> The following exception will be thrown:
> java.lang.IndexOutOfBoundsException: readerIndex: 0, writerIndex: 372 
> (expected: 0 <= readerIndex <= writerIndex <= capacity(256))
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-7017) lz4 codec for (un)compression

2019-01-30 Thread Volodymyr Vysotskyi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756328#comment-16756328
 ] 

Volodymyr Vysotskyi edited comment on DRILL-7017 at 1/30/19 5:02 PM:
-

Please see example of querying compressed files here: 
[https://drill.apache.org/docs/querying-plain-text-files/#querying-compressed-files].

{{hadoop-common}} already contains lz4 compressors/decompressors, so it is 
possible that this compression is already supported. I'm not sure about the full 
list of codecs, but I think it may be extended by specifying additional codecs 
in {{io.compression.codecs}} in the Hadoop conf file (perhaps in {{core-site.xml}}).
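
For example, a sketch of the kind of core-site.xml entry described, assuming Hadoop's built-in codec classes (untested for Drill specifically):

{code:xml}
<!-- core-site.xml: register compression codecs, including lz4 -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
</property>
{code}

Hadoop's Lz4Codec maps to the {{.lz4}} extension, so {{myfile.csv.lz4}} would be the natural test case.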


was (Author: vvysotskyi):
Please see examples of usage of querying compressed files here: 
[https://drill.apache.org/docs/querying-plain-text-files/#querying-compressed-files].


 {{hadoop-common}} already contains lz4 compressors/decompressors. So it is 
possible that this compression is already supported. Not sure about the full 
list of codecs, but I think it may be extended by specifying additional codecs 
in {{io.compression.codecs}} in hadoop conf file (perhaps in {{core-site.xml}}).

> lz4 codec for (un)compression
> -
>
> Key: DRILL-7017
> URL: https://issues.apache.org/jira/browse/DRILL-7017
> Project: Apache Drill
>  Issue Type: Wish
>  Components: Storage - Text & CSV
>Affects Versions: 1.15.0
>Reporter: benj
>Priority: Major
>
> I didn't find in the documentation which compression formats are supported. 
> But as it's possible to use Drill on compressed files, like
> {code:java}
> SELECT * FROM tmp.`myfile.csv.gz`;
> {code}
> It would be useful to have the possibility to use this functionality for lz4 
> files ([https://github.com/lz4/lz4])
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7017) lz4 codec for (un)compression

2019-01-30 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7017:
---
Description: 
I didn't find in the documentation which compression formats are supported. But 
as it's possible to use Drill on compressed files, like
{code:java}
SELECT * FROM tmp.`myfile.csv.gz`;
{code}
It would be useful to have the possibility to use this functionality for lz4 
files ([https://github.com/lz4/lz4])

 

  was:
I didn't find in the documentation which compression formats are supported. But 
as it's possible to use Drill on compressed files, like
{code:java}
SELECT * FROM tmp.`myfile.csv.gz`;
{code}
It would be useful to have the possibility to use this functionality for lz4 
files ([https://github.com/lz4/lz4)]

 


> lz4 codec for (un)compression
> -
>
> Key: DRILL-7017
> URL: https://issues.apache.org/jira/browse/DRILL-7017
> Project: Apache Drill
>  Issue Type: Wish
>  Components: Storage - Text & CSV
>Affects Versions: 1.15.0
>Reporter: benj
>Priority: Major
>
> I didn't find in the documentation which compression formats are supported. 
> But as it's possible to use Drill on compressed files, like
> {code:java}
> SELECT * FROM tmp.`myfile.csv.gz`;
> {code}
> It would be useful to have the possibility to use this functionality for lz4 
> files ([https://github.com/lz4/lz4])
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7017) lz4 codec for (un)compression

2019-01-30 Thread Volodymyr Vysotskyi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756328#comment-16756328
 ] 

Volodymyr Vysotskyi commented on DRILL-7017:


Please see examples of usage of querying compressed files here: 
[https://drill.apache.org/docs/querying-plain-text-files/#querying-compressed-files].


 {{hadoop-common}} already contains lz4 compressors/decompressors. So it is 
possible that this compression is already supported. Not sure about the full 
list of codecs, but I think it may be extended by specifying additional codecs 
in {{io.compression.codecs}} in hadoop conf file (perhaps in {{core-site.xml}}).

> lz4 codec for (un)compression
> -
>
> Key: DRILL-7017
> URL: https://issues.apache.org/jira/browse/DRILL-7017
> Project: Apache Drill
>  Issue Type: Wish
>  Components: Storage - Text & CSV
>Affects Versions: 1.15.0
>Reporter: benj
>Priority: Major
>
> I didn't find in the documentation which compression formats are supported. 
> But as it's possible to use Drill on compressed files, like
> {code:java}
> SELECT * FROM tmp.`myfile.csv.gz`;
> {code}
> It would be useful to have the possibility to use this functionality for lz4 
> files ([https://github.com/lz4/lz4)]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7017) lz4 codec for (un)compression

2019-01-30 Thread benj (JIRA)
benj created DRILL-7017:
---

 Summary: lz4 codec for (un)compression
 Key: DRILL-7017
 URL: https://issues.apache.org/jira/browse/DRILL-7017
 Project: Apache Drill
  Issue Type: Wish
  Components: Storage - Text & CSV
Affects Versions: 1.15.0
Reporter: benj


I didn't find in the documentation which compression formats are supported. But 
as it's possible to use Drill on compressed files, like
{code:java}
SELECT * FROM tmp.`myfile.csv.gz`;
{code}
It would be useful to have the possibility to use this functionality for lz4 
files ([https://github.com/lz4/lz4)]

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-30 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6997:

Reviewer: Aman Sinha

> Semijoin is changing the join ordering for some tpcds queries.
> --
>
> Key: DRILL-6997
> URL: https://issues.apache.org/jira/browse/DRILL-6997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
> Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, 
> 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill
>
>
> TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
> disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
> issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
> DRILL-6798: Planner changes to support semi-join.
> {code:java}
> with ws_wh as
>  (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
>  from web_sales ws1,web_sales ws2
>  where ws1.ws_order_number = ws2.ws_order_number
>  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>  [_LIMITA] select [_LIMITB]
>  count(distinct ws_order_number) as "order count"
>  ,sum(ws_ext_ship_cost) as "total shipping cost"
>  ,sum(ws_net_profit) as "total net profit"
>  from
>  web_sales ws1
>  ,date_dim
>  ,customer_address
>  ,web_site
>  where
>  d_date between '[YEAR]-[MONTH]-01' and
>  (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
>  and ws1.ws_ship_date_sk = d_date_sk
>  and ws1.ws_ship_addr_sk = ca_address_sk
>  and ca_state = '[STATE]'
>  and ws1.ws_web_site_sk = web_site_sk
>  and web_company_name = 'pri'
>  and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  and ws1.ws_order_number in (select wr_order_number
>  from web_returns,ws_wh
>  where wr_order_number = ws_wh.ws_order_number)
>  order by count(distinct ws_order_number)
>  [_LIMITC];
> {code}
>  I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
> semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join 
> disabled. Both are executed with commit id 
> 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100.
> The plan with semi-join enabled has moved the first hash join:
> and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  It used to be on the build side of the first HJ on the left hand side 
> (04-05). It is now on the build side of the fourth HJ on the left hand side 
> (01-13).
> The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
> that takes 10 seconds to execute. But all the fragments take about the same 
> amount of time.
> The plan with semi-join enabled has two HJ that process 1B rows while the 
> plan with semi-joins disabled has one HJ that processes 1B rows.
> The plan with semi-join enabled has several senders and receivers that wait 
> more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
> 17-00). When disabled, there is no operator waiting more than 10 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6911) Documentation issue - Hadoop core-site.xml is not supported by Drill to read S3 credentials

2019-01-30 Thread Denys Ordynskiy (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755897#comment-16755897
 ] 

Denys Ordynskiy commented on DRILL-6911:


Thanks [~bbevens] for the documentation update.
I think this "*Note: ...*" should also be deleted, because Drill doesn't see the 
Hadoop core-site.xml file.

> Documentation issue - Hadoop core-site.xml is not supported by Drill to read 
> S3 credentials
> ---
>
> Key: DRILL-6911
> URL: https://issues.apache.org/jira/browse/DRILL-6911
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Denys Ordynskiy
>Assignee: Bridget Bevens
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.16.0
>
>
> In the Drill S3 documentation, https://drill.apache.org/docs/s3-storage-plugin/, 
> the section "Providing AWS Credentials" describes 3 ways to set up AWS S3 
> credentials in Drill:
> - storage plugin;
> - Drill-specific core-site.xml;
> - existing S3 configuration for Hadoop.
> The third item is not supported by Drill. The Hadoop core-site.xml config file 
> may contain S3 credentials, but Drill doesn't read any S3 parameters directly 
> from the Hadoop config file.
> The third item 
> {code:java}
> In a Hadoop environment, you can use the existing S3 configuration for 
> Hadoop. The AWS credentials should already be defined. All you need to do is 
> configure the S3 storage plugin.
> {code}
> should be removed from the document 
> https://drill.apache.org/docs/s3-storage-plugin/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)