[jira] [Commented] (DRILL-3975) Partition Planning rule causes query failure due to IndexOutOfBoundsException on HDFS

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974627#comment-14974627
 ] 

ASF GitHub Bot commented on DRILL-3975:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/217


> Partition Planning rule causes query failure due to IndexOutOfBoundsException 
> on HDFS
> -
>
> Key: DRILL-3975
> URL: https://issues.apache.org/jira/browse/DRILL-3975
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jacques Nadeau
>Assignee: Steven Phillips
>
> While running the extended test suite provided by MapR, a large number of 
> queries fail in the PruneScanRule, specifically at line 31 of the 
> DFSPartitionLocation constructor. The failures are likely specific to 
> running on HDFS, a code path that has apparently not been tested.
> An example test query where this type of failure occurred: 
> /src/drill-test-framework/resources/Functional/ctas/ctas_auto_partition/tpch0.01_multiple_partitions/data/q11.q
> Example stack trace below:
> {code}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> StringIndexOutOfBoundsException: String index out of range: -12
> [Error Id: f2941267-49b1-4f67-a17f-610ffb13fcb7 on 
> ip-172-31-30-32.us-west-2.compute.internal:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:742)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:841)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:786)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) 
> [drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:788)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:894) 
> [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255) 
> [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_85]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_85]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85]
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
> exception during fragment initialization: Internal error: Error while 
> applying rule PruneScanRule:Filter_On_Scan_Parquet, args 
> [rel#43148:DrillFilterRel.LOGICAL.ANY([]).[](input=rel#43147:Subset#4.LOGICAL.ANY([]).[],condition==($0,
>  1)), rel#43241:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, 
> ctasAutoPartition, 
> tpch_multiple_partitions/lineitem_twopart_ordered2],groupscan=ParquetGroupScan
>  [entries=[ReadEntryWithPath 
> [path=hdfs://ip-172-31-30-32:54310/drill/testdata/ctas_auto_partition/tpch_multiple_partitions/lineitem_twopart_ordered2]],
>  
> selectionRoot=hdfs://ip-172-31-30-32:54310/drill/testdata/ctas_auto_partition/tpch_multiple_partitions/lineitem_twopart_ordered2,
>  numFiles=1, usedMetadataFile=false, columns=[`l_modline`, `l_moddate`]])]
> ... 4 common frames omitted
> Caused by: java.lang.AssertionError: Internal error: Error while applying 
> rule PruneScanRule:Filter_On_Scan_Parquet, args 
> [rel#43148:DrillFilterRel.LOGICAL.ANY([]).[](input=rel#43147:Subset#4.LOGICAL.ANY([]).[],condition==($0,
>  1)), rel#43241:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, 
> ctasAutoPartition, 
> tpch_multiple_partitions/lineitem_twopart_ordered2],groupscan=ParquetGroupScan
>  [entries=[ReadEntryWithPath 
> [path=hdfs://ip-172-31-30-32:54310/drill/testdata/ctas_auto_partition/tpch_multiple_partitions/lineitem_twopart_ordered2]],
>  
> selectionRoot=hdfs://ip-172-31-30-32:54310/drill/testdata/ctas_auto_partition/tpch_multiple_partitions/lineitem_twopart_ordered2,
>  numFiles=1, usedMetadataFile=false, columns=[`l_modline`, `l_moddate`]])]
> at org.apache.calcite.util.Util.newInternal(Util.java:792) 
> ~[calcite-core-1.4.0-drill-r6.jar:1.4.0-drill-r6]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251)
>  

[jira] [Commented] (DRILL-3975) Partition Planning rule causes query failure due to IndexOutOfBoundsException on HDFS

2015-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973736#comment-14973736
 ] 

ASF GitHub Bot commented on DRILL-3975:
---

Github user StevenMPhillips commented on the pull request:

https://github.com/apache/drill/pull/217#issuecomment-151026505
  
+1



[jira] [Commented] (DRILL-3975) Partition Planning rule causes query failure due to IndexOutOfBoundsException on HDFS

2015-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973317#comment-14973317
 ] 

ASF GitHub Bot commented on DRILL-3975:
---

GitHub user jacques-n opened a pull request:

https://github.com/apache/drill/pull/217

DRILL-3975: Make sure to strip scheme and authority from partition location.

I didn't add tests because I've been unable to reproduce this locally; it 
seems to require a full HDFS cluster. The extended test suite identifies this 
problem when run against HDFS, so that should provide a sufficient test (~150 
tests fail without this fix). @StevenMPhillips, can you take a look?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jacques-n/drill DRILL-3975

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/217.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #217


commit f9724731ea2a01e927318dd903f36358f413
Author: Jacques Nadeau 
Date:   2015-10-25T15:59:39Z

DRILL-3975: Make sure to strip scheme and authority from partition location.





[jira] [Commented] (DRILL-3975) Partition Planning rule causes query failure due to IndexOutOfBoundsException on HDFS

2015-10-24 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972984#comment-14972984
 ] 

Steven Phillips commented on DRILL-3975:


My approach has been to remove the scheme and authority from paths any time 
I encounter code that uses a path as a key or does any sort of string 
comparison. This is an area I think we need to clean up; we are not very 
consistent throughout the code base in how we handle paths.

The usual trick I use to strip away the scheme and authority is the method 
Path.getPathWithoutSchemeAndAuthority(Path p). If I have String objects rather 
than Path objects, I convert the String to a Path, use the utility method to 
remove the scheme and authority, and then call toString().
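
The normalization described above can be sketched as follows. This is only an
illustration, not the change made in the pull request: to keep it runnable
without Hadoop on the classpath it uses java.net.URI as a stand-in for
Path.getPathWithoutSchemeAndAuthority, and the class and method names
(PathNormalizer, stripSchemeAndAuthority) are hypothetical.

```java
import java.net.URI;

// Hypothetical helper; a stdlib stand-in for Hadoop's
// Path.getPathWithoutSchemeAndAuthority(Path), used here for illustration only.
final class PathNormalizer {

    // Returns only the path component, dropping the scheme ("hdfs") and the
    // authority ("host:port"). A location that is already unqualified is
    // returned unchanged. Assumes the input is a syntactically valid URI
    // (for example, no unescaped spaces).
    static String stripSchemeAndAuthority(String location) {
        return URI.create(location).getPath();
    }

    public static void main(String[] args) {
        String qualified = "hdfs://ip-172-31-30-32:54310/drill/testdata/t1";
        String unqualified = "/drill/testdata/t1";

        // Both forms normalize to the same string, so they are safe to use
        // as map keys or in string comparisons.
        System.out.println(stripSchemeAndAuthority(qualified));   // prints /drill/testdata/t1
        System.out.println(stripSchemeAndAuthority(unqualified)); // prints /drill/testdata/t1
    }
}
```

With Hadoop available, the equivalent for a String would be to wrap it in a
Path, call Path.getPathWithoutSchemeAndAuthority, and then call toString(), as
described in the comment above.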


[jira] [Commented] (DRILL-3975) Partition Planning rule causes query failure due to IndexOutOfBoundsException on HDFS

2015-10-24 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972998#comment-14972998
 ] 

Mehant Baid commented on DRILL-3975:


This particular bug occurred when ParquetPruneScanRule hit an IOOB in the 
logic you pointed out, in a case where there was no need to perform any 
splitting (for the auto partitioning scheme we get the partitioning column 
value from the file and not from the location). 

However, while debugging this I found that "selectionRoot" contained the 
scheme and "file" did not, which is potentially what causes the IOOB you are 
seeing. Stripping out the scheme makes sense; we cannot simply check for -1 
[Here|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/DFSPartitionLocation.java#L30]
 as that would cause the partitioning columns to be incorrectly empty. 
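
A minimal sketch of the mismatch described above, under stated assumptions:
the code below is not Drill's DFSPartitionLocation, only a hypothetical
reconstruction of the pattern (indexOf followed by substring) that would fail
when "selectionRoot" carries a scheme and "file" does not. The class name
PartitionDirs, both method names, and the sample paths are made up for
illustration.

```java
import java.net.URI;

// Hypothetical illustration of the failure mode; not the actual Drill code.
final class PartitionDirs {

    // Naive variant: assumes `file` contains `selectionRoot`. When it does
    // not, indexOf returns -1 and `start` can land past the end of `file`,
    // so substring throws StringIndexOutOfBoundsException.
    static String naiveRelative(String selectionRoot, String file) {
        int start = file.indexOf(selectionRoot) + selectionRoot.length();
        return file.substring(start);
    }

    // Normalized variant: drop scheme and authority from both inputs first
    // (java.net.URI is used here as a stdlib stand-in), then reuse the same
    // logic on two comparable strings.
    static String normalizedRelative(String selectionRoot, String file) {
        String root = URI.create(selectionRoot).getPath();
        String path = URI.create(file).getPath();
        return naiveRelative(root, path);
    }

    public static void main(String[] args) {
        String selectionRoot = "hdfs://host:54310/data/t"; // fully qualified
        String file = "/data/t/0_0_1.parquet";             // no scheme or authority

        try {
            naiveRelative(selectionRoot, file);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("naive variant failed: " + e.getMessage());
        }

        System.out.println(normalizedRelative(selectionRoot, file)); // prints /0_0_1.parquet
    }
}
```

Guarding the naive variant with a check for -1 would avoid the exception but
would also skip extracting the relative part entirely, which matches the
concern above that the partitioning columns would end up incorrectly empty.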


[jira] [Commented] (DRILL-3975) Partition Planning rule causes query failure due to IndexOutOfBoundsException on HDFS

2015-10-24 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972981#comment-14972981
 ] 

Jacques Nadeau commented on DRILL-3975:
---

[~mehant] & [~sphillips], I'm guessing this has to do with whether paths are 
fully qualified or not. I believe you both squelched some bugs of this pattern 
earlier. What is the right thing to do here? I think the location is:

https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/DFSPartitionLocation.java#L31
