[jira] [Updated] (DRILL-3546) S3 - jets3t - No such File Or Directory
[ https://issues.apache.org/jira/browse/DRILL-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Deegan updated DRILL-3546:
Description: Tested on 1.1 with commit id:
{noformat}
0: jdbc:drill:zk=local> select commit_id from sys.version;
+-------------------------------------------+
|                 commit_id                 |
+-------------------------------------------+
| e3fc7e97bfe712dc09d43a8a055a5135c96b7344  |
+-------------------------------------------+
{noformat}
Three-instance ZooKeeper cluster running Drill with the jets3t plugin. Occasionally throws a "No such file or directory" error.
Query example: SELECT COUNT(*) FROM s3.json_directory;
Might be a jets3t issue; existing report here: https://bitbucket.org/jmurty/jets3t/issues/215/drill-intermittent-file-not-found-error
{code}
org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Failure reading JSON file - No such file or directory 's3n://json_directory/xyz.json.gz'
File  /json_directory/xyz.json.gz
Record  1
[Error Id: 3f83967b-0b7b-4778-b623-b7a20528e3d1 ]
  at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:523) ~[drill-common-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.store.easy.json.JSONRecordReader.handleAndRaise(JSONRecordReader.java:161) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.store.easy.json.JSONRecordReader.setup(JSONRecordReader.java:130) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:100) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin.getReaderBatch(EasyFormatPlugin.java:195) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:35) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:28) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:150) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:106) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:81) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:235) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.1.0.jar:1.1.0]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
  at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
Caused by:
{code}
[jira] [Commented] (DRILL-3547) IndexOutOfBoundsException on directory with ~20 subdirectories
[ https://issues.apache.org/jira/browse/DRILL-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640166#comment-14640166 ] Philip Deegan commented on DRILL-3547:
The issue is due to an empty file amongst non-empty files.

IndexOutOfBoundsException on directory with ~20 subdirectories
--
Key: DRILL-3547
URL: https://issues.apache.org/jira/browse/DRILL-3547
Project: Apache Drill
Issue Type: Bug
Components: Functions - Drill
Affects Versions: 1.1.0
Environment: RHEL 7
Reporter: Philip Deegan
Assignee: Daniel Barclay (Drill)

Tested on 1.1 with commit id:
{noformat}
0: jdbc:drill:zk=local> select commit_id from sys.version;
+-------------------------------------------+
|                 commit_id                 |
+-------------------------------------------+
| e3fc7e97bfe712dc09d43a8a055a5135c96b7344  |
+-------------------------------------------+
{noformat}
Directory has child directories a to u, each containing JSON files. Running the query on each subdirectory individually does not cause an error.
{noformat}
java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: IndexOutOfBoundsException: index: 0, length: 1 (expected: range(0, 0))
Fragment 1:2
[Error Id: 69a0879f-f718-4930-ae6f-c526de05528c on ip-172-31-29-60.eu-central-1.compute.internal:31010]
  at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
  at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
  at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
  at sqlline.SqlLine.print(SqlLine.java:1583)
  at sqlline.Commands.execute(Commands.java:852)
  at sqlline.Commands.sql(Commands.java:751)
  at sqlline.SqlLine.dispatch(SqlLine.java:738)
  at sqlline.SqlLine.begin(SqlLine.java:612)
  at sqlline.SqlLine.start(SqlLine.java:366)
  at sqlline.SqlLine.main(SqlLine.java:259)
{noformat}
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
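The failure mode described in this comment (an empty file among non-empty files tripping an index-0 access) can be sketched outside Drill. The snippet below is an illustration only, not Drill's reader code; the file names and the line-delimited JSON layout are invented for the demo.

```python
import json
import os
import tempfile


def read_first_records(paths):
    """Naive scan that assumes every file yields at least one record.
    An empty file makes records[0] raise IndexError, analogous to the
    IndexOutOfBoundsException reported above."""
    out = []
    for p in paths:
        with open(p) as f:
            records = [json.loads(line) for line in f if line.strip()]
        out.append(records[0])  # IndexError when the file is empty
    return out


def read_first_records_safe(paths):
    """Same scan, but empty files are skipped instead of indexed."""
    out = []
    for p in paths:
        with open(p) as f:
            records = [json.loads(line) for line in f if line.strip()]
        if records:  # guard: tolerate empty files
            out.append(records[0])
    return out
```

The guard in the second function is the general shape of a fix: a scan over a directory must treat "file exists but has zero records" as a valid, empty contribution rather than assuming at least one record per file.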
[jira] [Commented] (DRILL-3537) Empty Json file can potentially result into wrong results
[ https://issues.apache.org/jira/browse/DRILL-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640626#comment-14640626 ] Sean Hsuan-Yi Chu commented on DRILL-3537:
https://reviews.apache.org/r/36782/

Empty Json file can potentially result into wrong results
--
Key: DRILL-3537
URL: https://issues.apache.org/jira/browse/DRILL-3537
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators, Storage - JSON
Reporter: Sean Hsuan-Yi Chu
Assignee: Sean Hsuan-Yi Chu
Priority: Critical
Fix For: 1.2.0

In the directory, we have two files. One has some data and the other one is empty. A query such as:
{code}
select * from dfs.`directory`;
{code}
will produce different results according to the order in which the files are read (the default order is the alphabetical order of the filenames). To give a more concrete example, the non-empty JSON file has data:
{code}
{ a:1 }
{code}
By naming the files, you can control the order. If the empty file is read first, the result is:
{code}
+-------+----+
|   *   | a  |
+-------+----+
| null  | 1  |
+-------+----+
{code}
If the opposite order takes place, the result is:
{code}
+----+
| a  |
+----+
| 1  |
| 2  |
+----+
{code}
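One way to see how read order can change the result is to model a reader that locks its output schema from the first file it opens. This is an illustrative sketch of the failure mode, not Drill's JSON reader; line-delimited JSON strings stand in for the two files.

```python
import json


def scan(files):
    """Illustrative order-dependent reader (not Drill's implementation):
    the output schema is locked in from the first file, so an empty
    first file degrades every later record to an empty projection."""
    schema, rows = None, []
    for text in files:
        records = [json.loads(l) for l in text.splitlines() if l.strip()]
        if schema is None:
            # Schema fixed from the first file; empty file => empty schema.
            schema = sorted({k for r in records for k in r})
        for r in records:
            rows.append({k: r.get(k) for k in schema})
    return schema, rows


empty_file = ""                       # the empty file
data_file = '{"a": 1}\n{"a": 2}\n'    # the non-empty file
```

Reading `data_file` first yields schema `["a"]` and both records; reading `empty_file` first yields an empty schema and degenerate rows, mirroring the order-dependent results in the report.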
[jira] [Updated] (DRILL-3313) Eliminate redundant #load methods and unit-test loading exporting of vectors
[ https://issues.apache.org/jira/browse/DRILL-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanifi Gunes updated DRILL-3313:
Assignee: Jason Altekruse (was: Steven Phillips)

Eliminate redundant #load methods and unit-test loading exporting of vectors
--
Key: DRILL-3313
URL: https://issues.apache.org/jira/browse/DRILL-3313
Project: Apache Drill
Issue Type: Sub-task
Components: Execution - Data Types
Affects Versions: 1.0.0
Reporter: Hanifi Gunes
Assignee: Jason Altekruse
Fix For: 1.2.0

Vectors have multiple #load methods that are used to populate data from raw buffers. It is relatively tough to reason about, maintain, and unit-test loading and exporting of data since there is much redundant code around the load methods. This issue proposes having a single #load method conforming to the VV#load(def, buffer) signature, eliminating all other #load overrides.
[jira] [Updated] (DRILL-1750) Querying directories with JSON files returns incomplete results
[ https://issues.apache.org/jira/browse/DRILL-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanifi Gunes updated DRILL-1750:
Assignee: Steven Phillips (was: Hanifi Gunes)

Querying directories with JSON files returns incomplete results
--
Key: DRILL-1750
URL: https://issues.apache.org/jira/browse/DRILL-1750
Project: Apache Drill
Issue Type: Bug
Components: Storage - JSON
Reporter: Abhishek Girish
Assignee: Steven Phillips
Priority: Critical
Fix For: 1.2.0
Attachments: 1.json, 2.json, 3.json, 4.json, DRILL-1750_2015-07-06_16:39:04.patch

I happened to observe that querying (select *) a directory with JSON files displays only fields common to all JSON files. All corresponding fields are displayed when querying each of the JSON files individually. And in some scenarios, querying the directory crashes sqlline. The example below may help make the issue clear:

select * from dfs.`/data/json/tmp/1.json`;
+----------------+---------------------+------------------------------------------------+
|     artist     |      track_id       |                     title                      |
+----------------+---------------------+------------------------------------------------+
| Jonathan King  | TRAAAEA128F935A30D  | I'll Slap Your Face (Entertainment USA Theme)  |
+----------------+---------------------+------------------------------------------------+
1 row selected (1.305 seconds)

select * from dfs.`/data/json/tmp/2.json`;
+---------------+-----------------------------+---------------------+--------------+
|    artist     |          timestamp          |      track_id       |    title     |
+---------------+-----------------------------+---------------------+--------------+
| Supersuckers  | 2011-08-01 20:30:17.991134  | TRAAAQN128F9353BA0  | Double Wide  |
+---------------+-----------------------------+---------------------+--------------+
1 row selected (0.105 seconds)

select * from dfs.`/data/json/tmp/3.json`;
+-----------------------------+---------------------+--------------+
|          timestamp          |      track_id       |    title     |
+-----------------------------+---------------------+--------------+
| 2011-08-01 20:30:17.991134  | TRAAAQN128F9353BA0  | Double Wide  |
+-----------------------------+---------------------+--------------+
1 row selected (0.083 seconds)

select * from dfs.`/data/json/tmp/4.json`;
+---------------------+--------------+
|      track_id       |    title     |
+---------------------+--------------+
| TRAAAQN128F9353BA0  | Double Wide  |
+---------------------+--------------+
1 row selected (0.076 seconds)

select * from dfs.`/data/json/tmp`;
+---------------------+------------------------------------------------+
|      track_id       |                     title                      |
+---------------------+------------------------------------------------+
| TRAAAQN128F9353BA0  | Double Wide                                    |
| TRAAAQN128F9353BA0  | Double Wide                                    |
| TRAAAEA128F935A30D  | I'll Slap Your Face (Entertainment USA Theme)  |
| TRAAAQN128F9353BA0  | Double Wide                                    |
+---------------------+------------------------------------------------+
4 rows selected (0.121 seconds)

JVM crash occurs at times:

select * from dfs.`/data/json/tmp`;
+-----------------------------+---------------------+--------------+
|          timestamp          |      track_id       |    title     |
+-----------------------------+---------------------+--------------+
| 2011-08-01 20:30:17.991134  | TRAAAQN128F9353BA0  | Double Wide  |
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f3cb99be053, pid=13943, tid=139898808436480
#
# JRE version: OpenJDK Runtime Environment (7.0_65-b17) (build 1.7.0_65-mockbuild_2014_07_16_06_06-b00)
# Java VM: OpenJDK 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x932053]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/jvm-13943/hs_error.log
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
#   http://icedtea.classpath.org/bugzilla
#
Aborted
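The symptom above (only fields common to all files survive a directory scan) is exactly the difference between taking the intersection and the union of per-file schemas. A hedged Python sketch of the two behaviors, not Drill's merge logic; the field names follow the tables above but the sample records are abbreviated:

```python
import json


def keys_union(files):
    """Expected behavior: the combined schema is the union of every
    file's fields (files are line-delimited JSON strings)."""
    return sorted({k
                   for text in files
                   for line in text.splitlines() if line.strip()
                   for k in json.loads(line)})


def keys_intersection(files):
    """Observed behavior in the report: only fields present in every
    file survive the directory scan."""
    per_file = []
    for text in files:
        s = set()
        for line in text.splitlines():
            if line.strip():
                s |= set(json.loads(line))
        per_file.append(s)
    return sorted(set.intersection(*per_file))
```

With files shaped like 1.json, 2.json, and 4.json from the report, the intersection is only `track_id` and `title` (what the directory query showed), while the union also carries `artist` and `timestamp` (what one would expect).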
[jira] [Updated] (DRILL-3353) Non data-type related schema changes errors
[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanifi Gunes updated DRILL-3353:
Assignee: Steven Phillips (was: Hanifi Gunes)

Non data-type related schema changes errors
--
Key: DRILL-3353
URL: https://issues.apache.org/jira/browse/DRILL-3353
Project: Apache Drill
Issue Type: Bug
Components: Storage - JSON
Affects Versions: 1.0.0
Reporter: Oscar Bernal
Assignee: Steven Phillips
Fix For: 1.2.0
Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip

I'm having trouble querying a data set with varying schema for nested object fields. The majority of my data for a specific type of record has the following nested data:
{code}
attributes:{daysSinceInstall:0,destination:none,logged:no,nth:1,type:organic,wearable:no}}
{code}
Among those records (hundreds of them) I have only two with a slightly different schema:
{code}
attributes:{adSet:Teste-Adwords-Engagement-Branch-iOS-230615-adset,campaign:Teste-Adwords-Engagement-Branch-iOS-230615,channel:Adwords,daysSinceInstall:0,destination:none,logged:no,nth:4,type:branch,wearable:no}}
{code}
When trying to query the new fields, my queries fail. With
{code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = true;{code}
{noformat}
0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.event.attributes.ad = 'Teste-FB-Engagement-Puro-iOS-230615';
Error: SYSTEM ERROR: java.lang.NumberFormatException: Teste-FB-Engagement-Puro-iOS-230615
Fragment 0:0
[Error Id: 22d37a65-7dd0-4661-bbfc-7a50bbee9388 on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
{noformat}
With
{code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = false;{code}
{noformat}
0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE';
Error: DATA_READ ERROR: Error parsing JSON - You tried to write a Bit type when you are using a ValueWriter of type NullableVarCharWriterImpl.
File  file.json
Record  35
Fragment 0:0
[Error Id: 5746e3e9-48c0-44b1-8e5f-7c94e7c64d0f on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
{noformat}
If I try to extract all attributes from those events, Drill will only return a subset of the fields, ignoring the others.
{noformat}
0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.type = 'Opens App';
+--------------------------------+
|             EXPR$0             |
+--------------------------------+
| {logged:no,wearable:no,type:}  |
| {logged:no,wearable:no,type:}  |
| {logged:no,wearable:no,type:}  |
| {logged:no,wearable:no,type:}  |
| {logged:no,wearable:no,type:}  |
+--------------------------------+
{noformat}
What I find strange is that I have thousands of records in the same file with different schema for different record types, and all other queries seem to run well. Is there something about how Drill infers schema that I might be missing here? Does it infer based on a sample of the data and fail for records that were not taken into account while inferring the schema? I suspect I wouldn't have this error if I had hundreds of records with that other schema inside the file, but I can't find anything in the docs or code to support that hypothesis. Perhaps it's just a bug? Is it expected? The troubleshooting guide seems to mention something about this, but it's very vague, implying Drill doesn't fully support schema changes. I thought that was mostly for data-type changes, for which there are other well-documented issues.
[jira] [Commented] (DRILL-3476) Filter on nested element gives wrong results
[ https://issues.apache.org/jira/browse/DRILL-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640611#comment-14640611 ] ASF GitHub Bot commented on DRILL-3476:
Github user hnfgns commented on the pull request: https://github.com/apache/drill/pull/83#issuecomment-124561596
+1

Filter on nested element gives wrong results
--
Key: DRILL-3476
URL: https://issues.apache.org/jira/browse/DRILL-3476
Project: Apache Drill
Issue Type: Bug
Reporter: Steven Phillips
Assignee: Steven Phillips
Priority: Critical
Fix For: 1.2.0

Take this query for example:
{code}
0: jdbc:drill:drillbit=localhost> select * from t;
+------------+
|     a      |
+------------+
| {b:1,c:1}  |
+------------+
{code}
If I instead run:
{code}
0: jdbc:drill:drillbit=localhost> select a from t where t.a.b = 1;
+--------+
|   a    |
+--------+
| {b:1}  |
+--------+
{code}
Only a.b was returned, but the select specified a. In this case, it should have returned all of the elements of a, not just the one specified in the filter. This is because the logic in FieldSelection does not correctly handle the case where a selected column is a child of another selected column. In such a case, the record reader should ignore the child column and just return the full selected parent column.
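The rule described in the last paragraph (ignore a child column when its parent is also selected) can be sketched as a small path-pruning routine. This is a Python illustration of that rule, not Drill's FieldSelection implementation; dotted strings stand in for column paths.

```python
def normalize_selection(columns):
    """Prune any selected path that is covered by a selected ancestor,
    so selecting both `a` and `a.b` keeps only `a` (and thus returns
    all of a's children, as the issue says it should)."""
    # Process shallow paths first so ancestors are kept before children.
    ordered = sorted(columns, key=lambda c: c.count("."))
    kept = []
    for col in ordered:
        parts = col.split(".")
        # Drop col if any strict ancestor prefix was already kept.
        covered = any(".".join(parts[:i]) in kept for i in range(1, len(parts)))
        if not covered:
            kept.append(col)
    return sorted(kept)
```

With this normalization, the reader never sees a child path whose parent it is already materializing in full, which is the case FieldSelection mishandled.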
[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors
[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640613#comment-14640613 ] ASF GitHub Bot commented on DRILL-3353:
Github user hnfgns commented on the pull request: https://github.com/apache/drill/pull/86#issuecomment-124561688
+1

Non data-type related schema changes errors
--
Key: DRILL-3353
URL: https://issues.apache.org/jira/browse/DRILL-3353
Project: Apache Drill
Issue Type: Bug
Components: Storage - JSON
Affects Versions: 1.0.0
Reporter: Oscar Bernal
Assignee: Hanifi Gunes
Fix For: 1.2.0
Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip
[jira] [Updated] (DRILL-2838) Applying flatten after joining 2 sub-queries returns empty maps
[ https://issues.apache.org/jira/browse/DRILL-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanifi Gunes updated DRILL-2838:
Assignee: Jason Altekruse (was: Hanifi Gunes)

Applying flatten after joining 2 sub-queries returns empty maps
--
Key: DRILL-2838
URL: https://issues.apache.org/jira/browse/DRILL-2838
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators
Reporter: Rahul Challapalli
Assignee: Jason Altekruse
Priority: Critical
Fix For: 1.2.0
Attachments: DRILL-2838.patch, data.json

git.commit.id.abbrev=5cd36c5

The below query applies flatten after joining 2 sub-queries. It generates empty maps, which is wrong.
{code}
select v1.uid, flatten(events), flatten(transactions)
from (select uid, events from `data.json`) v1
inner join (select uid, transactions from `data.json`) v2
on v1.uid = v2.uid;
+------+---------+---------+
| uid  | EXPR$1  | EXPR$2  |
+------+---------+---------+
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 1    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
| 2    | {}      | {}      |
+------+---------+---------+
36 rows selected (0.244 seconds)
{code}
I attached the data set. Let me know if you have any questions.
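For reference, the intended semantics of the failing query can be modeled outside Drill. The Python sketch below is a reference model of join-then-flatten, not Drill's execution path; the sample rows are invented for illustration (the real data.json is attached to the issue but not reproduced here).

```python
def flatten_join(rows_v1, rows_v2):
    """Reference model: inner-join the two sub-queries on uid, then
    flatten the repeated `events` and `transactions` lists into one
    output row per (event, transaction) pair. Every emitted map is an
    actual list element, never the empty {} seen in the bug report."""
    out = []
    for r1 in rows_v1:
        for r2 in rows_v2:
            if r1["uid"] != r2["uid"]:
                continue
            for ev in r1["events"]:
                for tx in r2["transactions"]:
                    out.append({"uid": r1["uid"], "EXPR$1": ev, "EXPR$2": tx})
    return out


# Invented sample data standing in for the two sub-queries over data.json.
v1 = [{"uid": 1, "events": [{"type": "click"}]}]
v2 = [{"uid": 1, "transactions": [{"amount": 5}, {"amount": 7}]}]
```

Against this model, the 36-row all-empty result above is clearly wrong: each output row should carry one concrete event map and one concrete transaction map.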
[jira] [Updated] (DRILL-3476) Filter on nested element gives wrong results
[ https://issues.apache.org/jira/browse/DRILL-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanifi Gunes updated DRILL-3476:
Assignee: Steven Phillips (was: Hanifi Gunes)

Filter on nested element gives wrong results
--
Key: DRILL-3476
URL: https://issues.apache.org/jira/browse/DRILL-3476
Project: Apache Drill
Issue Type: Bug
Reporter: Steven Phillips
Assignee: Steven Phillips
Priority: Critical
Fix For: 1.2.0
[jira] [Commented] (DRILL-3533) null values in a sub-structure in Parquet returns unexpected/misleading results
[ https://issues.apache.org/jira/browse/DRILL-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640835#comment-14640835 ] ASF GitHub Bot commented on DRILL-3533:
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/97

null values in a sub-structure in Parquet returns unexpected/misleading results
--
Key: DRILL-3533
URL: https://issues.apache.org/jira/browse/DRILL-3533
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.1.0
Reporter: Stefán Baxter
Assignee: Parth Chandra
Priority: Critical

With this minimal dataset as /tmp/test.json:
{dimensions:{adults:A}}

select lower(p.dimensions.budgetLevel) as `field1`, lower(p.dimensions.adults) as `field2` from dfs.tmp.`/test.json` as p;

Returns this:
+---------+---------+
| field1  | field2  |
+---------+---------+
| null    | a       |
+---------+---------+

With the same data as a Parquet file:
CREATE TABLE dfs.tmp.`/test` AS SELECT * FROM dfs.tmp.`/test.json`;

The same query:
select lower(p.dimensions.budgetLevel) as `field1`, lower(p.dimensions.adults) as `field2` from dfs.tmp.`/test/0_0_0.parquet` as p;

Returns this:
+---------+---------+
| field1  | field2  |
+---------+---------+
| a       | null    |
+---------+---------+

After some more testing it appears that this has nothing to do with trim. (Any non-existing nested value will be pushed aside.)

select p.dimensions.budgetLevel as `field1`, lower(p.dimensions.adults) as `field2` from dfs.tmp.`/test/0_0_0.parquet` as p;

also returns:
+---------+---------+
| field1  | field2  |
+---------+---------+
| a       | null    |
+---------+---------+
[jira] [Updated] (DRILL-3546) S3 - jets3t - No such File Or Directory
[ https://issues.apache.org/jira/browse/DRILL-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) updated DRILL-3546:
Assignee: (was: Daniel Barclay (Drill))

S3 - jets3t - No such File Or Directory
--
Key: DRILL-3546
URL: https://issues.apache.org/jira/browse/DRILL-3546
Project: Apache Drill
Issue Type: Bug
Components: Functions - Drill
Affects Versions: 1.1.0
Environment: RHEL 7
Reporter: Philip Deegan

Tested on 1.1 with commit id:
{noformat}
0: jdbc:drill:zk=local> select commit_id from sys.version;
+-------------------------------------------+
|                 commit_id                 |
+-------------------------------------------+
| e3fc7e97bfe712dc09d43a8a055a5135c96b7344  |
+-------------------------------------------+
{noformat}
Three-instance ZooKeeper cluster running Drill with the jets3t plugin. Occasionally throws a "No such file or directory" error.
Query example: SELECT COUNT(*) FROM s3.json_directory;
Might be a jets3t issue; existing report here: https://bitbucket.org/jmurty/jets3t/issues/215/drill-intermittent-file-not-found-error
{code}
org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Failure reading JSON file - No such file or directory 's3n://json_directory/xyz.json.gz'
File  /json_directory/xyz.json.gz
Record  1
[Error Id: 3f83967b-0b7b-4778-b623-b7a20528e3d1 ]
  at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:523) ~[drill-common-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.store.easy.json.JSONRecordReader.handleAndRaise(JSONRecordReader.java:161) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.store.easy.json.JSONRecordReader.setup(JSONRecordReader.java:130) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:100) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin.getReaderBatch(EasyFormatPlugin.java:195) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:35) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:28) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:150) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:106) [drill-java-exec-1.1.0.jar:1.1.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:81) [drill-java-exec-1.1.0.jar:1.1.0]
  at
[jira] [Comment Edited] (DRILL-2288) result set metadata not set for zero-row result (DatabaseMetaData.getColumns(...))
[ https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632614#comment-14632614 ] Daniel Barclay (Drill) edited comment on DRILL-2288 at 7/24/15 5:54 PM: Investigation notes: - Is not a JDBC problem--seems to be an INFORMATION_SCHEMA/ischema problem. - Has something to do with ischema filtering--whether metadata is missing or not depends on whether having zero rows was caused by mismatching one of the specially filtered (pushed-down?) fields (e.g., TABLE_SCHEMA and TABLE_NAME for COLUMNS) or not, respectively. - Seems that a downstream schema is derived from the set of value vectors (etc.) at some point, but that set is empty sometimes when there are no rows (when no values have been written to vectors/vector container?). - Does seem to be in INFORMATION_SCHEMA plug-in: It doesn't seem to use PojoDataType as system-tables plug-in does. was (Author: dsbos): Investigation notes: - Is not a JDBC problem--seems to be an INFORMATION_SCHEMA/ischema problem. - Has something to do with ischema filtering--whether metadata is missing or not depends on whether having zero rows was caused by mismatching one of the specially filtered (pushed-down?) fields (e.g., TABLE_SCHEMA and TABLE_NAME for COLUMNS) or not, respectively. - Might not be in INFORMATION_SCHEMA. - Seems that a downstream schema is derived from the set of value vectors (etc.) at some point, but that set is empty sometimes when there are no rows (when no values have been written to vectors/vector container?). - Does seem to be in INFORMATION_SCHEMA plug-in: It doesn't seem to use PojoDataType as system-tables plug-in does. 
result set metadata not set for zero-row result (DatabaseMetaData.getColumns(...)) --- Key: DRILL-2288 URL: https://issues.apache.org/jira/browse/DRILL-2288 Project: Apache Drill Issue Type: Bug Components: Storage - Information Schema Reporter: Daniel Barclay (Drill) Assignee: Daniel Barclay (Drill) Fix For: 1.2.0 Attachments: Drill2288NoResultSetMetadataWhenZeroRowsTest.java The ResultSetMetaData object from getMetadata() of a ResultSet is not set up (getColumnCount() returns zero, and trying to access any other metadata throws IndexOutOfBoundsException) for a result set with zero rows, at least for one from DatabaseMetaData.getColumns(...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3553) add support for LEAD and LAG window functions
Deneche A. Hakim created DRILL-3553: --- Summary: add support for LEAD and LAG window functions Key: DRILL-3553 URL: https://issues.apache.org/jira/browse/DRILL-3553 Project: Apache Drill Issue Type: Sub-task Components: Execution - Relational Operators Reporter: Deneche A. Hakim Assignee: Deneche A. Hakim -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3536) Add support for LEAD, LAG, NTILE, FIRST_VALUE and LAST_VALUE window functions
[ https://issues.apache.org/jira/browse/DRILL-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim updated DRILL-3536: Labels: window_function (was: ) Add support for LEAD, LAG, NTILE, FIRST_VALUE and LAST_VALUE window functions - Key: DRILL-3536 URL: https://issues.apache.org/jira/browse/DRILL-3536 Project: Apache Drill Issue Type: Improvement Components: Execution - Relational Operators Reporter: Deneche A. Hakim Assignee: Deneche A. Hakim Labels: window_function Fix For: 1.2.0 This JIRA will track the progress on the following window functions (no particular order): - LEAD - LAG - NTILE - FIRST_VALUE - LAST_VALUE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3553) add support for LEAD and LAG window functions
[ https://issues.apache.org/jira/browse/DRILL-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim updated DRILL-3553: Description: From SQL standard here is the general format of LEAD and LAG: {noformat} window function ::= lead or lag function OVER window name or specification {noformat} {noformat} lead or lag function ::= lead or lag ( lead or lag extent [ , offset [ , default expression ] ] ) [ null treatment ] {noformat} {noformat} lead or lag ::= LEAD | LAG {noformat} {noformat} lead or lag extent ::= value expression {noformat} {noformat} offset ::= exact numeric literal {noformat} {noformat} default expression ::= value expression {noformat} {noformat} null treatment ::= RESPECT NULLS | IGNORE NULLS {noformat} was: From SQL standard here is the general format of LEAD and LAG: {noformat} window function ::= lead or lag function OVER window name or specification lead or lag function ::= lead or lag ( lead or lag extent [ , offset [ , default expression ] ] ) [ null treatment ] lead or lag ::= LEAD | LAG lead or lag extent ::= value expression offset ::= exact numeric literal default expression ::= value expression null treatment ::= RESPECT NULLS | IGNORE NULLS {noformat} add support for LEAD and LAG window functions - Key: DRILL-3553 URL: https://issues.apache.org/jira/browse/DRILL-3553 Project: Apache Drill Issue Type: Sub-task Components: Execution - Relational Operators Reporter: Deneche A. Hakim Assignee: Deneche A. 
Hakim Labels: window_function Fix For: 1.2.0 From SQL standard here is the general format of LEAD and LAG: {noformat} window function ::= lead or lag function OVER window name or specification {noformat} {noformat} lead or lag function ::= lead or lag ( lead or lag extent [ , offset [ , default expression ] ] ) [ null treatment ] {noformat} {noformat} lead or lag ::= LEAD | LAG {noformat} {noformat} lead or lag extent ::= value expression {noformat} {noformat} offset ::= exact numeric literal {noformat} {noformat} default expression ::= value expression {noformat} {noformat} null treatment ::= RESPECT NULLS | IGNORE NULLS {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
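For illustration, a usage sketch that follows the grammar above; the {{emp}} table and its columns are hypothetical, and the exact syntax Drill ends up supporting may differ:

{code:sql}
-- LAG with an explicit offset and default value, LEAD with the defaults:
-- compare each employee's salary with the previous/next hire in the same department
SELECT emp_id,
       salary,
       LAG(salary, 1, 0) OVER (PARTITION BY dept ORDER BY hire_date) AS prev_salary,
       LEAD(salary) OVER (PARTITION BY dept ORDER BY hire_date) AS next_salary
FROM emp;
{code}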
[jira] [Commented] (DRILL-3553) add support for LEAD and LAG window functions
[ https://issues.apache.org/jira/browse/DRILL-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641182#comment-14641182 ] Deneche A. Hakim commented on DRILL-3553: - Calcite only supports {{RESPECT NULLS}} by default (but not when explicitly stated in a query) for now. Until CALCITE-337 is fixed we won't be able to support {{IGNORE NULLS}} add support for LEAD and LAG window functions - Key: DRILL-3553 URL: https://issues.apache.org/jira/browse/DRILL-3553 Project: Apache Drill Issue Type: Sub-task Components: Execution - Relational Operators Reporter: Deneche A. Hakim Assignee: Deneche A. Hakim Labels: window_function Fix For: 1.2.0 From SQL standard here is the general format of LEAD and LAG: {noformat} window function ::= lead or lag function OVER window name or specification {noformat} {noformat} lead or lag function ::= lead or lag ( lead or lag extent [ , offset [ , default expression ] ] ) [ null treatment ] {noformat} {noformat} lead or lag ::= LEAD | LAG {noformat} {noformat} lead or lag extent ::= value expression {noformat} {noformat} offset ::= exact numeric literal {noformat} {noformat} default expression ::= value expression {noformat} {noformat} null treatment ::= RESPECT NULLS | IGNORE NULLS {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3553) add support for LEAD and LAG window functions
[ https://issues.apache.org/jira/browse/DRILL-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim updated DRILL-3553: Description: From SQL standard here is the general format of LEAD and LAG: {noformat} window function ::= lead or lag function OVER window name or specification lead or lag function ::= lead or lag ( lead or lag extent [ , offset [ , default expression ] ] ) [ null treatment ] lead or lag ::= LEAD | LAG lead or lag extent ::= value expression offset ::= exact numeric literal default expression ::= value expression null treatment ::= RESPECT NULLS | IGNORE NULLS {noformat} add support for LEAD and LAG window functions - Key: DRILL-3553 URL: https://issues.apache.org/jira/browse/DRILL-3553 Project: Apache Drill Issue Type: Sub-task Components: Execution - Relational Operators Reporter: Deneche A. Hakim Assignee: Deneche A. Hakim Labels: window_function Fix For: 1.2.0 From SQL standard here is the general format of LEAD and LAG: {noformat} window function ::= lead or lag function OVER window name or specification lead or lag function ::= lead or lag ( lead or lag extent [ , offset [ , default expression ] ] ) [ null treatment ] lead or lag ::= LEAD | LAG lead or lag extent ::= value expression offset ::= exact numeric literal default expression ::= value expression null treatment ::= RESPECT NULLS | IGNORE NULLS {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3553) add support for LEAD and LAG window functions
[ https://issues.apache.org/jira/browse/DRILL-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim updated DRILL-3553: Labels: window_function (was: ) add support for LEAD and LAG window functions - Key: DRILL-3553 URL: https://issues.apache.org/jira/browse/DRILL-3553 Project: Apache Drill Issue Type: Sub-task Components: Execution - Relational Operators Reporter: Deneche A. Hakim Assignee: Deneche A. Hakim Labels: window_function Fix For: 1.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3554) Union over TIME and TIMESTAMP values throws SchemaChangeException
Khurram Faraaz created DRILL-3554: - Summary: Union over TIME and TIMESTAMP values throws SchemaChangeException Key: DRILL-3554 URL: https://issues.apache.org/jira/browse/DRILL-3554 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.2.0 Environment: 4 node cluster on CentOS Reporter: Khurram Faraaz Assignee: Chris Westin Union over TIME and TIMESTAMP values results in Exception commit ID : 17e580a7 {code} 0: jdbc:drill:schema=dfs.tmp select c9, c5 from union_01 union select c5, c9 from union_02; Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [castTIMESTAMP(TIME-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. Fragment 0:0 [Error Id: 18eed3ba-f046-48ed-93a6-19ffa87f969e on centos-02.qa.lab:31010] (state=,code=0) {code} Stack trace from drillbit.log 2015-07-24 22:09:57,467 [2a4d4849-d440-981d-ebf0-b4c35010bf02:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [castTIMESTAMP(TIME-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. Fragment 0:0 [Error Id: 18eed3ba-f046-48ed-93a6-19ffa87f969e on centos-02.qa.lab:31010] org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [castTIMESTAMP(TIME-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. 
Fragment 0:0 [Error Id: 18eed3ba-f046-48ed-93a6-19ffa87f969e on centos-02.qa.lab:31010] at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:523) ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:178) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45] at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] Caused by: org.apache.drill.exec.exception.SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [castTIMESTAMP(TIME-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. 
at org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch.doWork(UnionAllRecordBatch.java:228) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch.innerNext(UnionAllRecordBatch.java:116) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:96) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:127) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
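Until the implicit TIME-to-TIMESTAMP cast is implemented, a possible workaround (untested; assumes c5 is the TIME column that triggers the missing castTIMESTAMP(TIME-OPTIONAL) implementation) is to cast explicitly so both UNION branches agree on each column's type:

{code:sql}
-- Cast the TIME column to TIMESTAMP on both sides of the UNION
-- so Drill does not have to materialize an implicit TIME -> TIMESTAMP cast
SELECT c9, CAST(c5 AS TIMESTAMP) FROM union_01
UNION
SELECT CAST(c5 AS TIMESTAMP), c9 FROM union_02;
{code}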
[jira] [Updated] (DRILL-3364) Prune scan range if the filter is on the leading field with byte comparable encoding
[ https://issues.apache.org/jira/browse/DRILL-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Smidth Panchamia updated DRILL-3364: Attachment: 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch The change adds some framework to handle conditions on a row-key prefix. It also adds support to perform row-key range pruning when the row-key prefix is interpreted as DATE_EPOCH_BE encoded. Prune scan range if the filter is on the leading field with byte comparable encoding Key: DRILL-3364 URL: https://issues.apache.org/jira/browse/DRILL-3364 Project: Apache Drill Issue Type: Sub-task Components: Storage - HBase Reporter: Aditya Kishore Assignee: Smidth Panchamia Fix For: 1.2.0 Attachments: 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, composite.jun26.diff -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3553) add support for LEAD and LAG window functions
[ https://issues.apache.org/jira/browse/DRILL-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim updated DRILL-3553: Description: From SQL standard here is the general format of LEAD and LAG: {noformat} window function ::= lead or lag function OVER window name or specification {noformat} {noformat} lead or lag function ::= lead or lag ( lead or lag extent [ , offset [ , default expression ] ] ) [ null treatment ] {noformat} {noformat} lead or lag ::= LEAD | LAG {noformat} {noformat} lead or lag extent ::= value expression {noformat} {noformat} offset ::= exact numeric literal {noformat} {noformat} default expression ::= value expression {noformat} The following won't be supported until CALCITE-337 is resolved: {noformat} null treatment ::= RESPECT NULLS | IGNORE NULLS {noformat} was: From SQL standard here is the general format of LEAD and LAG: {noformat} window function ::= lead or lag function OVER window name or specification {noformat} {noformat} lead or lag function ::= lead or lag ( lead or lag extent [ , offset [ , default expression ] ] ) [ null treatment ] {noformat} {noformat} lead or lag ::= LEAD | LAG {noformat} {noformat} lead or lag extent ::= value expression {noformat} {noformat} offset ::= exact numeric literal {noformat} {noformat} default expression ::= value expression {noformat} {noformat} null treatment ::= RESPECT NULLS | IGNORE NULLS {noformat} add support for LEAD and LAG window functions - Key: DRILL-3553 URL: https://issues.apache.org/jira/browse/DRILL-3553 Project: Apache Drill Issue Type: Sub-task Components: Execution - Relational Operators Reporter: Deneche A. Hakim Assignee: Deneche A. 
Hakim Labels: window_function Fix For: 1.2.0 From SQL standard here is the general format of LEAD and LAG: {noformat} window function ::= lead or lag function OVER window name or specification {noformat} {noformat} lead or lag function ::= lead or lag ( lead or lag extent [ , offset [ , default expression ] ] ) [ null treatment ] {noformat} {noformat} lead or lag ::= LEAD | LAG {noformat} {noformat} lead or lag extent ::= value expression {noformat} {noformat} offset ::= exact numeric literal {noformat} {noformat} default expression ::= value expression {noformat} The following won't be supported until CALCITE-337 is resolved: {noformat} null treatment ::= RESPECT NULLS | IGNORE NULLS {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3547) IndexOutOfBoundsException on directory with ~20 subdirectories
[ https://issues.apache.org/jira/browse/DRILL-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) updated DRILL-3547: -- Assignee: (was: Daniel Barclay (Drill)) IndexOutOfBoundsException on directory with ~20 subdirectories -- Key: DRILL-3547 URL: https://issues.apache.org/jira/browse/DRILL-3547 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Affects Versions: 1.1.0 Environment: RHEL 7 Reporter: Philip Deegan Tested on 1.1 with commit id: {noformat} 0: jdbc:drill:zk=local select commit_id from sys.version; +---+ | commit_id | +---+ | e3fc7e97bfe712dc09d43a8a055a5135c96b7344 | +---+ {noformat} Directory has child directories a to u, each containing JSON files. Running the query on each subdirectory individually does not cause an error. {noformat} java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: IndexOutOfBoundsException: index: 0, length: 1 (expected: range(0, 0)) Fragment 1:2 [Error Id: 69a0879f-f718-4930-ae6f-c526de05528c on ip-172-31-29-60.eu-central-1.compute.internal:31010] at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73) at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87) at sqlline.TableOutputFormat.print(TableOutputFormat.java:118) at sqlline.SqlLine.print(SqlLine.java:1583) at sqlline.Commands.execute(Commands.java:852) at sqlline.Commands.sql(Commands.java:751) at sqlline.SqlLine.dispatch(SqlLine.java:738) at sqlline.SqlLine.begin(SqlLine.java:612) at sqlline.SqlLine.start(SqlLine.java:366) at sqlline.SqlLine.main(SqlLine.java:259) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
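Since each subdirectory queries cleanly on its own, one way to bisect the failure (path hypothetical) is to query the parent directory while restricting to subsets of the child directories via Drill's implicit dir0 partition column:

{code:sql}
-- dir0 is Drill's implicit column for the first subdirectory level;
-- widen or narrow the IN list to find the combination that fails
SELECT COUNT(*) FROM dfs.`/path/to/parent`
WHERE dir0 IN ('a', 'b', 'c');
{code}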
[jira] [Updated] (DRILL-3554) Union over TIME and TIMESTAMP values throws SchemaChangeException
[ https://issues.apache.org/jira/browse/DRILL-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Khurram Faraaz updated DRILL-3554: -- Assignee: Sean Hsuan-Yi Chu (was: Chris Westin) Union over TIME and TIMESTAMP values throws SchemaChangeException - Key: DRILL-3554 URL: https://issues.apache.org/jira/browse/DRILL-3554 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.2.0 Environment: 4 node cluster on CentOS Reporter: Khurram Faraaz Assignee: Sean Hsuan-Yi Chu Union over TIME and TIMESTAMP values results in Exception commit ID : 17e580a7 {code} 0: jdbc:drill:schema=dfs.tmp select c9, c5 from union_01 union select c5, c9 from union_02; Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [castTIMESTAMP(TIME-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. Fragment 0:0 [Error Id: 18eed3ba-f046-48ed-93a6-19ffa87f969e on centos-02.qa.lab:31010] (state=,code=0) {code} Stack trace from drillbit.log 2015-07-24 22:09:57,467 [2a4d4849-d440-981d-ebf0-b4c35010bf02:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [castTIMESTAMP(TIME-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. Fragment 0:0 [Error Id: 18eed3ba-f046-48ed-93a6-19ffa87f969e on centos-02.qa.lab:31010] org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [castTIMESTAMP(TIME-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. 
Fragment 0:0 [Error Id: 18eed3ba-f046-48ed-93a6-19ffa87f969e on centos-02.qa.lab:31010] at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:523) ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:178) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45] at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] Caused by: org.apache.drill.exec.exception.SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [castTIMESTAMP(TIME-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. 
at org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch.doWork(UnionAllRecordBatch.java:228) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch.innerNext(UnionAllRecordBatch.java:116) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:96) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:127) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at
[jira] [Updated] (DRILL-3364) Prune scan range if the filter is on the leading field with byte comparable encoding
[ https://issues.apache.org/jira/browse/DRILL-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Smidth Panchamia updated DRILL-3364: Attachment: 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch The change adds support to perform row-key range pruning when the row-key prefix is interpreted as TIME_EPOCH_BE, TIMESTAMP_EPOCH_BE or UINT8_BE encoded. Prune scan range if the filter is on the leading field with byte comparable encoding Key: DRILL-3364 URL: https://issues.apache.org/jira/browse/DRILL-3364 Project: Apache Drill Issue Type: Sub-task Components: Storage - HBase Reporter: Aditya Kishore Assignee: Smidth Panchamia Fix For: 1.2.0 Attachments: 0001-Add-convert_from-and-convert_to-methods-for-TIMESTAM.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, composite.jun26.diff -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3364) Prune scan range if the filter is on the leading field with byte comparable encoding
[ https://issues.apache.org/jira/browse/DRILL-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Smidth Panchamia updated DRILL-3364: Attachment: 0001-Add-convert_from-and-convert_to-methods-for-TIMESTAM.patch Add convert_from and convert_to methods for TIMESTAMP type. This will help in scan range pruning when the query is on leading bytes of row-key and it needs to be interpreted as timestamp. Prune scan range if the filter is on the leading field with byte comparable encoding Key: DRILL-3364 URL: https://issues.apache.org/jira/browse/DRILL-3364 Project: Apache Drill Issue Type: Sub-task Components: Storage - HBase Reporter: Aditya Kishore Assignee: Smidth Panchamia Fix For: 1.2.0 Attachments: 0001-Add-convert_from-and-convert_to-methods-for-TIMESTAM.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, composite.jun26.diff -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3555) Changing defaults for planner.memory.max_query_memory_per_node causes queries with window function to fail
[ https://issues.apache.org/jira/browse/DRILL-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641301#comment-14641301 ] Deneche A. Hakim commented on DRILL-3555: - What dataset are you running the query on? Changing defaults for planner.memory.max_query_memory_per_node causes queries with window function to fail -- Key: DRILL-3555 URL: https://issues.apache.org/jira/browse/DRILL-3555 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.1.0, 1.2.0 Environment: 4 Nodes. Direct Memory = 48 GB each Reporter: Abhishek Girish Assignee: Jinfeng Ni Priority: Critical Changing the default value for planner.memory.max_query_memory_per_node from 2 GB to anything higher causes queries with window functions to fail. Changed system options {code:sql} select * from sys.options where status like '%CHANGE%'; +---+--+-+--+-+-+---++ | name | kind | type | status | num_val | string_val | bool_val | float_val | +---+--+-+--+-+-+---++ | planner.enable_decimal_data_type | BOOLEAN | SYSTEM | CHANGED | null | null | true | null | | planner.memory.max_query_memory_per_node | LONG | SYSTEM | CHANGED | 8589934592 | null | null | null | +---+--+-+--+-+-+---++ 2 rows selected (0.249 seconds) {code} Query {code:sql} SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) FROM store_sales ss LIMIT 20; java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: DrillRuntimeException: Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. 
maxBytes 1073741824 Fragment 1:0 [Error Id: 9c2ec9cf-21c6-4d5e-b0d6-7cd59e32c49d on abhi1:31010] at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73) at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87) at sqlline.TableOutputFormat.print(TableOutputFormat.java:118) at sqlline.SqlLine.print(SqlLine.java:1583) at sqlline.Commands.execute(Commands.java:852) at sqlline.Commands.sql(Commands.java:751) at sqlline.SqlLine.dispatch(SqlLine.java:738) at sqlline.SqlLine.begin(SqlLine.java:612) at sqlline.SqlLine.start(SqlLine.java:366) at sqlline.SqlLine.main(SqlLine.java:259) {code} Log: {code} 2015-07-23 18:16:52,292 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:2:2] INFO o.a.d.e.w.fragment.FragmentExecutor - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:2:2: State change requested RUNNING -- FINISHED 2015-07-23 18:16:52,292 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:2:2] INFO o.a.d.e.w.f.FragmentStatusReporter - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:2:2: State to report: FINISHED 2015-07-23 18:17:05,485 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] ERROR o.a.d.e.p.i.s.SortRecordBatchBuilder - Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. maxBytes 1073741824 2015-07-23 18:17:05,486 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:1:0: State change requested RUNNING -- FAILED ... 2015-07-23 18:17:05,990 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:1:0: State change requested FAILED -- FINISHED 2015-07-23 18:17:05,999 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: DrillRuntimeException: Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. 
maxBytes 1073741824 Fragment 1:0 [Error Id: 9c2ec9cf-21c6-4d5e-b0d6-7cd59e32c49d on abhi1:31010] org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: DrillRuntimeException: Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. maxBytes 1073741824 Fragment 1:0 [Error Id: 9c2ec9cf-21c6-4d5e-b0d6-7cd59e32c49d on abhi1:31010] at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:523) ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at
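For reference, the option change that triggers the failure can be made with ALTER SYSTEM (value in bytes; 8589934592 bytes = 8 GB, matching the sys.options output above):

{code:sql}
-- Raise the per-node query memory limit from the 2 GB default to 8 GB
ALTER SYSTEM SET `planner.memory.max_query_memory_per_node` = 8589934592;
{code}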
[jira] [Commented] (DRILL-2218) Constant folding rule exposing planning bugs and not being used in plan where the constant expression is in the select list
[ https://issues.apache.org/jira/browse/DRILL-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641285#comment-14641285 ] Jason Altekruse commented on DRILL-2218: An update on this issue after discussing with [~jni] and [~amansinha100]. The cost model for project currently only considers the number of expressions present, not the complexity of the expressions. Therefore the rule being fired to reduce the expression is producing the correct rewritten project, but it is not being selected because it exposes the same cost value as the version of the project where the full expression is still present. Constant folding rule exposing planning bugs and not being used in plan where the constant expression is in the select list --- Key: DRILL-2218 URL: https://issues.apache.org/jira/browse/DRILL-2218 Project: Apache Drill Issue Type: Improvement Components: Query Planning Optimization Reporter: Jason Altekruse Assignee: Aman Sinha Fix For: 1.4.0 This test method and rule are not currently in the master branch, but they do appear in the patch posted for constant expression folding during planning, DRILL-2060. Once it is merged, the test TestConstantFolding.testConstExprFolding_InSelect(), which is currently ignored, will be failing. The issue is that even though the constant folding rule for project is firing, and I have traced it to see that a replacement project with a literal is created, it is not being selected in the final plan. This seems rather odd, as there is a comment in the last line of the onMatch() method of the rule that says the following. This does not appear to be having the desired effect; may need to file a bug in Calcite. {code} // New plan is absolutely better than old plan. call.getPlanner().setImportance(project, 0.0); {code} Here is the query from the test; I expect the sum to be folded in planning with the newly enabled project constant folding rule. 
{code}
select columns[0], 3+5 from cp.`test_input.csv`
{code}

There are also some planning bugs that are exposed when this rule is enabled, even when the ReduceExpressionsRule.PROJECT_INSTANCE has no impact on the plan itself. It is causing a planning bug for TestAggregateFunctions.testDrill2092 -as well as TestProjectPushDown.testProjectPastJoinPastFilterPastJoinPushDown()-. The rule's onMatch() is being called but is not modifying the plan; it seems that its mere presence in the optimizer is making another rule fire that creates a bad plan.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
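The cost-tie described above can be sketched with a toy model. This is not Drill's actual cost code; the class and both cost functions are hypothetical, purely to illustrate why a cost model that only counts expressions cannot prefer the folded plan over the unfolded one:

```java
// Illustrative model (NOT Drill's real planner code): if Project cost only
// counts output expressions, `3 + 5` and the folded literal `8` tie, so the
// planner has no cost incentive to pick the rewritten Project.
public class ProjectCostSketch {

    // Hypothetical cost model matching the behavior described in the comment:
    // every output expression costs 1, regardless of its complexity.
    static int costByExpressionCount(String[] exprs) {
        return exprs.length;
    }

    // A complexity-aware alternative: also charge for operator nodes, so a
    // folded literal is strictly cheaper than the expression it replaces.
    static int costByNodeCount(String[] exprs) {
        int cost = 0;
        for (String e : exprs) {
            // Crude proxy for expression-tree size: count arithmetic operators.
            cost += 1 + e.replaceAll("[^+\\-*/]", "").length();
        }
        return cost;
    }

    public static void main(String[] args) {
        String[] unfolded = {"columns[0]", "3 + 5"};
        String[] folded   = {"columns[0]", "8"};

        // Expression-count model: a tie, so the fold is never "better".
        System.out.println(costByExpressionCount(unfolded) == costByExpressionCount(folded));

        // Node-count model: the folded Project wins outright.
        System.out.println(costByNodeCount(folded) < costByNodeCount(unfolded));
    }
}
```

Under the first model the planner must rely on a tie-breaker such as `setImportance(project, 0.0)` to retire the old plan, which is exactly the mechanism the comment says is not taking effect.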
[jira] [Created] (DRILL-3555) Changing defaults for planner.memory.max_query_memory_per_node causes queries with window function to fail
Abhishek Girish created DRILL-3555:
--
Summary: Changing defaults for planner.memory.max_query_memory_per_node causes queries with window function to fail
Key: DRILL-3555
URL: https://issues.apache.org/jira/browse/DRILL-3555
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.1.0, 1.2.0
Environment: 4 Nodes. Direct Memory = 48 GB each
Reporter: Abhishek Girish
Assignee: Jinfeng Ni
Priority: Critical

Changing the default value for planner.memory.max_query_memory_per_node from 2 GB to anything higher causes queries with window functions to fail.

Changed system options:
{code:sql}
select * from sys.options where status like '%CHANGE%';
+-------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
| name                                      | kind     | type    | status   | num_val     | string_val  | bool_val  | float_val  |
+-------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
| planner.enable_decimal_data_type          | BOOLEAN  | SYSTEM  | CHANGED  | null        | null        | true      | null       |
| planner.memory.max_query_memory_per_node  | LONG     | SYSTEM  | CHANGED  | 8589934592  | null        | null      | null       |
+-------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
2 rows selected (0.249 seconds)
{code}

Query:
{code:sql}
SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk)
FROM store_sales ss
LIMIT 20;

java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: DrillRuntimeException: Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. maxBytes 1073741824
Fragment 1:0
[Error Id: 9c2ec9cf-21c6-4d5e-b0d6-7cd59e32c49d on abhi1:31010]
	at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
	at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
	at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
	at sqlline.SqlLine.print(SqlLine.java:1583)
	at sqlline.Commands.execute(Commands.java:852)
	at sqlline.Commands.sql(Commands.java:751)
	at sqlline.SqlLine.dispatch(SqlLine.java:738)
	at sqlline.SqlLine.begin(SqlLine.java:612)
	at sqlline.SqlLine.start(SqlLine.java:366)
	at sqlline.SqlLine.main(SqlLine.java:259)
{code}

Log:
{code}
2015-07-23 18:16:52,292 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:2:2] INFO o.a.d.e.w.fragment.FragmentExecutor - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:2:2: State change requested RUNNING --> FINISHED
2015-07-23 18:16:52,292 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:2:2] INFO o.a.d.e.w.f.FragmentStatusReporter - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:2:2: State to report: FINISHED
2015-07-23 18:17:05,485 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] ERROR o.a.d.e.p.i.s.SortRecordBatchBuilder - Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. maxBytes 1073741824
2015-07-23 18:17:05,486 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:1:0: State change requested RUNNING --> FAILED
...
2015-07-23 18:17:05,990 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:1:0: State change requested FAILED --> FINISHED
2015-07-23 18:17:05,999 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: DrillRuntimeException: Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. maxBytes 1073741824
Fragment 1:0
[Error Id: 9c2ec9cf-21c6-4d5e-b0d6-7cd59e32c49d on abhi1:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: DrillRuntimeException: Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. maxBytes 1073741824
Fragment 1:0
[Error Id: 9c2ec9cf-21c6-4d5e-b0d6-7cd59e32c49d on abhi1:31010]
	at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:523) ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
	at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
	at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:178) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292)
{code}
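The numbers in the error message make the failing check easy to reconstruct: the accumulated sort data plus the incoming batch crosses a 1 GiB cap (maxBytes 1073741824) that is not raised when the session-level memory option is increased. Below is a minimal sketch of such a guard; the class and method names are hypothetical, not Drill's actual SortRecordBatchBuilder code:

```java
// Illustrative sketch of the batch-size guard implied by the error message.
// Names are hypothetical; only the arithmetic mirrors the logged values.
public class BatchSizeGuard {
    // 1 GiB, matching the maxBytes value in the log (1073741824).
    static final long MAX_BYTES = 1L << 30;

    private long runningBytes;

    // Accounts the batch and returns true if it fits; returns false if
    // adding it would push the accumulated size past MAX_BYTES.
    boolean tryAdd(long batchBytes) {
        if (runningBytes + batchBytes > MAX_BYTES) {
            return false; // the condition reported as a DrillRuntimeException above
        }
        runningBytes += batchBytes;
        return true;
    }

    public static void main(String[] args) {
        BatchSizeGuard guard = new BatchSizeGuard();
        guard.runningBytes = 1073638500L; // runningBytes from the log
        // 1073638500 + 127875 = 1073766375 > 1073741824, so the batch is rejected.
        System.out.println(guard.tryAdd(127875L)); // false
    }
}
```

Note that raising planner.memory.max_query_memory_per_node to 8 GB changes neither operand of this comparison, which is consistent with the query failing regardless of the larger per-node budget.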