[jira] [Created] (DRILL-3563) Type confusion and number formatting exceptions
Stefán Baxter created DRILL-3563:
------------------------------------

Summary: Type confusion and number formatting exceptions
Key: DRILL-3563
URL: https://issues.apache.org/jira/browse/DRILL-3563
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.1.0
Reporter: Stefán Baxter
Assignee: Jinfeng Ni

It seems that null values can trigger a column to be treated as a numeric one in expression evaluation, regardless of content or other indicators, and that fields in substructures can affect same-named fields in the parent structure. (1.2-SNAPSHOT, parquet files)

I have JSON data that can be reduced to this:

{"occurred_at":"2015-07-26 08:45:41.234","type":"plan.item.added","dimensions":{"type":null,"dim_type":"Unspecified","category":"Unspecified","sub_category":null}}
{"occurred_at":"2015-07-26 08:45:43.598","type":"plan.item.removed","dimensions":{"type":"Unspecified","dim_type":null,"category":"Unspecified","sub_category":null}}
{"occurred_at":"2015-07-26 08:45:44.241","type":"To See","dimensions":{"type":"To See","category":"Nature","sub_category":"Waterfalls"}}

* Notice the discrepancy in the dimensions structure: the type field is called either type or dim_type (slightly relevant for the rest of this case).

1. Query where dimensions are not involved:

select p.type, count(*) from dfs.tmp.`/analytics/processed/some-tenant/events` as p where occurred_at > '2015-07-26' and p.type in ('plan.item.added','plan.item.removed') group by p.type;

+--------------------+---------+
|        type        | EXPR$1  |
+--------------------+---------+
| plan.item.removed  | 947     |
| plan.item.added    | 40342   |
+--------------------+---------+
2 rows selected (0.508 seconds)

2. Same query, but involving dimensions.type as well:

select p.type, coalesce(p.dimensions.dim_type, p.dimensions.type) dimensions_type, count(*) from dfs.tmp.`/analytics/processed/some-tenant/events` as p where occurred_at > '2015-07-26' and p.type in ('plan.item.added','plan.item.removed') group by p.type, coalesce(p.dimensions.dim_type, p.dimensions.type);

Error: SYSTEM ERROR: NumberFormatException: To See
Fragment 2:0
[Error Id: 4756f549-cc47-43e5-899e-10a11efb60ea on localhost:31010] (state=,code=0)

I can provide test data if this is not enough to reproduce this bug.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
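The reported behavior is consistent with "first batch decides the column type" schema inference: if dimensions.type is null in every early record, the column can be typed numerically, and a later string value like "To See" then fails numeric parsing. A minimal Python sketch of that hypothetical inference logic (not Drill's actual code) illustrates the failure mode:

```python
# Hedged sketch of a "first non-null value decides the type" schema
# inference, a hypothetical analogue of the failure mode reported in
# DRILL-3563 -- not Drill's actual implementation.

def infer_type(values):
    """Pick a column type from the first non-null value seen;
    all-null batches default to a numeric type (the reported pitfall)."""
    for v in values:
        if v is None:
            continue
        try:
            int(v)
            return "INT"
        except ValueError:
            return "VARCHAR"
    return "INT"  # all nulls: numeric default, mirroring the report

def read_column(values, col_type):
    """Materialize values under the inferred type; raises ValueError
    (Python's analogue of NumberFormatException) when a string value
    hits an INT-typed column."""
    if col_type == "INT":
        return [None if v is None else int(v) for v in values]
    return values

# dimensions.type is null in the first records, so the column gets a
# numeric type; the later value "To See" then fails to parse.
first_batch = [None, None]
later_batch = ["To See"]
column_type = infer_type(first_batch)  # "INT" under the all-null default
```

Reading `later_batch` under `column_type` then raises, much like the reported `NumberFormatException: To See`.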
[jira] [Updated] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array
[ https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Deegan updated DRILL-3562:
---------------------------------
Description:
Drill query fails when using flatten when some records contain an empty array
{noformat}
SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) flat WHERE flat.c.d.e = 'f' limit 1;
{noformat}
Succeeds on { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } }
Fails on { "a": { "b": { "c": [] } } }
Error
{noformat}
Error: SYSTEM ERROR: ClassCastException: Cannot cast org.apache.drill.exec.vector.NullableIntVector to org.apache.drill.exec.vector.complex.RepeatedValueVector
{noformat}
Is it possible to ignore the empty arrays, or do they need to be populated with dummy data?

was:
Drill query fails when using flatten when some records contain an empty array
SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) flat WHERE flat.c.d.e = 'f' limit 1;
Succeeds on { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } }
Fails on { "a": { "b": { "c": [] } } }
Error
{noformat}
Error: SYSTEM ERROR: ClassCastException: Cannot cast org.apache.drill.exec.vector.NullableIntVector to org.apache.drill.exec.vector.complex.RepeatedValueVector
{noformat}
Is it possible to ignore the empty arrays, or do they need to be populated with dummy data?

Query fails when using flatten on JSON data where some documents have an empty array
------------------------------------------------------------------------------------

Key: DRILL-3562
URL: https://issues.apache.org/jira/browse/DRILL-3562
Project: Apache Drill
Issue Type: Bug
Components: Storage - JSON
Affects Versions: 1.1.0
Reporter: Philip Deegan
Assignee: Steven Phillips

Drill query fails when using flatten when some records contain an empty array
{noformat}
SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) flat WHERE flat.c.d.e = 'f' limit 1;
{noformat}
Succeeds on { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } }
Fails on { "a": { "b": { "c": [] } } }
Error
{noformat}
Error: SYSTEM ERROR: ClassCastException: Cannot cast org.apache.drill.exec.vector.NullableIntVector to org.apache.drill.exec.vector.complex.RepeatedValueVector
{noformat}
Is it possible to ignore the empty arrays, or do they need to be populated with dummy data?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
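The "ignore the empty arrays" behavior the reporter asks about corresponds to FLATTEN emitting zero rows for an empty array, one row per element otherwise. A hedged Python sketch of that semantics (illustrative only, not Drill's vector-based implementation):

```python
# Hedged sketch of FLATTEN over a repeated field: one output row per
# array element, so an empty array contributes no rows rather than
# crashing. Illustrative only; not Drill's actual code.

def get_path(doc, path):
    """Fetch a nested field like 'a.b.c'; returns None if missing."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc

def flatten(docs, path):
    """Yield one row per element of the array found at `path`."""
    for doc in docs:
        arr = get_path(doc, path)
        if not isinstance(arr, list):
            continue  # missing or non-array field: skip the record
        for element in arr:  # empty list -> zero rows, no crash
            yield element

docs = [
    {"a": {"b": {"c": [{"d": {"e": "f"}}]}}},
    {"a": {"b": {"c": []}}},  # the record that crashes Drill 1.1.0
]
rows = [r for r in flatten(docs, "a.b.c") if get_path(r, "d.e") == "f"]
```

Under these semantics the reported query would return a count of 1, with the empty-array record simply contributing nothing.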
[jira] [Commented] (DRILL-3412) Projections are not getting pushed down below Window operator
[ https://issues.apache.org/jira/browse/DRILL-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642849#comment-14642849 ]

Deneche A. Hakim commented on DRILL-3412:
-----------------------------------------
Interestingly, adding a constant in one of the window functions will push the projections below the Window operator:

Query without constant:
{noformat}
0: jdbc:drill:zk=local> explain plan for SELECT RANK() OVER (PARTITION BY ss.ss_store_sk ORDER BY ss.ss_store_sk) FROM store_sales ss LIMIT 20;
00-00 Screen
00-01   Project(EXPR$0=[$0])
00-02     SelectionVectorRemover
00-03       Limit(fetch=[20])
00-04         UnionExchange
01-01           Project(w0$o0=[$2])
01-02             Window(window#0=[window(partition {1} order by [1] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [RANK()])])
01-03               SelectionVectorRemover
01-04                 Sort(sort0=[$1], sort1=[$1], dir0=[ASC], dir1=[ASC])
01-05                   Project(T3¦¦*=[$0], ss_store_sk=[$1])
01-06                     HashToRandomExchange(dist0=[[$1]])
02-01                       UnorderedMuxExchange
03-01                         Project(T3¦¦*=[$0], ss_store_sk=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1))])
03-02                           Project(T3¦¦*=[$0], ss_store_sk=[$1])
03-03                             Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/hakim/MapR/data/tpcds100/parquet/store_sales]], selectionRoot=file:/Users/hakim/MapR/data/tpcds100/parquet/store_sales, numFiles=1, columns=[`*`]]])
{noformat}

Query with a constant in the ORDER BY clause (the query is still the same because we were ordering on the partition clause):
{noformat}
0: jdbc:drill:zk=local> explain plan for SELECT RANK() OVER (PARTITION BY ss.ss_store_sk ORDER BY 1) FROM store_sales ss LIMIT 20;
00-00 Screen
00-01   Project(EXPR$0=[$0])
00-02     SelectionVectorRemover
00-03       Limit(fetch=[20])
00-04         UnionExchange
01-01           Project($0=[$1])
01-02             Window(window#0=[window(partition {0} order by [] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [RANK()])])
01-03               SelectionVectorRemover
01-04                 Sort(sort0=[$0], dir0=[ASC])
01-05                   Project(ss_store_sk=[$0])
01-06                     HashToRandomExchange(dist0=[[$0]])
02-01                       UnorderedMuxExchange
03-01                         Project(ss_store_sk=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
03-02                           Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/hakim/MapR/data/tpcds100/parquet/store_sales]], selectionRoot=file:/Users/hakim/MapR/data/tpcds100/parquet/store_sales, numFiles=1, columns=[`ss_store_sk`]]])
{noformat}

Query with a constant in a different window function, COUNT(1):
{noformat}
0: jdbc:drill:zk=local> explain plan for SELECT COUNT(1) OVER(PARTITION BY ss.ss_store_sk ORDER BY ss.ss_store_sk), RANK() OVER (PARTITION BY ss.ss_store_sk ORDER BY ss.ss_store_sk) FROM store_sales ss LIMIT 20;
00-00 Screen
00-01   Project(EXPR$0=[$0], EXPR$1=[$1])
00-02     SelectionVectorRemover
00-03       Limit(fetch=[20])
00-04         UnionExchange
01-01           Project($0=[$1], $1=[$2])
01-02             Window(window#0=[window(partition {0} order by [0] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($1), RANK()])])
01-03               SelectionVectorRemover
01-04                 Sort(sort0=[$0], sort1=[$0], dir0=[ASC], dir1=[ASC])
01-05                   Project(ss_store_sk=[$0])
01-06                     HashToRandomExchange(dist0=[[$0]])
02-01                       UnorderedMuxExchange
03-01                         Project(ss_store_sk=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
03-02                           Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/hakim/MapR/data/tpcds100/parquet/store_sales]], selectionRoot=file:/Users/hakim/MapR/data/tpcds100/parquet/store_sales, numFiles=1, columns=[`ss_store_sk`]]])
{noformat}

Projections are not getting pushed down below Window operator
-------------------------------------------------------------

Key: DRILL-3412
URL: https://issues.apache.org/jira/browse/DRILL-3412
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Reporter: Aman Sinha
Assignee: Jinfeng Ni
Priority: Blocker
Labels: window_function
Fix For: 1.2.0

The plan below shows that the 'star' column is being produced by the Scan and subsequent Project. This indicates projection pushdown is not working as desired when a window function is present. The query produces correct results.
[jira] [Updated] (DRILL-3412) Projections are not getting pushed down below Window operator
[ https://issues.apache.org/jira/browse/DRILL-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated DRILL-3412:
------------------------------------
Labels: window_function  (was: )

Projections are not getting pushed down below Window operator
-------------------------------------------------------------

Key: DRILL-3412
URL: https://issues.apache.org/jira/browse/DRILL-3412
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Reporter: Aman Sinha
Assignee: Jinfeng Ni
Priority: Blocker
Labels: window_function
Fix For: 1.2.0

The plan below shows that the 'star' column is being produced by the Scan and subsequent Project. This indicates projection pushdown is not working as desired when a window function is present. The query produces correct results.
{code}
explain plan for select min(n_nationkey) over (partition by n_regionkey) from cp.`tpch/nation.parquet`;
00-00 Screen
00-01   Project(EXPR$0=[$0])
00-02     Project(w0$o0=[$3])
00-03       Window(window#0=[window(partition {2} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [MIN($1)])])
00-04         SelectionVectorRemover
00-05           Sort(sort0=[$2], dir0=[ASC])
00-06             Project(T1¦¦*=[$0], n_nationkey=[$1], n_regionkey=[$2])
00-07               Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`*`]]])
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
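The fix the thread is after amounts to computing the set of columns the query's expressions actually reference and handing only those to the Scan, instead of the star column. A hedged Python sketch of that column-collection step (illustrative, not Drill/Calcite code; the expression encoding is an assumption):

```python
# Hedged sketch of projection pushdown: collect the columns that the
# query's expressions reference and give only those to the scan,
# instead of `*`. Expressions are encoded as either a column name
# (str) or an (op, *args) tuple -- a hypothetical mini-IR, not Drill's.

def referenced_columns(expressions):
    """Return the sorted union of column names referenced anywhere in
    the given expression trees."""
    cols = set()

    def walk(expr):
        if isinstance(expr, str):
            cols.add(expr)
        else:
            for arg in expr[1:]:
                walk(arg)

    for e in expressions:
        walk(e)
    return sorted(cols)

# RANK() OVER (PARTITION BY ss_store_sk ORDER BY ss_store_sk) only
# needs ss_store_sk, so the scan can read one column instead of `*`.
exprs = [("rank",), ("partition_by", "ss_store_sk"), ("order_by", "ss_store_sk")]
scan_columns = referenced_columns(exprs)
```

This matches the plans above: once pushdown works, the Scan's `columns=[`*`]` becomes `columns=[`ss_store_sk`]`.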
[jira] [Updated] (DRILL-3151) ResultSetMetaData not as specified by JDBC (null/dummy value, not ""/etc.)
[ https://issues.apache.org/jira/browse/DRILL-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Barclay (Drill) updated DRILL-3151:
------------------------------------------
Assignee: Mehant Baid  (was: Parth Chandra)

ResultSetMetaData not as specified by JDBC (null/dummy value, not ""/etc.)
--------------------------------------------------------------------------

Key: DRILL-3151
URL: https://issues.apache.org/jira/browse/DRILL-3151
Project: Apache Drill
Issue Type: Bug
Components: Client - JDBC
Reporter: Daniel Barclay (Drill)
Assignee: Mehant Baid
Fix For: 1.2.0
Attachments: DRILL-3151.3.patch.txt

In Drill's JDBC driver, some ResultSetMetaData methods don't return what JDBC specifies they should return. Some cases:

{{getTableName(int)}}:
- (JDBC says: {{table name or "" if not applicable}})
- Drill returns {{null}} (instead of empty string or table name)
- (Drill indicates not applicable even when from a named table, e.g., for {{SELECT * FROM INFORMATION_SCHEMA.CATALOGS}}.)

{{getSchemaName(int)}}:
- (JDBC says: {{schema name or "" if not applicable}})
- Drill returns {{--UNKNOWN--}} (instead of empty string or schema name)
- (Drill indicates not applicable even when from a named table, e.g., for {{SELECT * FROM INFORMATION_SCHEMA.CATALOGS}}.)

{{getCatalogName(int)}}:
- (JDBC says: {{the name of the catalog for the table in which the given column appears or "" if not applicable}})
- Drill returns {{--UNKNOWN--}} (instead of empty string or catalog name)
- (Drill indicates not applicable even when from a named table, e.g., for {{SELECT * FROM INFORMATION_SCHEMA.CATALOGS}}.)

{{isSearchable(int)}}:
- (JDBC says: {{Indicates whether the designated column can be used in a where clause.}})
- Drill returns {{false}}.

{{getColumnClassName(int)}}:
- (JDBC says: {{the fully-qualified name of the class in the Java programming language that would be used by the method ResultSet.getObject to retrieve the value in the specified column. This is the class name used for custom mapping.}})
- Drill returns {{none}} (instead of the correct class name).

More cases:

{{getColumnDisplaySize}}:
- (JDBC says (quite ambiguously): {{the normal maximum number of characters allowed as the width of the designated column}})
- Drill always returns {{10}}!

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2818) Error message must be updated when query fails with FileNotFoundException
[ https://issues.apache.org/jira/browse/DRILL-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643027#comment-14643027 ]

ASF GitHub Bot commented on DRILL-2818:
---------------------------------------
Github user dsbos commented on the pull request:
https://github.com/apache/drill/pull/93#issuecomment-125272213

(More "final" added in already touched methods.)

Error message must be updated when query fails with FileNotFoundException
-------------------------------------------------------------------------

Key: DRILL-2818
URL: https://issues.apache.org/jira/browse/DRILL-2818
Project: Apache Drill
Issue Type: Bug
Components: SQL Parser
Affects Versions: 0.9.0
Environment: exception branch
Reporter: Abhishek Girish
Assignee: Deneche A. Hakim
Priority: Minor
Labels: error_message_must_fix
Fix For: 1.3.0

When a user specifies a non-existent file/directory in a query, the following error is thrown:
{code:sql}
show files from dfs.tmp.`tpch`;
Query failed: SYSTEM ERROR: Failure handling SQL.
[9184097e-8339-42d3-96ce-1fba51c6bc78 on 192.168.158.107:31010]
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}
This should be updated to
{code:sql}
show files from dfs.tmp.`tpch`;
Query failed: File /tmp/tpch does not exist.
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3564) Error message fix required
Khurram Faraaz created DRILL-3564:
-------------------------------------

Summary: Error message fix required
Key: DRILL-3564
URL: https://issues.apache.org/jira/browse/DRILL-3564
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Reporter: Khurram Faraaz
Assignee: Chris Westin
Priority: Minor

We report Union-All in the error message; we should say Union, since the query involves a UNION and not a UNION ALL.
{code}
0: jdbc:drill:schema=dfs.tmp> select * from union_01 UNION select * from union_02;
Error: UNSUPPORTED_OPERATION ERROR: Union-All over schema-less tables must specify the columns explicitly
See Apache Drill JIRA: DRILL-2414
[Error Id: 760e48d8-ffac-4d5f-ac14-25aabcfd8033 on centos-04.qa.lab:31010] (state=,code=0)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3564) Error message fix required
[ https://issues.apache.org/jira/browse/DRILL-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khurram Faraaz updated DRILL-3564:
----------------------------------
Assignee: Sean Hsuan-Yi Chu  (was: Khurram Faraaz)

Error message fix required
--------------------------

Key: DRILL-3564
URL: https://issues.apache.org/jira/browse/DRILL-3564
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Reporter: Khurram Faraaz
Assignee: Sean Hsuan-Yi Chu
Priority: Minor

We report Union-All in the error message; we should say Union, since the query involves a UNION and not a UNION ALL.
{code}
0: jdbc:drill:schema=dfs.tmp> select * from union_01 UNION select * from union_02;
Error: UNSUPPORTED_OPERATION ERROR: Union-All over schema-less tables must specify the columns explicitly
See Apache Drill JIRA: DRILL-2414
[Error Id: 760e48d8-ffac-4d5f-ac14-25aabcfd8033 on centos-04.qa.lab:31010] (state=,code=0)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-3564) Error message fix required
[ https://issues.apache.org/jira/browse/DRILL-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khurram Faraaz reassigned DRILL-3564:
-------------------------------------
Assignee: Khurram Faraaz  (was: Chris Westin)

Error message fix required
--------------------------

Key: DRILL-3564
URL: https://issues.apache.org/jira/browse/DRILL-3564
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Reporter: Khurram Faraaz
Assignee: Khurram Faraaz
Priority: Minor

We report Union-All in the error message; we should say Union, since the query involves a UNION and not a UNION ALL.
{code}
0: jdbc:drill:schema=dfs.tmp> select * from union_01 UNION select * from union_02;
Error: UNSUPPORTED_OPERATION ERROR: Union-All over schema-less tables must specify the columns explicitly
See Apache Drill JIRA: DRILL-2414
[Error Id: 760e48d8-ffac-4d5f-ac14-25aabcfd8033 on centos-04.qa.lab:31010] (state=,code=0)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3364) Prune scan range if the filter is on the leading field with byte comparable encoding
[ https://issues.apache.org/jira/browse/DRILL-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643119#comment-14643119 ]

Aditya Kishore commented on DRILL-3364:
---------------------------------------
{noformat}
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/conv/IntBEConvertFrom.java b/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/conv/IntBEConvertFrom.java
index 177ae52..785b2e3 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/conv/IntBEConvertFrom.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/conv/IntBEConvertFrom.java
@@ -26,7 +26,8 @@ import org.apache.drill.exec.expr.annotations.Param;
 import org.apache.drill.exec.expr.holders.IntHolder;
 import org.apache.drill.exec.expr.holders.VarBinaryHolder;

-@FunctionTemplate(name = "convert_fromINT_BE", scope = FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
+@FunctionTemplate(names = {"convert_fromINT_BE", "convert_fromUINT4_BE"},
+    scope = FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
 public class IntBEConvertFrom implements DrillSimpleFunc {

   @Param VarBinaryHolder in;
{noformat}

Instead of adding an alias for UINT4_BE, which confusingly returns a signed int, could you please add new functions like UInt8ConvertFrom and UInt8ConvertTo? The rest looks good.

Prune scan range if the filter is on the leading field with byte comparable encoding
------------------------------------------------------------------------------------

Key: DRILL-3364
URL: https://issues.apache.org/jira/browse/DRILL-3364
Project: Apache Drill
Issue Type: Sub-task
Components: Storage - HBase
Reporter: Aditya Kishore
Assignee: Smidth Panchamia
Fix For: 1.2.0
Attachments: 0001-Add-convert_from-and-convert_to-methods-for-TIMESTAM.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, composite.jun26.diff

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
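The "byte comparable encoding" in the ticket title is why this pruning is possible: for a fixed-width big-endian unsigned integer, lexicographic byte order equals numeric order, so a filter on the leading key field maps directly to a byte-range prune of the scan. A hedged Python illustration (conceptually mirroring `convert_from(col, 'UINT4_BE')`, not Drill code; note the reviewer's caveat that the signed variant is not order-preserving for negatives):

```python
import struct

# Hedged illustration of byte-comparable encoding: a 4-byte big-endian
# UNSIGNED int sorts identically as raw bytes and as a number, which is
# what lets a filter on a leading key field become a scan-range prune.
# Conceptual analogue of convert_from(col, 'UINT4_BE'); not Drill code.

def encode_uint4_be(n):
    """Encode an unsigned int as 4 big-endian bytes."""
    return struct.pack(">I", n)

def decode_uint4_be(b):
    """Decode 4 big-endian bytes back to an unsigned int."""
    return struct.unpack(">I", b)[0]

# Numeric order == lexicographic byte order for the unsigned encoding,
# so sorting encoded keys and sorting the numbers agree.
nums = [3, 1_000_000, 42, 7]
round_tripped = [decode_uint4_be(b)
                 for b in sorted(encode_uint4_be(n) for n in nums)]
```

For signed values, plain big-endian two's complement breaks this property (negative numbers sort after positive ones as bytes), which is part of why aliasing the signed INT_BE function for UINT4_BE is confusing.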
[jira] [Commented] (DRILL-2815) Some PathScanner logging, misc. cleanup.
[ https://issues.apache.org/jira/browse/DRILL-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643041#comment-14643041 ]

Daniel Barclay (Drill) commented on DRILL-2815:
-----------------------------------------------
Edited the pull request title, but it didn't get linked in DRILL-2815. (Do you have any idea where the syntax for making a JIRA issue reference recognizable is specified?)

Some PathScanner logging, misc. cleanup.
----------------------------------------

Key: DRILL-2815
URL: https://issues.apache.org/jira/browse/DRILL-2815
Project: Apache Drill
Issue Type: Bug
Reporter: Daniel Barclay (Drill)
Assignee: Jason Altekruse
Priority: Minor
Fix For: 1.2.0
Attachments: DRILL-2815.5.patch.txt, DRILL-2815.6.patch.txt

Add a little more logging to PathScanner; clean up a little.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2818) Error message must be updated when query fails with FileNotFoundException
[ https://issues.apache.org/jira/browse/DRILL-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643043#comment-14643043 ]

ASF GitHub Bot commented on DRILL-2818:
---------------------------------------
Github user jaltekruse commented on the pull request:
https://github.com/apache/drill/pull/93#issuecomment-125274722

+1

Error message must be updated when query fails with FileNotFoundException
-------------------------------------------------------------------------

Key: DRILL-2818
URL: https://issues.apache.org/jira/browse/DRILL-2818
Project: Apache Drill
Issue Type: Bug
Components: SQL Parser
Affects Versions: 0.9.0
Environment: exception branch
Reporter: Abhishek Girish
Assignee: Deneche A. Hakim
Priority: Minor
Labels: error_message_must_fix
Fix For: 1.3.0

When a user specifies a non-existent file/directory in a query, the following error is thrown:
{code:sql}
show files from dfs.tmp.`tpch`;
Query failed: SYSTEM ERROR: Failure handling SQL.
[9184097e-8339-42d3-96ce-1fba51c6bc78 on 192.168.158.107:31010]
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}
This should be updated to
{code:sql}
show files from dfs.tmp.`tpch`;
Query failed: File /tmp/tpch does not exist.
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3364) Prune scan range if the filter is on the leading field with byte comparable encoding
[ https://issues.apache.org/jira/browse/DRILL-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Smidth Panchamia updated DRILL-3364: Attachment: 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch Combined patch attached. Prune scan range if the filter is on the leading field with byte comparable encoding Key: DRILL-3364 URL: https://issues.apache.org/jira/browse/DRILL-3364 Project: Apache Drill Issue Type: Sub-task Components: Storage - HBase Reporter: Aditya Kishore Assignee: Smidth Panchamia Fix For: 1.2.0 Attachments: 0001-Add-convert_from-and-convert_to-methods-for-TIMESTAM.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, composite.jun26.diff -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3502) JDBC driver can cause conflicts
[ https://issues.apache.org/jira/browse/DRILL-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Parth Chandra updated DRILL-3502: - Fix Version/s: 1.2.0 JDBC driver can cause conflicts --- Key: DRILL-3502 URL: https://issues.apache.org/jira/browse/DRILL-3502 Project: Apache Drill Issue Type: Bug Components: Client - JDBC Affects Versions: 1.1.0 Reporter: Stefán Baxter Assignee: Daniel Barclay (Drill) Fix For: 1.2.0 Using the JDBC driver in Java projects is problematic as it contains older versions of some popular libraries and since they are not isolated/shaded they may conflict with newer versions being used in these projects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2818) Error message must be updated when query fails with FileNotFoundException
[ https://issues.apache.org/jira/browse/DRILL-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643026#comment-14643026 ]

ASF GitHub Bot commented on DRILL-2818:
---------------------------------------
Github user dsbos commented on a diff in the pull request:
https://github.com/apache/drill/pull/93#discussion_r35559349

--- Diff: common/src/main/java/org/apache/drill/common/logical/FormatPluginConfigBase.java ---
@@ -27,11 +27,20 @@
   static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(FormatPluginConfigBase.class);

-  public synchronized static Class<?>[] getSubTypes(DrillConfig config){
-    List<String> packages = config.getStringList(CommonConstants.STORAGE_PLUGIN_CONFIG_SCAN_PACKAGES);
-    Class<?>[] sec = PathScanner.scanForImplementationsArr(FormatPluginConfig.class, packages);
-    logger.debug("Adding Format Plugin Configs including {}", (Object) sec);
-    return sec;
+  public synchronized static Class<?>[] getSubTypes(DrillConfig config) {
+    List<String> packages =
+        config.getStringList(CommonConstants.STORAGE_PLUGIN_CONFIG_SCAN_PACKAGES);
+    Class<?>[] pluginClasses =
+        PathScanner.scanForImplementationsArr(FormatPluginConfig.class, packages);
+    if (logger.isDebugEnabled()) {
+      final StringBuilder sb = new StringBuilder();
--- End diff --

Reduce line count by factoring out the part that Joiner could be used for.

Error message must be updated when query fails with FileNotFoundException
-------------------------------------------------------------------------

Key: DRILL-2818
URL: https://issues.apache.org/jira/browse/DRILL-2818
Project: Apache Drill
Issue Type: Bug
Components: SQL Parser
Affects Versions: 0.9.0
Environment: exception branch
Reporter: Abhishek Girish
Assignee: Deneche A. Hakim
Priority: Minor
Labels: error_message_must_fix
Fix For: 1.3.0

When a user specifies a non-existent file/directory in a query, the following error is thrown:
{code:sql}
show files from dfs.tmp.`tpch`;
Query failed: SYSTEM ERROR: Failure handling SQL.
[9184097e-8339-42d3-96ce-1fba51c6bc78 on 192.168.158.107:31010]
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}
This should be updated to
{code:sql}
show files from dfs.tmp.`tpch`;
Query failed: File /tmp/tpch does not exist.
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
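The change DRILL-2818 asks for is a standard error-translation step: catch the low-level file-not-found failure and surface a plain user-facing message instead of a generic system error. A hedged Python sketch of that pattern (hypothetical helpers, not Drill's actual error-handling code):

```python
# Hedged sketch of the error translation DRILL-2818 requests: convert
# a low-level FileNotFoundError into the user-facing text the ticket
# proposes, rather than a generic "SYSTEM ERROR: Failure handling SQL."
# Both helpers are hypothetical stand-ins, not Drill code.

def show_files(path, exists):
    """Pretend to run `show files from <path>`; `exists` stands in
    for a real filesystem check."""
    if not exists:
        raise FileNotFoundError(path)
    return []

def run_query(path, exists):
    try:
        return show_files(path, exists)
    except FileNotFoundError as e:
        # The user-facing message the ticket proposes:
        return "Query failed: File %s does not exist." % e.args[0]

msg = run_query("/tmp/tpch", exists=False)
```

The key design point is translating at the boundary: internal exceptions keep their detail for logs, while the user sees the actionable part (which file was missing).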
[jira] [Updated] (DRILL-3565) Reading an avro file throws UnsupportedOperationException: Unimplemented type: UNION
[ https://issues.apache.org/jira/browse/DRILL-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-3565: --- Attachment: drillbit.log.txt divolte.avro Reading an avro file throws UnsupportedOperationException: Unimplemented type: UNION Key: DRILL-3565 URL: https://issues.apache.org/jira/browse/DRILL-3565 Project: Apache Drill Issue Type: Bug Components: Storage - Other Affects Versions: 1.1.0, 1.2.0 Reporter: Abhishek Girish Assignee: Jacques Nadeau Attachments: divolte.avro, drillbit.log.txt Running a simple select * from an avro file fails. {code:sql} select count(*) from `divolte.avro`; Error: SYSTEM ERROR: UnsupportedOperationException: Unimplemented type: UNION Fragment 0:0 [Error Id: c7c1ed87-cd85-4146-844d-4addc227128b on abhi1:31010] (state=,code=0) {code} Plan: {code} 00-00Screen 00-01 Project(*=[$0]) 00-02Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/divolte.avro, numFiles=1, columns=[`*`], files=[maprfs:///tmp/divolte.avro]]]) {code} Log data file attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3565) Add support for Avro UNION type
[ https://issues.apache.org/jira/browse/DRILL-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-3565: --- Issue Type: Improvement (was: Bug) Add support for Avro UNION type --- Key: DRILL-3565 URL: https://issues.apache.org/jira/browse/DRILL-3565 Project: Apache Drill Issue Type: Improvement Components: Storage - Other Affects Versions: 1.1.0, 1.2.0 Reporter: Abhishek Girish Assignee: Jacques Nadeau Attachments: divolte.avro, drillbit.log.txt Running a simple select * from an avro file fails. {code:sql} select count(*) from `divolte.avro`; Error: SYSTEM ERROR: UnsupportedOperationException: Unimplemented type: UNION Fragment 0:0 [Error Id: c7c1ed87-cd85-4146-844d-4addc227128b on abhi1:31010] (state=,code=0) {code} Plan: {code} 00-00Screen 00-01 Project(*=[$0]) 00-02Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/divolte.avro, numFiles=1, columns=[`*`], files=[maprfs:///tmp/divolte.avro]]]) {code} Log data file attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3565) Add support for Avro UNION type
[ https://issues.apache.org/jira/browse/DRILL-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish updated DRILL-3565: --- Summary: Add support for Avro UNION type (was: Reading an avro file throws UnsupportedOperationException: Unimplemented type: UNION) Add support for Avro UNION type --- Key: DRILL-3565 URL: https://issues.apache.org/jira/browse/DRILL-3565 Project: Apache Drill Issue Type: Bug Components: Storage - Other Affects Versions: 1.1.0, 1.2.0 Reporter: Abhishek Girish Assignee: Jacques Nadeau Attachments: divolte.avro, drillbit.log.txt Running a simple select * from an avro file fails. {code:sql} select count(*) from `divolte.avro`; Error: SYSTEM ERROR: UnsupportedOperationException: Unimplemented type: UNION Fragment 0:0 [Error Id: c7c1ed87-cd85-4146-844d-4addc227128b on abhi1:31010] (state=,code=0) {code} Plan: {code} 00-00Screen 00-01 Project(*=[$0]) 00-02Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/divolte.avro, numFiles=1, columns=[`*`], files=[maprfs:///tmp/divolte.avro]]]) {code} Log data file attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3554) Union over TIME and TIMESTAMP values throws SchemaChangeException
[ https://issues.apache.org/jira/browse/DRILL-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643212#comment-14643212 ] Khurram Faraaz commented on DRILL-3554:
----
Adding this observation here; it may be related to this issue. CAST from TIME to TIMESTAMP does not work in Drill. Column c9 is of TIME type.
{code}
0: jdbc:drill:schema=dfs.tmp> select cast(c9 as TIMESTAMP) from union_01;
Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: 08:16:08.580 is malformed at :16:08.580
Fragment 0:0
[Error Id: 8d6a7a34-d857-4176-83a8-42328793bf07 on centos-02.qa.lab:31010] (state=,code=0)
{code}
Stack trace from drillbit.log:
{code}
2015-07-27 18:45:20,350 [2a4983be-8d2f-eb15-99b1-eb92c127678e:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalArgumentException: Invalid format: 08:16:08.580 is malformed at :16:08.580
Fragment 0:0
[Error Id: 8d6a7a34-d857-4176-83a8-42328793bf07 on centos-02.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalArgumentException: Invalid format: 08:16:08.580 is malformed at :16:08.580
Fragment 0:0
[Error Id: 8d6a7a34-d857-4176-83a8-42328793bf07 on centos-02.qa.lab:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:523) ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:178) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
Caused by: java.lang.IllegalArgumentException: Invalid format: 08:16:08.580 is malformed at :16:08.580
    at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:873) ~[joda-time-2.3.jar:2.3]
    at org.joda.time.DateTime.parse(DateTime.java:144) ~[joda-time-2.3.jar:2.3]
    at org.apache.drill.exec.test.generated.ProjectorGen1632.doEval(ProjectorTemplate.java:120) ~[na:na]
    at org.apache.drill.exec.test.generated.ProjectorGen1632.projectRecords(ProjectorTemplate.java:62) ~[na:na]
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:172) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:79) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:258) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:252) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
    at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_45]
    at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_45]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:252) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
    ... 4 common frames omitted
{code}

Union over TIME and TIMESTAMP values throws SchemaChangeException
-----------------------------------------------------------------
Key: DRILL-3554
URL: https://issues.apache.org/jira/browse/DRILL-3554
Project:
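The `Caused by` frames above show the failure mode: Drill's generated projector hands the bare TIME string to Joda's `DateTime.parse`, which expects a date portion. A minimal sketch of the same failure and the obvious fix (supplying an explicit date), written against `java.time` rather than the Joda classes in the trace so it runs on a stock JDK:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeParseException;

public class TimeToTimestamp {

    // Promote a TIME string to a TIMESTAMP by supplying an explicit date,
    // instead of handing the bare time to a full date-time parser.
    static LocalDateTime promote(String time, LocalDate date) {
        return LocalTime.parse(time).atDate(date);
    }

    public static void main(String[] args) {
        String value = "08:16:08.580"; // the value from the error message

        try {
            // No date portion, so a date-time parser rejects it,
            // analogous to the IllegalArgumentException in the trace.
            LocalDateTime.parse(value);
        } catch (DateTimeParseException e) {
            System.out.println("bare time rejected: " + value);
        }

        // With a reference date the promotion is well-defined.
        System.out.println(promote(value, LocalDate.of(1970, 1, 1)));
    }
}
```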
[jira] [Created] (DRILL-3565) Reading an avro file throws UnsupportedOperationException: Unimplemented type: UNION
Abhishek Girish created DRILL-3565:
-----------------------------------
Summary: Reading an avro file throws UnsupportedOperationException: Unimplemented type: UNION
Key: DRILL-3565
URL: https://issues.apache.org/jira/browse/DRILL-3565
Project: Apache Drill
Issue Type: Bug
Components: Storage - Other
Affects Versions: 1.1.0, 1.2.0
Reporter: Abhishek Girish
Assignee: Jacques Nadeau

Running a simple query on an avro file fails.
{code:sql}
select count(*) from `divolte.avro`;
Error: SYSTEM ERROR: UnsupportedOperationException: Unimplemented type: UNION
Fragment 0:0
[Error Id: c7c1ed87-cd85-4146-844d-4addc227128b on abhi1:31010] (state=,code=0)
{code}
Plan:
{code}
00-00    Screen
00-01      Project(*=[$0])
00-02        Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/divolte.avro, numFiles=1, columns=[`*`], files=[maprfs:///tmp/divolte.avro]]])
{code}
Log and data file attached.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3565) Add support for Avro UNION type
[ https://issues.apache.org/jira/browse/DRILL-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643245#comment-14643245 ] Abhishek Girish commented on DRILL-3565:
----
Turns out I confused this with the UNION operator. I have updated the title accordingly.

Add support for Avro UNION type
-------------------------------
Key: DRILL-3565
URL: https://issues.apache.org/jira/browse/DRILL-3565
Project: Apache Drill
Issue Type: Improvement
Components: Storage - Other
Affects Versions: 1.1.0, 1.2.0
Reporter: Abhishek Girish
Assignee: Jacques Nadeau
Attachments: divolte.avro, drillbit.log.txt

Running a simple query on an avro file fails.
{code:sql}
select count(*) from `divolte.avro`;
Error: SYSTEM ERROR: UnsupportedOperationException: Unimplemented type: UNION
Fragment 0:0
[Error Id: c7c1ed87-cd85-4146-844d-4addc227128b on abhi1:31010] (state=,code=0)
{code}
Plan:
{code}
00-00    Screen
00-01      Project(*=[$0])
00-02        Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/divolte.avro, numFiles=1, columns=[`*`], files=[maprfs:///tmp/divolte.avro]]])
{code}
Log and data file attached.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
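For context on what triggers this path: in an Avro schema, a UNION type is declared as a JSON array of branch types, and the common nullable-field idiom `["null", "string"]` is itself a union. The actual schema of divolte.avro is not shown in the report; the fragment below is only an illustrative example (record and field names are hypothetical) of a schema that would contain the UNION type the reader does not implement:

```
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "remoteHost", "type": ["null", "string"], "default": null}
  ]
}
```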
[jira] [Commented] (DRILL-3537) Empty Json file can potentially result into wrong results
[ https://issues.apache.org/jira/browse/DRILL-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643423#comment-14643423 ] Parth Chandra commented on DRILL-3537:
----
LGTM. +1

Empty Json file can potentially result into wrong results
---------------------------------------------------------
Key: DRILL-3537
URL: https://issues.apache.org/jira/browse/DRILL-3537
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators, Storage - JSON
Reporter: Sean Hsuan-Yi Chu
Assignee: Parth Chandra
Priority: Critical
Fix For: 1.2.0

In the directory, we have two files. One has some data and the other one is empty. A query such as:
{code}
select * from dfs.`directory`;
{code}
will produce different results depending on the order in which the files are read (the default order is the alphabetical order of the filenames). To give a more concrete example, the non-empty json file has the data:
{code}
{ a:1 }
{code}
By naming the files, you can control the order. If the empty file is read first, the result is:
{code}
+-------+----+
|   *   | a  |
+-------+----+
| null  | 1  |
+-------+----+
{code}
If the opposite order takes place, the result is:
{code}
+----+
| a  |
+----+
| 1  |
| 2  |
+----+
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3566) Calling Connection.prepareStatement throws a ClassCastException
Rahul Challapalli created DRILL-3566:
-------------------------------------
Summary: Calling Connection.prepareStatement throws a ClassCastException
Key: DRILL-3566
URL: https://issues.apache.org/jira/browse/DRILL-3566
Project: Apache Drill
Issue Type: Bug
Components: Client - JDBC
Reporter: Rahul Challapalli
Assignee: Daniel Barclay (Drill)
Priority: Blocker

Git Commit #: 65935db8d01b95a7a3107835d7cd5e61220e2f84

I am hitting the below exception when using Connection.prepareStatement without binding any parameters:
{code}
PreparedStatement stmt = con.prepareStatement(DRILL_SAMPLE_QUERY);

Exception in thread "main" java.lang.ClassCastException: org.apache.drill.jdbc.impl.DrillJdbc41Factory$DrillJdbc41PreparedStatement cannot be cast to org.apache.drill.jdbc.impl.DrillStatementImpl
    at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newResultSet(DrillJdbc41Factory.java:106)
    at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newResultSet(DrillJdbc41Factory.java:1)
    at net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:397)
    at net.hydromatic.avatica.AvaticaPreparedStatement.executeQuery(AvaticaPreparedStatement.java:77)
    at com.incorta.trails.DrillTest.query(DrillTest.java:33)
    at com.incorta.trails.DrillTest.main(DrillTest.java:12)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3364) Prune scan range if the filter is on the leading field with byte comparable encoding
[ https://issues.apache.org/jira/browse/DRILL-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Smidth Panchamia updated DRILL-3364:
----
Attachment: 0001-PATCH-DRILL-3364-Prune-scan-range-if-the-filter-is-o.patch

Attached a new combined patch after updating as per Aditya's comments.

Prune scan range if the filter is on the leading field with byte comparable encoding
------------------------------------------------------------------------------------
Key: DRILL-3364
URL: https://issues.apache.org/jira/browse/DRILL-3364
Project: Apache Drill
Issue Type: Sub-task
Components: Storage - HBase
Reporter: Aditya Kishore
Assignee: Smidth Panchamia
Fix For: 1.2.0
Attachments: 0001-Add-convert_from-and-convert_to-methods-for-TIMESTAM.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, 0001-PATCH-DRILL-3364-Prune-scan-range-if-the-filter-is-o.patch, composite.jun26.diff

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3567) Wrong result in a query with multiple window functions and different over clauses
[ https://issues.apache.org/jira/browse/DRILL-3567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victoria Markman updated DRILL-3567:
----
Attachment: t1_parquet

Wrong result in a query with multiple window functions and different over clauses
---------------------------------------------------------------------------------
Key: DRILL-3567
URL: https://issues.apache.org/jira/browse/DRILL-3567
Project: Apache Drill
Issue Type: Bug
Components: Query Planning Optimization
Affects Versions: 1.1.0
Environment: private-branch-with-multiple-partitions-enabled
Reporter: Victoria Markman
Assignee: Jinfeng Ni
Priority: Critical
Labels: window_function
Attachments: t1_parquet

{code}
0: jdbc:drill:drillbit=localhost> select * from t1;
+-------+-------+-------------+
|  a1   |  b1   |     c1      |
+-------+-------+-------------+
| 1     | a     | 2015-01-01  |
| 2     | b     | 2015-01-02  |
| 3     | c     | 2015-01-03  |
| 4     | null  | 2015-01-04  |
| 5     | e     | 2015-01-05  |
| 6     | f     | 2015-01-06  |
| 7     | g     | 2015-01-07  |
| null  | h     | 2015-01-08  |
| 9     | i     | null        |
| 10    | j     | 2015-01-10  |
+-------+-------+-------------+
10 rows selected (0.078 seconds)
{code}
Wrong result, columns are projected in the wrong order:
{code}
0: jdbc:drill:drillbit=localhost> select
. . . . . . . . . . . . . . . .>     count(*) over(partition by b1 order by c1) as count1,
. . . . . . . . . . . . . . . .>     count(*) over(partition by a1 order by c1) as count2,
. . . . . . . . . . . . . . . .>     sum(a1) over(partition by b1 order by c1) as sum1
. . . . . . . . . . . . . . . .> from
. . . . . . . . . . . . . . . .>     t1;
+---------+---------+-------+
| count1  | count2  | sum1  |
+---------+---------+-------+
| 1       | 1       | 1     |
| 1       | 2       | 1     |
| 1       | 3       | 1     |
| 1       | 4       | 1     |
| 1       | 5       | 1     |
| 1       | 6       | 1     |
| 1       | 7       | 1     |
| 1       | 9       | 1     |
| 1       | 10      | 1     |
| 1       | null    | 1     |
+---------+---------+-------+
10 rows selected (0.113 seconds)
{code}
Explain plan:
{code}
0: jdbc:drill:drillbit=localhost> explain plan for select
. . . . . . . . . . . . . . . .>     count(*) over(partition by b1 order by c1) as count1,
. . . . . . . . . . . . . . . .>     count(*) over(partition by a1 order by c1) as count2,
. . . . . . . . . . . . . . . .>     sum(a1) over(partition by b1 order by c1) as sum1
. . . . . . . . . . . . . . . .> from
. . . . . . . . . . . . . . . .>     t1;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      ProjectAllowDup(count1=[$0], count2=[$1], sum1=[$2])
00-02        Project(w0$o0=[$4], w0$o1=[$5], w1$o0=[$6])
00-03          Window(window#0=[window(partition {3} order by [2] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT()])])
00-04            SelectionVectorRemover
00-05              Sort(sort0=[$3], sort1=[$2], dir0=[ASC], dir1=[ASC])
00-06                Window(window#0=[window(partition {1} order by [2] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT(), SUM($3)])])
00-07                  SelectionVectorRemover
00-08                    Sort(sort0=[$1], sort1=[$2], dir0=[ASC], dir1=[ASC])
00-09                      Project(T61¦¦*=[$0], b1=[$1], c1=[$2], a1=[$3])
00-10                        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/vmarkman/drill/testdata/subqueries/t1]], selectionRoot=file:/Users/vmarkman/drill/testdata/subqueries/t1, numFiles=1, columns=[`*`]]])
{code}
If you remove the frame that is not the same as the other two, the query works correctly:
{code}
0: jdbc:drill:drillbit=localhost> select
. . . . . . . . . . . . . . . .>     count(*) over(partition by b1 order by c1) as count1,
. . . . . . . . . . . . . . . .>     sum(a1) over(partition by b1 order by c1) as sum1
. . . . . . . . . . . . . . . .> from
. . . . . . . . . . . . . . . .>     t1;
+---------+-------+
| count1  | sum1  |
+---------+-------+
| 1       | 1     |
| 1       | 2     |
| 1       | 3     |
| 1       | 5     |
| 1       | 6     |
| 1       | 7     |
| 1       | null  |
| 1       | 9     |
| 1       | 10    |
| 1       | 4     |
+---------+-------+
10 rows selected (0.099 seconds)
{code}
and in a different order (just for fun):
{code}
0: jdbc:drill:drillbit=localhost> select
. . . . . . . . . . . . . . . .>     sum(a1) over(partition by b1 order by c1) as sum1,
. . . . . . . . . . . . . . . .>     count(*) over(partition by b1 order by c1) as count1
. . . . . . . . . . . . . . . .> from
. . . . . . . . . . . . . . . .>     t1;
+-------+---------+
| sum1  | count1  |
+-------+---------+
| 1     | 1       |
| 2     | 1       |
| 3     | 1       |
| 5     | 1       |
| 6     | 1       |
| 7     | 1       |
| null  | 1       |
| 9     | 1       |
| 10    | 1       |
| 4     | 1       |
+-------+---------+
10 rows selected (0.096 seconds)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3567) Wrong result in a query with multiple window functions and different over clauses
Victoria Markman created DRILL-3567:
------------------------------------
Summary: Wrong result in a query with multiple window functions and different over clauses
Key: DRILL-3567
URL: https://issues.apache.org/jira/browse/DRILL-3567
Project: Apache Drill
Issue Type: Bug
Components: Query Planning Optimization
Affects Versions: 1.1.0
Environment: private-branch-with-multiple-partitions-enabled
Reporter: Victoria Markman
Assignee: Jinfeng Ni
Priority: Critical

{code}
0: jdbc:drill:drillbit=localhost> select * from t1;
+-------+-------+-------------+
|  a1   |  b1   |     c1      |
+-------+-------+-------------+
| 1     | a     | 2015-01-01  |
| 2     | b     | 2015-01-02  |
| 3     | c     | 2015-01-03  |
| 4     | null  | 2015-01-04  |
| 5     | e     | 2015-01-05  |
| 6     | f     | 2015-01-06  |
| 7     | g     | 2015-01-07  |
| null  | h     | 2015-01-08  |
| 9     | i     | null        |
| 10    | j     | 2015-01-10  |
+-------+-------+-------------+
10 rows selected (0.078 seconds)
{code}
Wrong result, columns are projected in the wrong order:
{code}
0: jdbc:drill:drillbit=localhost> select
. . . . . . . . . . . . . . . .>     count(*) over(partition by b1 order by c1) as count1,
. . . . . . . . . . . . . . . .>     count(*) over(partition by a1 order by c1) as count2,
. . . . . . . . . . . . . . . .>     sum(a1) over(partition by b1 order by c1) as sum1
. . . . . . . . . . . . . . . .> from
. . . . . . . . . . . . . . . .>     t1;
+---------+---------+-------+
| count1  | count2  | sum1  |
+---------+---------+-------+
| 1       | 1       | 1     |
| 1       | 2       | 1     |
| 1       | 3       | 1     |
| 1       | 4       | 1     |
| 1       | 5       | 1     |
| 1       | 6       | 1     |
| 1       | 7       | 1     |
| 1       | 9       | 1     |
| 1       | 10      | 1     |
| 1       | null    | 1     |
+---------+---------+-------+
10 rows selected (0.113 seconds)
{code}
Explain plan:
{code}
0: jdbc:drill:drillbit=localhost> explain plan for select
. . . . . . . . . . . . . . . .>     count(*) over(partition by b1 order by c1) as count1,
. . . . . . . . . . . . . . . .>     count(*) over(partition by a1 order by c1) as count2,
. . . . . . . . . . . . . . . .>     sum(a1) over(partition by b1 order by c1) as sum1
. . . . . . . . . . . . . . . .> from
. . . . . . . . . . . . . . . .>     t1;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      ProjectAllowDup(count1=[$0], count2=[$1], sum1=[$2])
00-02        Project(w0$o0=[$4], w0$o1=[$5], w1$o0=[$6])
00-03          Window(window#0=[window(partition {3} order by [2] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT()])])
00-04            SelectionVectorRemover
00-05              Sort(sort0=[$3], sort1=[$2], dir0=[ASC], dir1=[ASC])
00-06                Window(window#0=[window(partition {1} order by [2] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT(), SUM($3)])])
00-07                  SelectionVectorRemover
00-08                    Sort(sort0=[$1], sort1=[$2], dir0=[ASC], dir1=[ASC])
00-09                      Project(T61¦¦*=[$0], b1=[$1], c1=[$2], a1=[$3])
00-10                        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/vmarkman/drill/testdata/subqueries/t1]], selectionRoot=file:/Users/vmarkman/drill/testdata/subqueries/t1, numFiles=1, columns=[`*`]]])
{code}
If you remove the frame that is not the same as the other two, the query works correctly:
{code}
0: jdbc:drill:drillbit=localhost> select
. . . . . . . . . . . . . . . .>     count(*) over(partition by b1 order by c1) as count1,
. . . . . . . . . . . . . . . .>     sum(a1) over(partition by b1 order by c1) as sum1
. . . . . . . . . . . . . . . .> from
. . . . . . . . . . . . . . . .>     t1;
+---------+-------+
| count1  | sum1  |
+---------+-------+
| 1       | 1     |
| 1       | 2     |
| 1       | 3     |
| 1       | 5     |
| 1       | 6     |
| 1       | 7     |
| 1       | null  |
| 1       | 9     |
| 1       | 10    |
| 1       | 4     |
+---------+-------+
10 rows selected (0.099 seconds)
{code}
and in a different order (just for fun):
{code}
0: jdbc:drill:drillbit=localhost> select
. . . . . . . . . . . . . . . .>     sum(a1) over(partition by b1 order by c1) as sum1,
. . . . . . . . . . . . . . . .>     count(*) over(partition by b1 order by c1) as count1
. . . . . . . . . . . . . . . .> from
. . . . . . . . . . . . . . . .>     t1;
+-------+---------+
| sum1  | count1  |
+-------+---------+
| 1     | 1       |
| 2     | 1       |
| 3     | 1       |
| 5     | 1       |
| 6     | 1       |
| 7     | 1       |
| null  | 1       |
| 9     | 1       |
| 10    | 1       |
| 4     | 1       |
+-------+---------+
10 rows selected (0.096 seconds)
{code}

-- This
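As a cross-check on what the single-window query should return: with the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), count(*) over (partition by b1 order by c1) is a running count within each b1 partition. A plain-Java sketch of that semantics, assuming rows arrive already sorted by c1 as the Sort operator in the plan guarantees (and ignoring peer-row subtleties, since c1 is unique in the sample):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowSketch {

    // Running count(*) over (partition by b1 order by c1) for rows already
    // sorted by c1. String.valueOf maps a null partition key to "null".
    static List<Integer> runningCounts(List<String> b1SortedByC1) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> out = new ArrayList<>();
        for (String b1 : b1SortedByC1) {
            out.add(seen.merge(String.valueOf(b1), 1, Integer::sum));
        }
        return out;
    }

    public static void main(String[] args) {
        // b1 values from the t1 sample, in c1 order. Every value is distinct,
        // so each partition holds one row and every running count is 1 --
        // matching the output of the query with a single window definition.
        List<String> b1 = Arrays.asList("a", "b", "c", null, "e", "f", "g", "h", "i", "j");
        System.out.println(runningCounts(b1)); // [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    }
}
```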
[jira] [Commented] (DRILL-3412) Projections are not getting push down below Window operator
[ https://issues.apache.org/jira/browse/DRILL-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643600#comment-14643600 ] Deneche A. Hakim commented on DRILL-3412:
----
I tried multiple commits; the commit right before DRILL-3304 didn't have this issue, but DRILL-3304 did. I may be mistaken, as I tried lots of commits, building them each time.

Projections are not getting push down below Window operator
-----------------------------------------------------------
Key: DRILL-3412
URL: https://issues.apache.org/jira/browse/DRILL-3412
Project: Apache Drill
Issue Type: Bug
Components: Query Planning Optimization
Reporter: Aman Sinha
Assignee: Jinfeng Ni
Priority: Blocker
Labels: window_function
Fix For: 1.2.0

The plan below shows that the 'star' column is being produced by the Scan and the subsequent Project. This indicates projection pushdown is not working as desired when a window function is present. The query produces correct results.
{code}
explain plan for select min(n_nationkey) over (partition by n_regionkey) from cp.`tpch/nation.parquet`;
00-00    Screen
00-01      Project(EXPR$0=[$0])
00-02        Project(w0$o0=[$3])
00-03          Window(window#0=[window(partition {2} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [MIN($1)])])
00-04            SelectionVectorRemover
00-05              Sort(sort0=[$2], dir0=[ASC])
00-06                Project(T1¦¦*=[$0], n_nationkey=[$1], n_regionkey=[$2])
00-07                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`*`]]])
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3555) Changing defaults for planner.memory.max_query_memory_per_node causes queries with window function to fail
[ https://issues.apache.org/jira/browse/DRILL-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643583#comment-14643583 ] Abhishek Girish commented on DRILL-3555:
----
I could say yes. Some queries which ran successfully with the default failed after increasing the max query memory per node. A few others, which had failed with OOM before, instead failed with the error described in this issue.

Changing defaults for planner.memory.max_query_memory_per_node causes queries with window function to fail
----------------------------------------------------------------------------------------------------------
Key: DRILL-3555
URL: https://issues.apache.org/jira/browse/DRILL-3555
Project: Apache Drill
Issue Type: Bug
Components: Query Planning Optimization
Affects Versions: 1.1.0, 1.2.0
Environment: 4 Nodes. Direct Memory = 48 GB each
Reporter: Abhishek Girish
Assignee: Steven Phillips
Priority: Critical

Changing the default value for planner.memory.max_query_memory_per_node from 2 GB to anything higher causes queries with window functions to fail.

Changed system options:
{code:sql}
select * from sys.options where status like '%CHANGE%';
+-------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
| name                                      | kind     | type    | status   | num_val     | string_val  | bool_val  | float_val  |
+-------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
| planner.enable_decimal_data_type          | BOOLEAN  | SYSTEM  | CHANGED  | null        | null        | true      | null       |
| planner.memory.max_query_memory_per_node  | LONG     | SYSTEM  | CHANGED  | 8589934592  | null        | null      | null       |
+-------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
2 rows selected (0.249 seconds)
{code}
Query:
{code:sql}
SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) FROM store_sales ss LIMIT 20;
java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: DrillRuntimeException: Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. maxBytes 1073741824
Fragment 1:0
[Error Id: 9c2ec9cf-21c6-4d5e-b0d6-7cd59e32c49d on abhi1:31010]
    at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
    at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
    at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
    at sqlline.SqlLine.print(SqlLine.java:1583)
    at sqlline.Commands.execute(Commands.java:852)
    at sqlline.Commands.sql(Commands.java:751)
    at sqlline.SqlLine.dispatch(SqlLine.java:738)
    at sqlline.SqlLine.begin(SqlLine.java:612)
    at sqlline.SqlLine.start(SqlLine.java:366)
    at sqlline.SqlLine.main(SqlLine.java:259)
{code}
Log:
{code}
2015-07-23 18:16:52,292 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:2:2] INFO o.a.d.e.w.fragment.FragmentExecutor - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:2:2: State change requested RUNNING --> FINISHED
2015-07-23 18:16:52,292 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:2:2] INFO o.a.d.e.w.f.FragmentStatusReporter - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:2:2: State to report: FINISHED
2015-07-23 18:17:05,485 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] ERROR o.a.d.e.p.i.s.SortRecordBatchBuilder - Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. maxBytes 1073741824
2015-07-23 18:17:05,486 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:1:0: State change requested RUNNING --> FAILED
...
2015-07-23 18:17:05,990 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:1:0: State change requested FAILED --> FINISHED
2015-07-23 18:17:05,999 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: DrillRuntimeException: Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. maxBytes 1073741824
Fragment 1:0
[Error Id: 9c2ec9cf-21c6-4d5e-b0d6-7cd59e32c49d on abhi1:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: DrillRuntimeException: Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. maxBytes 1073741824
Fragment 1:0
[Error Id: 9c2ec9cf-21c6-4d5e-b0d6-7cd59e32c49d on abhi1:31010]
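The numbers in the log are consistent with a hard 1 GiB cap in SortRecordBatchBuilder: the running total plus the incoming batch just crosses 2^30 bytes. A quick check of the arithmetic:

```java
public class BatchLimitCheck {
    public static void main(String[] args) {
        long maxBytes = 1L << 30;            // 1073741824, the cap reported in the log
        long runningBytes = 1_073_638_500L;  // bytes already buffered (from the log)
        long batchBytes = 127_875L;          // incoming batch size (from the log)

        long newTotal = runningBytes + batchBytes; // 1073766375
        // The batch is rejected because the new total exceeds the cap by 24551 bytes.
        System.out.println(newTotal > maxBytes);   // true
    }
}
```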
[jira] [Commented] (DRILL-3412) Projections are not getting push down below Window operator
[ https://issues.apache.org/jira/browse/DRILL-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643589#comment-14643589 ] Jinfeng Ni commented on DRILL-3412:
----
Why does DRILL-3304 have anything to do with project pushdown? It seems to me that DRILL-3412 and DRILL-3304 are dealing with two different issues.

Projections are not getting push down below Window operator
-----------------------------------------------------------
Key: DRILL-3412
URL: https://issues.apache.org/jira/browse/DRILL-3412
Project: Apache Drill
Issue Type: Bug
Components: Query Planning Optimization
Reporter: Aman Sinha
Assignee: Jinfeng Ni
Priority: Blocker
Labels: window_function
Fix For: 1.2.0

The plan below shows that the 'star' column is being produced by the Scan and the subsequent Project. This indicates projection pushdown is not working as desired when a window function is present. The query produces correct results.
{code}
explain plan for select min(n_nationkey) over (partition by n_regionkey) from cp.`tpch/nation.parquet`;
00-00    Screen
00-01      Project(EXPR$0=[$0])
00-02        Project(w0$o0=[$3])
00-03          Window(window#0=[window(partition {2} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [MIN($1)])])
00-04            SelectionVectorRemover
00-05              Sort(sort0=[$2], dir0=[ASC])
00-06                Project(T1¦¦*=[$0], n_nationkey=[$1], n_regionkey=[$2])
00-07                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`*`]]])
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3555) Changing defaults for planner.memory.max_query_memory_per_node causes queries with window function to fail
[ https://issues.apache.org/jira/browse/DRILL-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643581#comment-14643581 ] Abhishek Girish commented on DRILL-3555:
----
As discussed offline, I'll skip running this query. In summary, this issue might have regressed. But we never had thorough tests for it, so it's also likely to be an existing issue that was only observed recently.
[jira] [Commented] (DRILL-3555) Changing defaults for planner.memory.max_query_memory_per_node causes queries with window function to fail
[ https://issues.apache.org/jira/browse/DRILL-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643582#comment-14643582 ] Abhishek Girish commented on DRILL-3555:
----
Steven, could you please take a look at this?
[jira] [Updated] (DRILL-3555) Changing defaults for planner.memory.max_query_memory_per_node causes queries with window function to fail
[ https://issues.apache.org/jira/browse/DRILL-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Girish updated DRILL-3555:
-----------------------------------
    Assignee: Steven Phillips  (was: Jinfeng Ni)

Changing defaults for planner.memory.max_query_memory_per_node causes queries with window function to fail
----------------------------------------------------------------------------------------------------------
                 Key: DRILL-3555
                 URL: https://issues.apache.org/jira/browse/DRILL-3555
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 1.1.0, 1.2.0
         Environment: 4 Nodes. Direct Memory = 48 GB each
            Reporter: Abhishek Girish
            Assignee: Steven Phillips
            Priority: Critical

Changing the default value for planner.memory.max_query_memory_per_node from 2 GB to anything higher causes queries with window functions to fail.

Changed system options:
{code:sql}
select * from sys.options where status like '%CHANGE%';
+-------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
|                   name                    |   kind   |  type   |  status  |   num_val   | string_val  | bool_val  | float_val  |
+-------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
| planner.enable_decimal_data_type          | BOOLEAN  | SYSTEM  | CHANGED  | null        | null        | true      | null       |
| planner.memory.max_query_memory_per_node  | LONG     | SYSTEM  | CHANGED  | 8589934592  | null        | null      | null       |
+-------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
2 rows selected (0.249 seconds)
{code}

Query:
{code:sql}
SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) FROM store_sales ss LIMIT 20;

java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: DrillRuntimeException: Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. maxBytes 1073741824

Fragment 1:0

[Error Id: 9c2ec9cf-21c6-4d5e-b0d6-7cd59e32c49d on abhi1:31010]
	at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
	at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
	at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
	at sqlline.SqlLine.print(SqlLine.java:1583)
	at sqlline.Commands.execute(Commands.java:852)
	at sqlline.Commands.sql(Commands.java:751)
	at sqlline.SqlLine.dispatch(SqlLine.java:738)
	at sqlline.SqlLine.begin(SqlLine.java:612)
	at sqlline.SqlLine.start(SqlLine.java:366)
	at sqlline.SqlLine.main(SqlLine.java:259)
{code}

Log:
{code}
2015-07-23 18:16:52,292 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:2:2] INFO  o.a.d.e.w.fragment.FragmentExecutor - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:2:2: State change requested RUNNING --> FINISHED
2015-07-23 18:16:52,292 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:2:2] INFO  o.a.d.e.w.f.FragmentStatusReporter - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:2:2: State to report: FINISHED
2015-07-23 18:17:05,485 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] ERROR o.a.d.e.p.i.s.SortRecordBatchBuilder - Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. maxBytes 1073741824
2015-07-23 18:17:05,486 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:1:0: State change requested RUNNING --> FAILED
...
2015-07-23 18:17:05,990 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:1:0: State change requested FAILED --> FINISHED
2015-07-23 18:17:05,999 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: DrillRuntimeException: Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. maxBytes 1073741824

Fragment 1:0

[Error Id: 9c2ec9cf-21c6-4d5e-b0d6-7cd59e32c49d on abhi1:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: DrillRuntimeException: Adding this batch causes the total size to exceed max allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. maxBytes 1073741824

Fragment 1:0

[Error Id: 9c2ec9cf-21c6-4d5e-b0d6-7cd59e32c49d on abhi1:31010]
	at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:523) ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at
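The arithmetic in the error message is simple to verify: the incoming batch would push the running total past the 1 GiB cap (1073638500 + 127875 > 1073741824). A minimal sketch of that size guard follows; class and method names are illustrative stand-ins, not Drill's actual SortRecordBatchBuilder code.

```java
// Illustrative sketch of the batch-size guard the log above describes.
// BatchSizeGuard and tryAdd are hypothetical names, not Drill's API.
public class BatchSizeGuard {
    private long runningBytes = 0;
    private final long maxBytes;

    public BatchSizeGuard(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Rejects the batch (returns false) when accepting it would exceed maxBytes. */
    public boolean tryAdd(long batchBytes) {
        if (runningBytes + batchBytes > maxBytes) {
            return false; // Drill raises a DrillRuntimeException at this point
        }
        runningBytes += batchBytes;
        return true;
    }

    public long runningBytes() {
        return runningBytes;
    }
}
```

With the values from the log, the first batch fits under the 1073741824-byte cap but the 127875-byte follow-up exceeds the remaining 103324 bytes and is rejected.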
[jira] [Commented] (DRILL-3412) Projections are not getting pushed down below Window operator
[ https://issues.apache.org/jira/browse/DRILL-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643579#comment-14643579 ]

Deneche A. Hakim commented on DRILL-3412:
-----------------------------------------
After some testing, it looks like this bug was introduced by DRILL-3304, at least for queries that didn't involve a RANK window function.

Projections are not getting pushed down below Window operator
-------------------------------------------------------------
                 Key: DRILL-3412
                 URL: https://issues.apache.org/jira/browse/DRILL-3412
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
            Reporter: Aman Sinha
            Assignee: Jinfeng Ni
            Priority: Blocker
              Labels: window_function
             Fix For: 1.2.0

The plan below shows that the 'star' column is being produced by the Scan and the subsequent Project. This indicates that projection pushdown is not working as desired when a window function is present. The query produces correct results.
{code}
explain plan for select min(n_nationkey) over (partition by n_regionkey) from cp.`tpch/nation.parquet`;
00-00    Screen
00-01      Project(EXPR$0=[$0])
00-02        Project(w0$o0=[$3])
00-03          Window(window#0=[window(partition {2} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [MIN($1)])])
00-04            SelectionVectorRemover
00-05              Sort(sort0=[$2], dir0=[ASC])
00-06                Project(T1¦¦*=[$0], n_nationkey=[$1], n_regionkey=[$2])
00-07                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`*`]]])
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3568) Exception when NOT condition is used with the column name
Victoria Markman created DRILL-3568:
---------------------------------------
             Summary: Exception when NOT condition is used with the column name
                 Key: DRILL-3568
                 URL: https://issues.apache.org/jira/browse/DRILL-3568
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 1.1.0
            Reporter: Victoria Markman
            Assignee: Jinfeng Ni
            Priority: Minor

Exception:
{code}
0: jdbc:drill:drillbit=localhost> select a1 from t1 where not a1;
Error: SYSTEM ERROR: ClassCastException: org.apache.calcite.rex.RexInputRef cannot be cast to org.apache.calcite.rex.RexCall
[Error Id: bdf6251a-0649-4d0c-8fd2-466058aebd3b on 172.16.1.129:31010] (state=,code=0)
{code}

drillbit.log
{code}
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillReduceExpressionsRule(Filter), args [rel#254574:LogicalFilter.NONE.ANY([]).[](input=rel#254573:Subset#3.NONE.ANY([]).[],condition=NOT($0))]
	... 4 common frames omitted
Caused by: java.lang.AssertionError: Internal error: Error while applying rule DrillReduceExpressionsRule(Filter), args [rel#254574:LogicalFilter.NONE.ANY([]).[](input=rel#254573:Subset#3.NONE.ANY([]).[],condition=NOT($0))]
	at org.apache.calcite.util.Util.newInternal(Util.java:790) ~[calcite-core-1.1.0-drill-r14.jar:1.1.0-drill-r14]
	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251) ~[calcite-core-1.1.0-drill-r14.jar:1.1.0-drill-r14]
	at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:795) ~[calcite-core-1.1.0-drill-r14.jar:1.1.0-drill-r14]
	at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) ~[calcite-core-1.1.0-drill-r14.jar:1.1.0-drill-r14]
	at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:316) ~[calcite-core-1.1.0-drill-r14.jar:1.1.0-drill-r14]
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.logicalPlanningVolcanoAndLopt(DefaultSqlHandler.java:528) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:213) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:248) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:178) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
	at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:903) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:242) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
	... 3 common frames omitted
Caused by: java.lang.ClassCastException: org.apache.calcite.rex.RexInputRef cannot be cast to org.apache.calcite.rex.RexCall
	at org.apache.calcite.rel.rules.ReduceExpressionsRule$ReduceFilterRule.onMatch(ReduceExpressionsRule.java:160) ~[calcite-core-1.1.0-drill-r14.jar:1.1.0-drill-r14]
	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) ~[calcite-core-1.1.0-drill-r14.jar:1.1.0-drill-r14]
	... 13 common frames omitted
2015-07-27 17:27:35,634 [2a493387-a322-461b-730c-2d475911e25e:foreman] INFO  o.a.drill.exec.work.foreman.Foreman - State change requested. PENDING --> FAILED
org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillReduceExpressionsRule(Filter), args [rel#254602:LogicalFilter.NONE.ANY([]).[](input=rel#254601:Subset#3.NONE.ANY([]).[],condition=NOT($0))]
	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:253) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.AssertionError: Internal error: Error while applying rule DrillReduceExpressionsRule(Filter), args [rel#254602:LogicalFilter.NONE.ANY([]).[](input=rel#254601:Subset#3.NONE.ANY([]).[],condition=NOT($0))]
	at org.apache.calcite.util.Util.newInternal(Util.java:790) ~[calcite-core-1.1.0-drill-r14.jar:1.1.0-drill-r14]
	at
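The ClassCastException can be read directly off the stack trace: the filter-reduction rule casts the filter condition to RexCall, but `WHERE NOT a1` can leave a bare column reference (RexInputRef) as the condition after simplification. The sketch below models that failing assumption with stand-in classes, not Calcite's real ones.

```java
// Minimal model of the cast failure in ReduceExpressionsRule$ReduceFilterRule.
// RexNode/RexInputRef/RexCall here are simplified stand-ins for Calcite's types.
abstract class RexNode {}

class RexInputRef extends RexNode {}   // a bare column reference, e.g. $0

class RexCall extends RexNode {}       // an operator call, e.g. NOT($0)

public class FilterReduceSketch {
    /**
     * Defensive version of the rule body: check the node's concrete type
     * before treating it as a call. The unguarded `(RexCall) condition`
     * is what threw the ClassCastException in the log above.
     */
    static String classify(RexNode condition) {
        if (condition instanceof RexCall) {
            return "reduce-call";
        }
        return "skip-non-call";
    }
}
```

The fix direction this suggests is purely illustrative; the actual repair belongs in Calcite/Drill's rule matching, not user code.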
[jira] [Commented] (DRILL-3458) Avro file format's support for map and nullable union data types.
[ https://issues.apache.org/jira/browse/DRILL-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643744#comment-14643744 ]

Bhallamudi Venkata Siva Kamesh commented on DRILL-3458:
-------------------------------------------------------
Addressed review comments and updated the patch in the review board.

Avro file format's support for map and nullable union data types.
-----------------------------------------------------------------
                 Key: DRILL-3458
                 URL: https://issues.apache.org/jira/browse/DRILL-3458
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Other
    Affects Versions: 1.0.0
            Reporter: Bhallamudi Venkata Siva Kamesh
            Assignee: Jacques Nadeau

The Avro file format currently does not support the union and map data types. For union data types, like [Pig|https://cwiki.apache.org/confluence/display/PIG/AvroStorage] and [Hive|https://cwiki.apache.org/confluence/display/Hive/AvroSerDe], I think we can support nullable unions of the form [null, some-type].

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3364) Prune scan range if the filter is on the leading field with byte comparable encoding
[ https://issues.apache.org/jira/browse/DRILL-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643648#comment-14643648 ]

Aditya Kishore commented on DRILL-3364:
---------------------------------------
+1. LGTM.

Prune scan range if the filter is on the leading field with byte comparable encoding
------------------------------------------------------------------------------------
                 Key: DRILL-3364
                 URL: https://issues.apache.org/jira/browse/DRILL-3364
             Project: Apache Drill
          Issue Type: Sub-task
          Components: Storage - HBase
            Reporter: Aditya Kishore
            Assignee: Smidth Panchamia
             Fix For: 1.2.0
         Attachments: 0001-Add-convert_from-and-convert_to-methods-for-TIMESTAM.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, 0001-DRILL-3364-Prune-scan-range-if-the-filter-is-on-the-.patch, 0001-PATCH-DRILL-3364-Prune-scan-range-if-the-filter-is-o.patch, composite.jun26.diff

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-3518) Do a better job of providing conceptual overview to UDF creation
[ https://issues.apache.org/jira/browse/DRILL-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kristine Hahn reassigned DRILL-3518: Assignee: Kristine Hahn (was: Bridget Bevens) Do a better job of providing conceptual overview to UDF creation Key: DRILL-3518 URL: https://issues.apache.org/jira/browse/DRILL-3518 Project: Apache Drill Issue Type: Sub-task Components: Documentation Reporter: Jacques Nadeau Assignee: Kristine Hahn Since UDFs are effectively written in Java, people find it confusing when some Java features aren't supported. Let's try to do a better job of outlining the pitfalls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3492) Add support for encoding of Drill data types into byte ordered format
[ https://issues.apache.org/jira/browse/DRILL-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Smidth Panchamia updated DRILL-3492: Attachment: 0001-DRILL-3492-Add-support-for-encoding-decoding-of-to-f.patch Combined patch to support OrderedBytes encoding of double, long, bigint and int type data. Add support for encoding of Drill data types into byte ordered format - Key: DRILL-3492 URL: https://issues.apache.org/jira/browse/DRILL-3492 Project: Apache Drill Issue Type: New Feature Reporter: Smidth Panchamia Assignee: Smidth Panchamia Attachments: 0001-DRILL-3492-Add-support-for-encoding-decoding-of-to-f.patch The following JIRA added this functionality in HBase: https://issues.apache.org/jira/browse/HBASE-8201 We need to port this functionality in Drill so as to allow filtering and pruning of rows during scans. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
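For background on what "byte ordered format" buys: the core trick for fixed-width integers (which HBASE-8201's OrderedBytes generalizes) is to flip the sign bit and serialize big-endian, so that unsigned lexicographic byte comparison agrees with signed numeric order. A self-contained sketch of that idea, not the patch's actual code:

```java
// Order-preserving encoding of a signed long: flip the sign bit and write
// big-endian, so raw byte comparison during a scan matches numeric order.
public class OrderedLong {
    static byte[] encode(long v) {
        long flipped = v ^ Long.MIN_VALUE; // flip sign bit: negatives sort first
        byte[] out = new byte[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (flipped >>> (56 - 8 * i)); // big-endian byte order
        }
        return out;
    }

    /** Unsigned lexicographic comparison — the order a byte-range scan sees. */
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }
}
```

Because byte order now matches numeric order, a filter like `key >= 10` on the leading field translates directly into a start row for the scan range, which is what DRILL-3364's pruning relies on.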
[jira] [Created] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array
Philip Deegan created DRILL-3562:
------------------------------------
             Summary: Query fails when using flatten on JSON data where some documents have an empty array
                 Key: DRILL-3562
                 URL: https://issues.apache.org/jira/browse/DRILL-3562
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - JSON
    Affects Versions: 1.1.0
            Reporter: Philip Deegan
            Assignee: Steven Phillips

Drill query fails when using flatten when some records contain an empty array:

SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) flat WHERE flat.c.d.e = 'f' limit 1;

Succeeds on
{ a: { b: { c: [ { d: { e: f } } ] } } }

Fails on
{ a: { b: { c: [] } } }

Error
{noformat}
Error: SYSTEM ERROR: ClassCastException: Cannot cast org.apache.drill.exec.vector.NullableIntVector to org.apache.drill.exec.vector.complex.RepeatedValueVector
{noformat}

Is it possible to ignore the empty arrays, or do they need to be populated with dummy data?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
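One plausible reading of the error, sketched with stand-in types rather than Drill's real vector classes: with no elements to inspect, the reader has no evidence that `c` is an array, so it materializes a default nullable-int vector, and FLATTEN's later cast to a repeated (array) vector fails.

```java
// Simplified model of the type-inference gap behind the ClassCastException.
// ValueVector/NullableIntVector/RepeatedValueVector are stand-ins, not
// Drill's actual classes; the inference rule here is an assumption.
abstract class ValueVector {}

class NullableIntVector extends ValueVector {}    // default for typeless data

class RepeatedValueVector extends ValueVector {}  // backs actual JSON arrays

public class FlattenSketch {
    /** Only a non-empty array gives the reader evidence of a repeated type. */
    static ValueVector inferVector(int observedElements) {
        return observedElements > 0 ? new RepeatedValueVector()
                                    : new NullableIntVector();
    }

    /** FLATTEN requires a repeated vector; anything else would be a bad cast. */
    static boolean flattenWouldSucceed(ValueVector v) {
        return v instanceof RepeatedValueVector;
    }
}
```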
[jira] [Created] (DRILL-3558) Functions are not being called if they use dirN values that have not been initialized
Stefán Baxter created DRILL-3558:
------------------------------------
             Summary: Functions are not being called if they use dirN values that have not been initialized
                 Key: DRILL-3558
                 URL: https://issues.apache.org/jira/browse/DRILL-3558
             Project: Apache Drill
          Issue Type: Bug
          Components: SQL Parser
    Affects Versions: 1.1.0
            Reporter: Stefán Baxter
            Assignee: Aman Sinha

If a function takes dir1 as a parameter (example: fn(dir0, dir1)) and there is no second-level directory being traversed, then the function is not called. See the following from the user group:

===

I'm still working on our evaluation and now focusing on directory-based queries and mixing directory and parquet-based partitions. This is also a continued trip down the UDF rabbit hole :) (some pun intended)

I continue to come across things that surprise me and I would like to share them with both the developers, who might want to address some of them, and other newcomers who might benefit from them. The UDF code referenced here can be found at (https://github.com/acmeguy/asdrill) and the documents and the directory structure used in these examples are included in the tiny attachment.

I will try to keep this as brief as possible. What you need to know is:
- there are 33 records in the 19 files in a mixed directory structure - see zip for details
- each record contains a date that is valid within that directory structure
- the dirInRange function is a UDF that takes a date range and directory information to determine if a directory contains target data for the provided date range - see github for details
- the dirInRange function should be able to accept all null values or missing parameters for everything other than the time range start parameter
- the dirInRange function returns a string with a number that represents the number of parameters used (function variant) - it has no other purpose/function at this point - and will return the value of the last dirN parameter that is not null (dir0, dir1 or dir2)

Observations

1. The UDF function (dirInRange) is not called if dir0, dir1 or dir2 are missing (missing is not the same as null here)

select occurred_at, dir0 dYear, dir1 dMonth, dir2 dDay from dfs.tmp.`/analytics/processed/test/events` as t order by occurred_at;
- returns 33 records

select occurred_at, dir0 dYear, dir1 dMonth, dir2 dDay from dfs.tmp.`/analytics/processed/test/events` as t where dirInRange(cast('2015-04-10' as timestamp),cast('2015-07-11' as timestamp),COALESCE(dir0,'-'),COALESCE(dir1,'-'),COALESCE(dir2,'-')) '0' order by occurred_at;
- returns 33 records (COALESCE handles the missing values and replaces them with '-')

select occurred_at, dir0 dYear, dir1 dMonth, dir2 dDay from dfs.tmp.`/analytics/processed/test/events` as t where dirInRange(cast('2015-04-10' as timestamp),cast('2015-07-11' as timestamp),dir0,dir1,dir2) '0' order by occurred_at;
- returns 13 records (only those in the deepest directories where dir0, dir1, dir2 are all set)

select occurred_at, dir0 dYear, dir1 dMonth, dir2 dDay, dirInRange(cast('2015-04-10' as timestamp),cast('2015-07-11' as timestamp),dir0,dir1,dir2) inRange from dfs.tmp.`/analytics/processed/test/events` as t order by occurred_at;
- returns 33 records, but 20 of the records will have inRange set to null (the UDF never returns null, so it's being ignored completely)

Lesson: It's not enough to use Nullable*Holder in a UDF and have all permutations covered - Drill will not call the function and fails silently, evaluating the outcome of the function to null, if any of the dirN parameters are not initialized.

2. System.out.println is the way to get information from within the Drillbit

It would be good to know which Drillbit instance, if many, is responsible for the println - I don't know how to get the parent drillbit injected into the UDF.

3. If directories have numeric names then Drill starts to insist they are all numeric (in the where condition) even though dirInRange always returns a varchar.

select occurred_at, dir0 dYear, dir1 dMonth, dir2 dDay from dfs.tmp.`/analytics/processed/test/events` as t where dir0 = dirInRange(cast('2015-04-10' as timestamp),cast('2015-07-11' as timestamp),COALESCE(dir0,'-'),COALESCE(dir1,'-'),COALESCE(dir2,'-')) order by occurred_at;

2011,2012 is the name of the directory (the same happens with directories Q1, W1, etc.)

java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: NumberFormatException: 2011,2012
Fragment 0:0
[Error Id: 0c3e1370-ccc5-4288-b6c9-ea0ef4884f1e on localhost:31010]

This seems to fail on the other side, where Drill thinks that the outcome of the dirInRange function is numeric and that the = expression is a numerical one.

This runs though:
select occurred_at, dir0 dYear, dir1 dMonth, dir2 dDay from dfs.tmp.`/analytics/processed/test/events` as t where
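The COALESCE workaround from observation 1 can be mirrored in plain Java. The sketch below is an illustrative stand-in, not a real DrillSimpleFunc: each dirN is coalesced to a '-' placeholder before use, so the logic can run even when a directory level is absent, and it picks the deepest real value the way the email describes dirInRange's return.

```java
// Illustrative stand-in for the null-tolerant dirN handling described above.
// DirRangeSketch/deepestDir are hypothetical names, not the actual UDF.
public class DirRangeSketch {
    /** Returns the deepest non-placeholder dirN value, or "-" if none exist. */
    static String deepestDir(String dir0, String dir1, String dir2) {
        String d0 = dir0 == null ? "-" : dir0; // mirrors COALESCE(dir0, '-')
        String d1 = dir1 == null ? "-" : dir1;
        String d2 = dir2 == null ? "-" : dir2;
        if (!"-".equals(d2)) {
            return d2;
        }
        if (!"-".equals(d1)) {
            return d1;
        }
        return d0; // may itself be the "-" placeholder
    }
}
```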
[jira] [Created] (DRILL-3559) Make filename available to SQL statements just like dirN
Stefán Baxter created DRILL-3559:
------------------------------------
             Summary: Make filename available to SQL statements just like dirN
                 Key: DRILL-3559
                 URL: https://issues.apache.org/jira/browse/DRILL-3559
             Project: Apache Drill
          Issue Type: Improvement
          Components: SQL Parser
    Affects Versions: 1.1.0
            Reporter: Stefán Baxter
            Assignee: Aman Sinha
            Priority: Minor

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3560) Make partition pruning work for directory queries
Stefán Baxter created DRILL-3560:
------------------------------------
             Summary: Make partition pruning work for directory queries
                 Key: DRILL-3560
                 URL: https://issues.apache.org/jira/browse/DRILL-3560
             Project: Apache Drill
          Issue Type: New Feature
          Components: Query Planning & Optimization
    Affects Versions: 1.1.0
            Reporter: Stefán Baxter
            Assignee: Jinfeng Ni

Currently, queries that include directory conditions are not optimized at all: the directory expression (dir0 = 'something') is evaluated for every record of every file in every directory. This could be optimized to rule out non-matching directories up front, allowing the same kind of partition pruning for directories as for other scenarios where data has been partitioned.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
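The requested optimization amounts to evaluating the directory predicate once per directory at planning time instead of once per record at execution time. An illustrative sketch under that assumption (class and method names are hypothetical, not Drill's planner API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of directory-level pruning: apply `dir0 = <value>` to the
// directory list itself, so files under non-matching directories are
// never scanned. Purely illustrative, not Drill's implementation.
public class DirPruneSketch {
    /** Keeps only directories whose first path component (dir0) matches. */
    static List<String> pruneDirs(List<String> dirs, String dir0Value) {
        List<String> kept = new ArrayList<>();
        for (String d : dirs) {
            String dir0 = d.split("/")[0]; // first path component = dir0
            if (dir0.equals(dir0Value)) {
                kept.add(d);
            }
        }
        return kept;
    }
}
```

Per-record evaluation touches every row in every file; pruning at the directory level is O(number of directories) and skips whole subtrees.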
[jira] [Created] (DRILL-3561) The return type of UDF seems ambiguous and varchar results can force arithmetic comparison
Stefán Baxter created DRILL-3561:
------------------------------------
             Summary: The return type of UDF seems ambiguous and varchar results can force arithmetic comparison
                 Key: DRILL-3561
                 URL: https://issues.apache.org/jira/browse/DRILL-3561
             Project: Apache Drill
          Issue Type: Bug
          Components: SQL Parser
            Reporter: Stefán Baxter
            Assignee: Aman Sinha

Please see the information in the following user group email, where dir0, containing 2011-2012, is being compared to a varchar/null result of a UDF and Drill tries to convert dir0 to a number and fails. (The email is the same one quoted in full under DRILL-3558 above.)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (DRILL-3555) Changing defaults for planner.memory.max_query_memory_per_node causes queries with window function to fail
[ https://issues.apache.org/jira/browse/DRILL-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642375#comment-14642375 ]

Abhishek Girish edited comment on DRILL-3555 at 7/27/15 7:09 AM:
-----------------------------------------------------------------
TPC-DS SF100 - Parquet.

was (Author: agirish):
Parquet.
[jira] [Commented] (DRILL-3555) Changing defaults for planner.memory.max_query_memory_per_node causes queries with window function to fail
[ https://issues.apache.org/jira/browse/DRILL-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642375#comment-14642375 ]

Abhishek Girish commented on DRILL-3555:
----------------------------------------
Parquet.