[jira] [Created] (DRILL-3166) Cost model of functions does not account for field reader arguments
Hanifi Gunes created DRILL-3166: --- Summary: Cost model of functions does not account for field reader arguments Key: DRILL-3166 URL: https://issues.apache.org/jira/browse/DRILL-3166 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Affects Versions: 1.0.0 Reporter: Hanifi Gunes Assignee: Hanifi Gunes Drill relies on a cost model to figure out the best function match during execution. The current cost model does not count field reader arguments in the final computed cost. We need to update the cost model to account for reader arguments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
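For readers unfamiliar with the resolution step, the idea can be sketched as a toy. All names below (ArgKind, COST_PER_CAST, cost()) are illustrative, not Drill's actual API: each candidate function accumulates a cost over its arguments, and FieldReader-typed arguments must contribute a non-zero term or they distort the ranking between candidates.

```java
import java.util.List;

// Toy sketch of function-match costing; all names here are illustrative,
// not Drill's real API. The point: each argument kind contributes to the
// candidate's total cost, and FIELD_READER args must not be free.
public class FunctionCostSketch {
    public enum ArgKind { EXACT, IMPLICIT_CAST, FIELD_READER }

    static final int COST_PER_CAST = 1;
    static final int COST_PER_READER = 2;  // the term the bug says is missing

    public static int cost(List<ArgKind> args) {
        int total = 0;
        for (ArgKind a : args) {
            if (a == ArgKind.IMPLICIT_CAST) {
                total += COST_PER_CAST;
            } else if (a == ArgKind.FIELD_READER) {
                total += COST_PER_READER;  // previously contributed nothing
            }
            // EXACT matches cost nothing
        }
        return total;
    }

    public static void main(String[] args) {
        // Lower cost wins: with the reader term in place, a candidate taking
        // concrete types outranks one that falls back to a field reader.
        System.out.println(cost(List.of(ArgKind.EXACT, ArgKind.FIELD_READER)));
    }
}
```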
[jira] [Updated] (DRILL-3147) tpcds-sf1-parquet query 73 causes memory leak
[ https://issues.apache.org/jira/browse/DRILL-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim updated DRILL-3147: Assignee: Jacques Nadeau (was: Deneche A. Hakim) tpcds-sf1-parquet query 73 causes memory leak - Key: DRILL-3147 URL: https://issues.apache.org/jira/browse/DRILL-3147 Project: Apache Drill Issue Type: Bug Reporter: Deneche A. Hakim Assignee: Jacques Nadeau Fix For: 1.1.0 the leak seems to appear when BaseRawBatchBuffer.enqueue() tries to release a batch but the allocator has already been closed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3147) tpcds-sf1-parquet query 73 causes memory leak
[ https://issues.apache.org/jira/browse/DRILL-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554202#comment-14554202 ] Deneche A. Hakim commented on DRILL-3147: - review request [#34541|https://reviews.apache.org/r/34541/] tpcds-sf1-parquet query 73 causes memory leak - Key: DRILL-3147 URL: https://issues.apache.org/jira/browse/DRILL-3147 Project: Apache Drill Issue Type: Bug Reporter: Deneche A. Hakim Assignee: Jacques Nadeau Fix For: 1.1.0 the leak seems to appear when BaseRawBatchBuffer.enqueue() tries to release a batch but the allocator has already been closed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3147) tpcds-sf1-parquet query 73 causes memory leak
[ https://issues.apache.org/jira/browse/DRILL-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim updated DRILL-3147: Attachment: DRILL-3147.1.patch.txt
- FragmentContext.close() waits 100ms before closing the allocator, to give the rpc layer enough time to properly release any batch that was just transferred to this fragment's allocator
- each time a fragment A sends a "receiver finished" to fragment B, fragment B's id is added to the FragmentContext.ignoredSenders list
- refactored UnorderedReceiverBatch.informSenders() and MergingRecordBatch.informSenders() by moving this method to FragmentContext
- DataServer.send() uses FragmentContext.ignoredSenders to decide whether a batch should be passed to the fragment or discarded right away
- BaseRawBatchBuffer methods enqueue() and kill() are now synchronized
- the TestTpcdsSf1Leak test reproduces the leak; it is ignored by default because it requires a large dataset

tpcds-sf1-parquet query 73 causes memory leak - Key: DRILL-3147 URL: https://issues.apache.org/jira/browse/DRILL-3147 Project: Apache Drill Issue Type: Bug Reporter: Deneche A. Hakim Assignee: Jacques Nadeau Fix For: 1.1.0 Attachments: DRILL-3147.1.patch.txt the leak seems to appear when BaseRawBatchBuffer.enqueue() tries to release a batch but the allocator has already been closed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
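Two of the ideas in the patch summary above (a grace period before closing the allocator, and an ignored-senders set consulted before delivering a batch) can be sketched as follows. Class and method names are invented for illustration; this is not the actual DRILL-3147 patch:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch, not the real patch. Shows the shape of the
// ignoredSenders idea: once a receiver tells a sender it is finished,
// later batches from that sender are discarded instead of being enqueued
// into a fragment whose allocator may already be closing.
public class FragmentCloseSketch {
    private final Set<String> ignoredSenders = ConcurrentHashMap.newKeySet();

    // Called when this fragment sends "receiver finished" for a sender.
    public void receiverFinished(String senderId) {
        ignoredSenders.add(senderId);
    }

    // DataServer.send()-style check: deliver or discard right away.
    public boolean shouldDeliver(String senderId) {
        return !ignoredSenders.contains(senderId);
    }

    // FragmentContext.close()-style grace period before closing the
    // allocator, giving the rpc layer time to release in-flight batches.
    public void close() throws InterruptedException {
        Thread.sleep(100);
        // ... close the allocator here ...
    }

    public static void main(String[] args) {
        FragmentCloseSketch ctx = new FragmentCloseSketch();
        ctx.receiverFinished("2:0");
        System.out.println(ctx.shouldDeliver("2:0"));  // false: discard the batch
    }
}
```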
[jira] [Assigned] (DRILL-3137) Joining tables with lots of duplicates and LIMIT does not do early termination
[ https://issues.apache.org/jira/browse/DRILL-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Sinha reassigned DRILL-3137: - Assignee: Aman Sinha (was: Chris Westin) Joining tables with lots of duplicates and LIMIT does not do early termination -- Key: DRILL-3137 URL: https://issues.apache.org/jira/browse/DRILL-3137 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.9.0 Reporter: Aman Sinha Assignee: Aman Sinha Create a table with duplicate keys:
{code}
create table dfs.tmp.lineitem_dup as select 100 as key1, 200 as key2 from cp.`tpch/lineitem.parquet`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 60175                      |
+-----------+----------------------------+
{code}
Now do a self-join with a LIMIT. This query should terminate early because of the LIMIT, but it runs for about 2 minutes (note that this plan contains no exchanges):
{code}
0: jdbc:drill:zk=local> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
+-------+-------+--------+--------+
| key1  | key2  | key10  | key20  |
+-------+-------+--------+--------+
| 100   | 200   | 100    | 200    |
+-------+-------+--------+--------+
1 row selected (111.764 seconds)
{code}
Disabling hash join does not help in this case:
{code}
0: jdbc:drill:zk=local> alter session set `planner.enable_hashjoin` = false;
+-------+------------------------------------+
|  ok   |              summary               |
+-------+------------------------------------+
| true  | planner.enable_hashjoin updated.   |
+-------+------------------------------------+
1 row selected (0.094 seconds)
0: jdbc:drill:zk=local> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
+-------+-------+--------+--------+
| key1  | key2  | key10  | key20  |
+-------+-------+--------+--------+
| 100   | 200   | 100    | 200    |
+-------+-------+--------+--------+
1 row selected (198.874 seconds)
{code}
However, forcing exchanges in the plan helps and the query terminates early:
{code}
0: jdbc:drill:zk=local> alter session set `planner.slice_target` = 1;
0: jdbc:drill:zk=local> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
+-------+-------+--------+--------+
| key1  | key2  | key10  | key20  |
+-------+-------+--------+--------+
| 100   | 200   | 100    | 200    |
+-------+-------+--------+--------+
1 row selected (0.765 seconds)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
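The missing behavior can be illustrated with a toy iterator (not Drill's RecordBatch protocol): a LIMIT that stops pulling from its upstream as soon as it has produced enough rows, rather than letting the join below it drain the full cross product of duplicates.

```java
import java.util.Iterator;

// Toy illustration (not Drill's operator model) of limit-driven early
// termination: once `remaining` hits zero we stop consulting upstream at
// all, so the join feeding us is never asked for its remaining matches.
public class LimitSketch<T> implements Iterator<T> {
    private final Iterator<T> upstream;
    private int remaining;

    public LimitSketch(Iterator<T> upstream, int limit) {
        this.upstream = upstream;
        this.remaining = limit;
    }

    @Override
    public boolean hasNext() {
        // Short-circuits before touching upstream once the limit is spent.
        return remaining > 0 && upstream.hasNext();
    }

    @Override
    public T next() {
        remaining--;
        return upstream.next();
    }

    public static void main(String[] args) {
        Iterator<Integer> lim =
            new LimitSketch<>(java.util.List.of(1, 2, 3, 4, 5).iterator(), 1);
        int produced = 0;
        while (lim.hasNext()) { lim.next(); produced++; }
        System.out.println(produced);  // 1: upstream is abandoned after one row
    }
}
```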
[jira] [Updated] (DRILL-3166) Cost model of functions does not account for field reader arguments
[ https://issues.apache.org/jira/browse/DRILL-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanifi Gunes updated DRILL-3166: Assignee: Mehant Baid (was: Hanifi Gunes) Cost model of functions does not account for field reader arguments -- Key: DRILL-3166 URL: https://issues.apache.org/jira/browse/DRILL-3166 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Affects Versions: 1.0.0 Reporter: Hanifi Gunes Assignee: Mehant Baid Drill relies on a cost model to figure out the best function match during execution. The current cost model does not count field reader arguments in the final computed cost. We need to update the cost model to account for reader arguments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2866) Incorrect error message reporting schema change when streaming aggregation and hash join are disabled
[ https://issues.apache.org/jira/browse/DRILL-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555226#comment-14555226 ] Sean Hsuan-Yi Chu commented on DRILL-2866: -- [~vicky] It is not reproducible. Do you have a view on top of the data source?

Incorrect error message reporting schema change when streaming aggregation and hash join are disabled - Key: DRILL-2866 URL: https://issues.apache.org/jira/browse/DRILL-2866 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.9.0 Reporter: Victoria Markman Assignee: Sean Hsuan-Yi Chu Attachments: t1.parquet, t2.parquet

alter session set `planner.enable_streamagg` = false;
alter session set `planner.enable_hashjoin` = false;
{code}
0: jdbc:drill:schema=dfs> select t1.a1,
. . . . . . . . . . . . > t1.b1,
. . . . . . . . . . . . > count(distinct t1.c1) as distinct_c1,
. . . . . . . . . . . . > count(distinct t2.c2) as distinct_c2,
. . . . . . . . . . . . > sum(t1.a1) as sum_a1,
. . . . . . . . . . . . > count(t1.c1) as count_a1,
. . . . . . . . . . . . > count(*) as count_star
. . . . . . . . . . . . > from
. . . . . . . . . . . . > t1,
. . . . . . . . . . . . > t2
. . . . . . . . . . . . > where
. . . . . . . . . . . . > t1.a1 = t2.a2 and t1.b1 = t2.b2
. . . . . . . . . . . . > group by
. . . . . . . . . . . . > t1.a1,
. . . . . . . . . . . . > t1.b1,
. . . . . . . . . . . . > t2.a2,
. . . . . . . . . . . . > t2.b2
. . . . . . . . . . . . > order by
. . . . . . . . . . . . > t1.a1,
. . . . . . . . . . . . > t1.b1,
. . . . . . . . . . . . > t2.a2,
. . . . . . . . . . . . > t2.b2
. . . . . . . . . . . . > ;
+-----+-----+--------------+--------------+---------+-----------+-------------+
| a1  | b1  | distinct_c1  | distinct_c2  | sum_a1  | count_a1  | count_star  |
+-----+-----+--------------+--------------+---------+-----------+-------------+
Query failed: SYSTEM ERROR: Hash aggregate does not support schema changes
Fragment 0:0
[10ee2422-d13c-4405-a4b6-a62358f72995 on atsqa4-134.qa.lab:31010]
(org.apache.drill.exec.exception.SchemaChangeException) Hash aggregate does not support schema changes
{code}
copy/paste reproduction:
{code}
select t1.a1,
       t1.b1,
       count(distinct t1.c1) as distinct_c1,
       count(distinct t2.c2) as distinct_c2,
       sum(t1.a1) as sum_a1,
       count(t1.c1) as count_a1,
       count(*) as count_star
from t1, t2
where t1.a1 = t2.a2 and t1.b1 = t2.b2
group by t1.a1, t1.b1, t2.a2, t2.b2
order by t1.a1, t1.b1, t2.a2, t2.b2;
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3167) When a query fails, Foreman should wait for all fragments to finish cleaning up before sending a FAILED state to the client
Deneche A. Hakim created DRILL-3167: --- Summary: When a query fails, Foreman should wait for all fragments to finish cleaning up before sending a FAILED state to the client Key: DRILL-3167 URL: https://issues.apache.org/jira/browse/DRILL-3167 Project: Apache Drill Issue Type: Bug Reporter: Deneche A. Hakim Assignee: Deneche A. Hakim Fix For: 1.1.0 TestDrillbitResilience.foreman_runTryEnd() exposes this problem intermittently. The query fails and the Foreman reports the failure to the client, which removes the results listener associated with the failed query. Sometimes a data batch reaches the client after the FAILED state has already arrived; the client doesn't handle this properly and the corresponding buffer is never released. Making the Foreman wait for all fragments to finish before sending the final state should help avoid such scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
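The proposed fix amounts to a barrier: the FAILED state goes out only after every fragment has reported that its cleanup finished. A minimal sketch with a CountDownLatch (names invented; Drill's Foreman is considerably more involved):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the proposed ordering, with invented names: the Foreman
// may send FAILED to the client only once all fragments finished cleanup,
// so no late data batch can arrive after the terminal state.
public class ForemanWaitSketch {
    private final CountDownLatch fragmentsDone;

    public ForemanWaitSketch(int numFragments) {
        fragmentsDone = new CountDownLatch(numFragments);
    }

    // Each fragment calls this once its cleanup is complete.
    public void fragmentFinished() {
        fragmentsDone.countDown();
    }

    // True only once every fragment reported in; only then send FAILED.
    public boolean readyToSendFailed() {
        try {
            return fragmentsDone.await(100, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ForemanWaitSketch foreman = new ForemanWaitSketch(2);
        foreman.fragmentFinished();
        System.out.println(foreman.readyToSendFailed());  // false: one fragment pending
        foreman.fragmentFinished();
        System.out.println(foreman.readyToSendFailed());  // true: safe to send FAILED
    }
}
```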
[jira] [Commented] (DRILL-2866) Incorrect error message reporting schema change when streaming aggregation and hash join are disabled
[ https://issues.apache.org/jira/browse/DRILL-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555267#comment-14555267 ] Victoria Markman commented on DRILL-2866: - Sean, there was a similar bug filed by Abhishek; at the time you thought this one was different. Resolve it as fixed and I will verify it. And no, there was no view. Vicky.

Incorrect error message reporting schema change when streaming aggregation and hash join are disabled - Key: DRILL-2866 URL: https://issues.apache.org/jira/browse/DRILL-2866 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.9.0 Reporter: Victoria Markman Assignee: Sean Hsuan-Yi Chu Attachments: t1.parquet, t2.parquet

alter session set `planner.enable_streamagg` = false;
alter session set `planner.enable_hashjoin` = false;
{code}
0: jdbc:drill:schema=dfs> select t1.a1,
. . . . . . . . . . . . > t1.b1,
. . . . . . . . . . . . > count(distinct t1.c1) as distinct_c1,
. . . . . . . . . . . . > count(distinct t2.c2) as distinct_c2,
. . . . . . . . . . . . > sum(t1.a1) as sum_a1,
. . . . . . . . . . . . > count(t1.c1) as count_a1,
. . . . . . . . . . . . > count(*) as count_star
. . . . . . . . . . . . > from
. . . . . . . . . . . . > t1,
. . . . . . . . . . . . > t2
. . . . . . . . . . . . > where
. . . . . . . . . . . . > t1.a1 = t2.a2 and t1.b1 = t2.b2
. . . . . . . . . . . . > group by
. . . . . . . . . . . . > t1.a1,
. . . . . . . . . . . . > t1.b1,
. . . . . . . . . . . . > t2.a2,
. . . . . . . . . . . . > t2.b2
. . . . . . . . . . . . > order by
. . . . . . . . . . . . > t1.a1,
. . . . . . . . . . . . > t1.b1,
. . . . . . . . . . . . > t2.a2,
. . . . . . . . . . . . > t2.b2
. . . . . . . . . . . . > ;
+-----+-----+--------------+--------------+---------+-----------+-------------+
| a1  | b1  | distinct_c1  | distinct_c2  | sum_a1  | count_a1  | count_star  |
+-----+-----+--------------+--------------+---------+-----------+-------------+
Query failed: SYSTEM ERROR: Hash aggregate does not support schema changes
Fragment 0:0
[10ee2422-d13c-4405-a4b6-a62358f72995 on atsqa4-134.qa.lab:31010]
(org.apache.drill.exec.exception.SchemaChangeException) Hash aggregate does not support schema changes
{code}
copy/paste reproduction:
{code}
select t1.a1,
       t1.b1,
       count(distinct t1.c1) as distinct_c1,
       count(distinct t2.c2) as distinct_c2,
       sum(t1.a1) as sum_a1,
       count(t1.c1) as count_a1,
       count(*) as count_star
from t1, t2
where t1.a1 = t2.a2 and t1.b1 = t2.b2
group by t1.a1, t1.b1, t2.a2, t2.b2
order by t1.a1, t1.b1, t2.a2, t2.b2;
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3165) Sorting a Mongo table should leverage Mongo Indexes
[ https://issues.apache.org/jira/browse/DRILL-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554886#comment-14554886 ] Leandro DG commented on DRILL-3165: --- That's amazing, Anil. Do you have an expected timeline? We'd love to help beta test. Thanks! Sorting a Mongo table should leverage Mongo Indexes --- Key: DRILL-3165 URL: https://issues.apache.org/jira/browse/DRILL-3165 Project: Apache Drill Issue Type: Improvement Components: Storage - MongoDB Affects Versions: 1.0.0 Reporter: Leandro DG Assignee: B Anil Kumar When doing a query using Mongo, sorting takes place entirely in Drill. Getting the first 1000 rows from a 100-row table, sorted by a field which has an index, takes a long time (about 45 seconds in our test environment). Sample drill query: Select c.name from mongo.foo.json_customers c order by c.name limit 1000 Doing the same in the mongo client takes less than a second. Sample mongo query: db.json_customers.find().sort({name:1}).limit(1000) Sorting by a field should leverage the existing mongo indexes if they exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3164) Compilation fails with Java 8
Ted Dunning created DRILL-3164: -- Summary: Compilation fails with Java 8 Key: DRILL-3164 URL: https://issues.apache.org/jira/browse/DRILL-3164 Project: Apache Drill Issue Type: Bug Reporter: Ted Dunning I just got this:
{code}
ted:drill[1.0.0*]$ mvn package -DskipTests
...
Detected JDK Version: 1.8.0-40 is not in the allowed range [1.7,1.8).
...
{code}
Clearly there is an overly restrictive pattern at work.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
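A range like `[1.7,1.8)` typically comes from an enforcer-style Java version check in the Maven build. A sketch of the kind of pom change that would admit JDK 8 follows; the exact location and current bounds in Drill's pom are assumptions here, not verified against the source tree:

```xml
<!-- Hypothetical fix sketch: widen the upper bound so JDK 8 is accepted. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <configuration>
    <rules>
      <requireJavaVersion>
        <!-- was [1.7,1.8), which rejects 1.8.0-40 -->
        <version>[1.7,1.9)</version>
      </requireJavaVersion>
    </rules>
  </configuration>
</plugin>
```

In Maven version-range notation, `[1.7,1.9)` means "at least 1.7 and strictly below 1.9", so both JDK 7 and JDK 8 pass the check.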
[jira] [Created] (DRILL-3165) Sorting a Mongo table should leverage Mongo Indexes
Leandro DG created DRILL-3165: - Summary: Sorting a Mongo table should leverage Mongo Indexes Key: DRILL-3165 URL: https://issues.apache.org/jira/browse/DRILL-3165 Project: Apache Drill Issue Type: Improvement Components: Storage - MongoDB Affects Versions: 1.0.0 Reporter: Leandro DG Assignee: B Anil Kumar When doing a query using Mongo, sorting takes place entirely in Drill. Getting the first 1000 rows from a 100-row table, sorted by a field which has an index, takes a long time (about 45 seconds in our test environment). Sample drill query: Select c.name from mongo.foo.json_customers c order by c.name limit 1000 Doing the same in the mongo client takes less than a second. Sample mongo query: db.json_customers.find().sort({name:1}).limit(1000) Sorting by a field should leverage the existing mongo indexes if they exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3165) Sorting a Mongo table should leverage Mongo Indexes
[ https://issues.apache.org/jira/browse/DRILL-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554634#comment-14554634 ] B Anil Kumar commented on DRILL-3165: - Hi Leandro, as of now, operator pushdown is not implemented in the mongo storage plugin. We are working on pushdown for the group, sort, and limit operators; with that in place, this should be fast. Sorting a Mongo table should leverage Mongo Indexes --- Key: DRILL-3165 URL: https://issues.apache.org/jira/browse/DRILL-3165 Project: Apache Drill Issue Type: Improvement Components: Storage - MongoDB Affects Versions: 1.0.0 Reporter: Leandro DG Assignee: B Anil Kumar When doing a query using Mongo, sorting takes place entirely in Drill. Getting the first 1000 rows from a 100-row table, sorted by a field which has an index, takes a long time (about 45 seconds in our test environment). Sample drill query: Select c.name from mongo.foo.json_customers c order by c.name limit 1000 Doing the same in the mongo client takes less than a second. Sample mongo query: db.json_customers.find().sort({name:1}).limit(1000) Sorting by a field should leverage the existing mongo indexes if they exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3152) Apache Drill 1.0 not able to query MongoDB 3.0.
[ https://issues.apache.org/jira/browse/DRILL-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554621#comment-14554621 ] B Anil Kumar commented on DRILL-3152: - Hi, I just verified drill-1.0.0 with mongo v3.0.3 and it is working fine. Can you please verify by specifying the db name in the query, as below?
{noformat}
BANL122d28a3e:drill-latest abatchu$ bin/sqlline -u jdbc:drill:zk=localhost:2181 -n admin -p admin
apache drill 1.0.0
just drill it
0: jdbc:drill:zk=localhost:2181> SELECT * FROM mongo.employee.`empinfo` limit 10;
+--------------+-------------------+-------------+--------------+--------------+------------+--------+---------+
| employee_id  | full_name         | first_name  | last_name    | position_id  | position   | isFTE  | salary  |
+--------------+-------------------+-------------+--------------+--------------+------------+--------+---------+
| 1101         | Steve Eurich      | Steve       | Eurich       | 16           | Store T    | true   | 20.0    |
| 1102         | Mary Pierson      | Mary        | Pierson      | 16           | Store T    | true   | 30.0    |
| 1103         | Leo Jones         | Leo         | Jones        | 16           | Store Tem  | true   | 10.0    |
| 1104         | Nancy Beatty      | Nancy       | Beatty       | 16           | Store T    | false  | 40.0    |
| 1105         | Clara McNight     | Clara       | McNight      | 16           | Store      | true   | 50.0    |
| 1106         | Marcella Isaacs   | Marcella    | Isaacs       | 17           | Stor       | false  | 120.0   |
| 1107         | Charlotte Yonce   | Charlotte   | Yonce        | 17           | Stor       | true   | 120.0   |
| 1108         | Benjamin Foster   | Benjamin    | Foster       | 17           | Stor       | false  | 22.04   |
| 1109         | John Reed         | John        | Reed         | 17           | Store Per  | false  | 60.0    |
| 1110         | Lynn Kwiatkowski  | Lynn        | Kwiatkowski  | 17           | St         | true   | 80.0    |
+--------------+-------------------+-------------+--------------+--------------+------------+--------+---------+
10 rows selected (0.175 seconds)
0: jdbc:drill:zk=localhost:2181>
{noformat}
Apache Drill 1.0 not able to query MongoDB 3.0.
Key: DRILL-3152 URL: https://issues.apache.org/jira/browse/DRILL-3152 Project: Apache Drill Issue Type: Bug Components: Storage - MongoDB Affects Versions: 0.9.0, 1.0.0 Environment: Windows 7, MongoDB 3 WiredTiger (installed locally), Apache Drill 1.0 (installed locally) Reporter: Trent Telfer Assignee: B Anil Kumar Labels: mongodb, mongodb3, windows7, wiredtiger I have been trying to get Apache Drill 1.0, and previously 0.9, to work with MongoDB 3.0 WiredTiger. I have no problem starting Apache Drill using the following, but I am having problems querying MongoDB:
*./sqlline.bat*
*!connect jdbc:drill:zk=local*
*SHOW DATABASES;*
+---------------------+
|     SCHEMA_NAME     |
+---------------------+
| INFORMATION_SCHEMA  |
| cp.default          |
| dfs.default         |
| dfs.root            |
| dfs.tmp             |
| mongo.admin         |
| mongo.alliance_db   |
| mongo.local         |
| sys                 |
+---------------------+
*USE mongo.alliance_db;*
+-------+------------------------------------------------+
|  ok   |                    summary                     |
+-------+------------------------------------------------+
| true  | Default schema changed to [mongo.alliance_db]  |
+-------+------------------------------------------------+
1 row selected (0.116 seconds)
*SELECT * FROM price_daily_ngi;*
May 20, 2015 11:14:40 AM org.apache.calcite.sql.validate.SqlValidatorException <init>
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 'price_daily_ngi' not found
May 20, 2015 11:14:40 AM org.apache.calcite.runtime.CalciteException <init>
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 29: Table 'price_daily_ngi' not found
Error: PARSE ERROR: From line 1, column 15 to line 1, column 29: Table 'price_daily_ngi' not found
[Error Id: 6414a69d-55a0-4918-8f95-10a920e4dc6b on PCV:31010] (state=,code=0)
MongoDB storage configuration:
{ type: mongo, connection: mongodb://localhost:27017, enabled: true }
The collection price_daily_ngi exists and works with normal MongoDB queries.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)