[jira] [Created] (DRILL-7193) Integration changes of the Distributed RM queue configuration with Simple Parallelizer.
Hanumath Rao Maduri created DRILL-7193: -- Summary: Integration changes of the Distributed RM queue configuration with Simple Parallelizer. Key: DRILL-7193 URL: https://issues.apache.org/jira/browse/DRILL-7193 Project: Apache Drill Issue Type: Sub-task Components: Query Planning Optimization Affects Versions: 1.17.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Fix For: 1.17.0 Refactoring the fragment generation code for RM to accommodate non-RM, ZK-based queue RM, and Distributed RM. Calling the Distributed RM for queue selection based on memory requirements. Adjusting the operator memory based on the memory limits of the selected queue. Setting the optimal memory allocation per operator in each minor fragment; this shows up in the query profile. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
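The per-operator memory adjustment described above can be sketched roughly as follows. This is a minimal illustration only, not Drill's actual RM code; the `QueueMemoryAdjuster` class name, the proportional-scaling policy, and the 1 MiB floor are all assumptions made for the sketch.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of adjusting operator memory to fit a queue's limit.
 * Not Drill's implementation; names and policy are invented for illustration.
 */
public class QueueMemoryAdjuster {

    /**
     * Scales each operator's requested memory so the total fits within the
     * selected queue's limit. Requests that already fit are left unchanged.
     */
    public static long[] adjust(long[] requestedBytes, long queueLimitBytes) {
        long total = Arrays.stream(requestedBytes).sum();
        if (total <= queueLimitBytes) {
            return requestedBytes.clone();  // nothing to adjust
        }
        double scale = (double) queueLimitBytes / total;
        long[] adjusted = new long[requestedBytes.length];
        for (int i = 0; i < requestedBytes.length; i++) {
            // Scale down proportionally, but never below an assumed 1 MiB floor.
            adjusted[i] = Math.max((long) (requestedBytes[i] * scale), 1L << 20);
        }
        return adjusted;
    }

    public static void main(String[] args) {
        long[] requested = { 400L << 20, 400L << 20, 200L << 20 };  // 1000 MiB total
        long[] fitted = adjust(requested, 500L << 20);              // 500 MiB queue limit
        System.out.println(Arrays.toString(fitted));
    }
}
```

A real implementation would also have to respect per-operator minimums and distinguish buffered from non-buffered operators; this sketch only shows the proportional-fit idea.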
[jira] [Created] (DRILL-7191) Distributed state persistence and Integration of Distributed queue configuration with Planner
Hanumath Rao Maduri created DRILL-7191: -- Summary: Distributed state persistence and Integration of Distributed queue configuration with Planner Key: DRILL-7191 URL: https://issues.apache.org/jira/browse/DRILL-7191 Project: Apache Drill Issue Type: Sub-task Components: Server, Query Planning Optimization Affects Versions: 1.17.0 Reporter: Hanumath Rao Maduri Fix For: 1.17.0
[jira] [Created] (DRILL-7164) KafkaFilterPushdownTest is sometimes failing to pattern match correctly.
Hanumath Rao Maduri created DRILL-7164: -- Summary: KafkaFilterPushdownTest is sometimes failing to pattern match correctly. Key: DRILL-7164 URL: https://issues.apache.org/jira/browse/DRILL-7164 Project: Apache Drill Issue Type: Bug Components: Storage - Kafka Affects Versions: 1.16.0 Reporter: Hanumath Rao Maduri Assignee: Abhishek Ravi Fix For: 1.17.0 On my private build I am intermittently hitting Kafka storage test failures. Here is the issue I came across. {code} at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91] 15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest) java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : { "topicName" : "drill-pushdown-topic" }, .* .* "cost" in plan: { "head" : { "version" : 1, "generator" : { "type" : "ExplainHandler", "info" : "" }, "type" : "APACHE_DRILL_PHYSICAL", "options" : [ { "kind" : "STRING", "accessibleScopes" : "ALL", "name" : "store.kafka.record.reader", "string_val" : "org.apache.drill.exec.store.kafka.decoders.JsonMessageReader", "scope" : "SESSION" }, { "kind" : "BOOLEAN", "accessibleScopes" : "ALL", "name" : "exec.errors.verbose", "bool_val" : true, "scope" : "SESSION" }, { "kind" : "LONG", "accessibleScopes" : "ALL", "name" : "store.kafka.poll.timeout", "num_val" : 5000, "scope" : "SESSION" }, { "kind" : "LONG", "accessibleScopes" : "ALL", "name" : "planner.width.max_per_node", "num_val" : 2, "scope" : "SESSION" } ], "queue" : 0, "hasResourcePlan" : false, "resultMode" : "EXEC" }, "graph" : [ { "pop" : "kafka-scan", "@id" : 6, "userName" : "", "kafkaStoragePluginConfig" : { "type" : "kafka", "kafkaConsumerProps" : { "bootstrap.servers" : "127.0.0.1:56524", "group.id" : "drill-test-consumer" }, "enabled" : true }, "columns" : [ "`**`", "`kafkaMsgOffset`" ], "kafkaScanSpec" : {
"topicName" : "drill-pushdown-topic" }, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 5.0 } }, { "pop" : "project", "@id" : 5, "exprs" : [ { "ref" : "`T23¦¦**`", "expr" : "`**`" }, { "ref" : "`kafkaMsgOffset`", "expr" : "`kafkaMsgOffset`" } ], "child" : 6, "outputProj" : false, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 5.0 } }, { "pop" : "filter", "@id" : 4, "child" : 5, "expr" : "equal(`kafkaMsgOffset`, 9) ", "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 0.75 } }, { "pop" : "selection-vector-remover", "@id" : 3, "child" : 4, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 1.0 } }, { "pop" : "project", "@id" : 2, "exprs" : [ { "ref" : "`T23¦¦**`", "expr" : "`T23¦¦**`" } ], "child" : 3, "outputProj" : false, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 1.0 } }, { "pop" : "project", "@id" : 1, "exprs" : [ { "ref" : "`**`", "expr" : "`T23¦¦**`" } ], "child" : 2, "outputProj" : true, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 1.0 } }, { "pop" : "screen", "@id" : 0, "child" : 1, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 1.0 } } ] }! {code} An earlier check-in changed the way cost is represented in the plan, and the test was updated to match the new output, which I think is not the right fix. The pattern compared against the plan should be made smarter so that this issue is fixed generically.
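One way to make the plan pattern "smarter", as suggested above, is to quote the stable JSON fragments literally and treat everything between them (including however cost happens to be printed) as an opaque region. The helper below is a hypothetical sketch, not the actual test code; the `PlanMatcher` class and method names are invented.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of a cost-format-agnostic plan check; not the real test code. */
public class PlanMatcher {

    /**
     * Checks that the plan contains the expected kafkaScanSpec followed, anywhere
     * later, by a "cost" attribute, without depending on how cost is rendered.
     */
    public static boolean containsScanWithCost(String plan, String topicName) {
        // \Q...\E quotes the literal JSON fragments; (?s) lets '.' cross newlines.
        Pattern p = Pattern.compile(
            "(?s)\\Q\"kafkaScanSpec\" : {\\E\\s*"
            + "\\Q\"topicName\" : \"" + topicName + "\"\\E\\s*}"
            + ".*?\"cost\"");
        return p.matcher(plan).find();
    }

    public static void main(String[] args) {
        String plan = "{ \"kafkaScanSpec\" : {\n  \"topicName\" : \"drill-pushdown-topic\"\n}, "
            + "\"initialAllocation\" : 100, \"cost\" : { \"memoryCost\" : 1.6777216E7 } }";
        System.out.println(containsScanWithCost(plan, "drill-pushdown-topic"));
    }
}
```

Because only the scan spec is matched literally and the cost block is treated as opaque, such a check keeps passing even if the cost representation changes again.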
[jira] [Created] (DRILL-7118) Filter not getting pushed down on MapR-DB tables.
Hanumath Rao Maduri created DRILL-7118: -- Summary: Filter not getting pushed down on MapR-DB tables. Key: DRILL-7118 URL: https://issues.apache.org/jira/browse/DRILL-7118 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.15.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Fix For: 1.16.0 A simple `is null` filter is not being pushed down for MapR-DB tables. Here is a repro: {code:java} 0: jdbc:drill:zk=local> explain plan for select * from dfs.`/tmp/js` where b is null; ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.1ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.1ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.1ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.1+--+--+ | text | json | +--+--+ | 00-00 Screen 00-01 Project(**=[$0]) 00-02 Project(T0¦¦**=[$0]) 00-03 SelectionVectorRemover 00-04 Filter(condition=[IS NULL($1)]) 00-05 Project(T0¦¦**=[$0], b=[$1]) 00-06 Scan(table=[[dfs, /tmp/js]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=/tmp/js, condition=null], columns=[`**`, `b`], maxwidth=1]]) {code}
[jira] [Created] (DRILL-7113) Issue with filtering null values from MapRDB-JSON
Hanumath Rao Maduri created DRILL-7113: -- Summary: Issue with filtering null values from MapRDB-JSON Key: DRILL-7113 URL: https://issues.apache.org/jira/browse/DRILL-7113 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.15.0 Reporter: Hanumath Rao Maduri Assignee: Aman Sinha Fix For: 1.16.0, 1.17.0 When Drill queries documents from a MapR-DB JSON table that contain fields with null values, it returns wrong results. The issue is reproducible locally. Repro steps: [1] Create a MapR-DB JSON table, say '/tmp/dmdb2/'. [2] Insert the following sample records into the table: {code:java} insert --table /tmp/dmdb2/ --value '{"_id": "1", "label": "person", "confidence": 0.24}' insert --table /tmp/dmdb2/ --value '{"_id": "2", "label": "person2"}' insert --table /tmp/dmdb2/ --value '{"_id": "3", "label": "person3", "confidence": 0.54}' insert --table /tmp/dmdb2/ --value '{"_id": "4", "label": "person4", "confidence": null}' {code} We can see that for the field 'confidence', document 1 has value 0.24, document 3 has value 0.54, document 2 does not have the field, and document 4 has the field with value null. [3] Query the table from Drill.
*Query 1:* {code:java} 0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2; +--+-+ | label | confidence | +--+-+ | person | 0.24| | person2 | null| | person3 | 0.54| | person4 | null| +--+-+ 4 rows selected (0.2 seconds) {code} *Query 2:* {code:java} 0: jdbc:drill:> select * from dfs.tmp.dmdb2; +--+-+--+ | _id | confidence | label | +--+-+--+ | 1| 0.24| person | | 2| null| person2 | | 3| 0.54| person3 | | 4| null| person4 | +--+-+--+ 4 rows selected (0.174 seconds) {code} *Query 3:* {code:java} 0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence is not null; +--+-+ | label | confidence | +--+-+ | person | 0.24| | person3 | 0.54| | person4 | null| +--+-+ 3 rows selected (0.192 seconds) {code} *Query 4:* {code:java} 0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence is null; +--+-+ | label | confidence | +--+-+ | person2 | null| +--+-+ 1 row selected (0.262 seconds) {code} As you can see, Query 3, which selects all documents where confidence is not null, incorrectly returns a document with a null value. *Other observation:* Querying the same data using Drill without MapR-DB returns the correct result.
For example, create 4 different JSON files with the following data: {"label": "person", "confidence": 0.24} {"label": "person2"} {"label": "person3", "confidence": 0.54} {"label": "person4", "confidence": null} Query them directly using Drill: *Query 5:* {code:java} 0: jdbc:drill:> select label,confidence from dfs.tmp.t2; +--+-+ | label | confidence | +--+-+ | person4 | null| | person3 | 0.54| | person2 | null| | person | 0.24| +--+-+ 4 rows selected (0.203 seconds) {code} *Query 6:* {code:java} 0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is null; +--+-+ | label | confidence | +--+-+ | person4 | null| | person2 | null| +--+-+ 2 rows selected (0.352 seconds) {code} *Query 7:* {code:java} 0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is not null; +--+-+ | label | confidence | +--+-+ | person3 | 0.54| | person | 0.24| +--+-+ 2 rows selected (0.265 seconds) {code} As seen in queries 6 and 7, Drill returns the correct results. I believe the issue is in the MapR-DB layer where it fetches the results.
[jira] [Created] (DRILL-7068) Support of memory adjustment framework for resource management with Queues
Hanumath Rao Maduri created DRILL-7068: -- Summary: Support of memory adjustment framework for resource management with Queues Key: DRILL-7068 URL: https://issues.apache.org/jira/browse/DRILL-7068 Project: Apache Drill Issue Type: Sub-task Components: Query Planning Optimization Affects Versions: 1.16.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Add support for a memory adjustment framework based on the queue configuration for a query. This also covers refactoring the existing queue-based resource management in Drill. For more details on the design, please refer to the parent JIRA.
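As a rough illustration of the queue side of such a framework, selecting a queue by a query's estimated memory requirement could work as a best-fit search over the configured queues. This is an assumption-laden sketch, not Drill's implementation; the `QueueSelector` class and the best-fit policy are invented for illustration.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of best-fit queue selection; not Drill's implementation. */
public class QueueSelector {

    /** A configured queue with a per-query memory limit (invented shape). */
    public record Queue(String name, long memoryLimitBytes) {}

    /**
     * Picks the smallest queue whose memory limit covers the query's
     * estimated requirement (best fit); empty if no queue can admit it.
     */
    public static Optional<Queue> select(List<Queue> queues, long estimateBytes) {
        return queues.stream()
            .filter(q -> q.memoryLimitBytes() >= estimateBytes)
            .min((a, b) -> Long.compare(a.memoryLimitBytes(), b.memoryLimitBytes()));
    }

    public static void main(String[] args) {
        List<Queue> queues = List.of(
            new Queue("small", 1L << 30),    // 1 GiB
            new Queue("large", 8L << 30));   // 8 GiB
        System.out.println(select(queues, 2L << 30));
    }
}
```

Once a queue is selected, the per-operator memory adjustment then has to fit the query's operators into that queue's limit, which is what the framework above is about.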
[jira] [Created] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.
Hanumath Rao Maduri created DRILL-6997: -- Summary: Semijoin is changing the join ordering for some tpcds queries. Key: DRILL-6997 URL: https://issues.apache.org/jira/browse/DRILL-6997 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.15.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Fix For: 1.16.0 TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join disabled at scale factor 100. It runs 100% slower at scale factor 1000. This issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. DRILL-6798: Planner changes to support semi-join. {code:java} with ws_wh as (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2 from web_sales ws1,web_sales ws2 where ws1.ws_order_number = ws2.ws_order_number and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) [_LIMITA] select [_LIMITB] count(distinct ws_order_number) as "order count" ,sum(ws_ext_ship_cost) as "total shipping cost" ,sum(ws_net_profit) as "total net profit" from web_sales ws1 ,date_dim ,customer_address ,web_site where d_date between '[YEAR]-[MONTH]-01' and (cast('[YEAR]-[MONTH]-01' as date) + 60 days) and ws1.ws_ship_date_sk = d_date_sk and ws1.ws_ship_addr_sk = ca_address_sk and ca_state = '[STATE]' and ws1.ws_web_site_sk = web_site_sk and web_company_name = 'pri' and ws1.ws_order_number in (select ws_order_number from ws_wh) and ws1.ws_order_number in (select wr_order_number from web_returns,ws_wh where wr_order_number = ws_wh.ws_order_number) order by count(distinct ws_order_number) [_LIMITC]; {code} I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join disabled. Both are executed with commit id 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100. 
The plan with semi-join enabled moved the first hash join, the one for: and ws1.ws_order_number in (select ws_order_number from ws_wh). It used to be on the build side of the first HJ on the left-hand side (04-05); it is now on the build side of the fourth HJ on the left-hand side (01-13). The plan with semi-join enabled has a hash_partition_sender (operator 05-00) that takes 10 seconds to execute, but all the fragments take about the same amount of time. The plan with semi-join enabled has two HJs that process 1B rows, while the plan with semi-join disabled has one HJ that processes 1B rows. The plan with semi-join enabled has several senders and receivers that wait more than 10 seconds (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 17-00). When disabled, no operator waits more than 10 seconds.
Re: [VOTE] Apache Drill release 1.15.0 - RC2
- Downloaded tarball and also built from source from [3] - Tried on my Mac - Ran unit tests. LGTM (+1) On Thu, Dec 27, 2018 at 4:45 PM Khurram Faraaz wrote: > Downloaded binaries and deployed on a 4 node CentOS 7.5 cluster. > Executed basic SQL queries > - from sqlline > - from web UI > - and from POSTMAN > > Verified Web UI, performed sanity tests. > > Looks good. > Here is one question related to querying the new sys.functions system > table. > The function names in the name column of sys.functions table in some cases, > are the operators, is this expected behavior, or should that column have > actual names and not the operators. > > 0: jdbc:drill:schema=dfs.tmp> select distinct name from sys.functions limit > 12; > ++ > | name | > ++ > | != | > | $sum0 | > | && | > | - | > | /int | > | < | > | <= | > | <> | > | = | > | == | > | > | > | >= | > ++ > 12 rows selected (0.175 seconds) > > On Thu, Dec 27, 2018 at 3:02 PM Kunal Khatua wrote: > > > - Downloaded tarball and also built from source > > - Tried on CentOS 7.5 against MapR profile > > - Ran a couple of queries consisting of TPCH dataset in Parquet format > > - WebUX interactions seem clean and without any apparent issue. > > > > +1 (binding) > > > > Thanks > > Kunal > > On 12/27/2018 2:37:05 PM, Boaz Ben-Zvi wrote: > > -- Verified gpg signature on source and binaries. > > > > -- Checked the checksum sha512 - matched. > > > > -- Downloaded source to Linux VM - full build and unit tests passed. > > > > -- On the Mac - Build and unit tests passed, except the > > `drill_derby_test` in the `contrib/storage-jdbc` which also fails for > > 1.14.0 on my Mac (so it is a local environment issue). > > > > -- Manually ran on both Mac and Linux, and checked the Web-UI: All my > > `semijoin` tests, and memory spilling tests for hash-join and hash-aggr. > > And a select number of large queries. All passed OK. 
> > > > ==> +1 (binding) > > > > Thanks, > > > > Boaz > > > > On 12/27/18 12:54 PM, Abhishek Girish wrote: > > > +1 > > > > > > - Brought up Drill in distributed mode on a 4 node cluster with MapR > > > platform - looks good! > > > - Ran regression tests from [6] - looks good! > > > - Ran unit tests with default & mapr profile - looks good! > > > - Basic sanity tests on Sqlline, Web UI - looks good! > > > > > > [6] > > > https://github.com/mapr/drill-test-framework > > > > > > On Thu, Dec 27, 2018 at 11:12 AM Aman Sinha wrote: > > > > > >> - Downloaded source from [3] onto my Linux VM, built and ran unit > > tests. I > > >> had to run some test suites individually but got a clean run. > > >> - Verified extraneous directory issue (DRILL-6916) is resolved > > >> - Built the source using MapR profile and ran the secondary indexing > > tests > > >> within mapr format plugin > > >> - Downloaded binary tar ball from [3] on my Mac. Verified checksum of > > the > > >> file using shasum -a 512 *file *and comparing with the one on [3] > > >> - Verified Vitalii's signature through the following command: gpg > > --verify > > >> Downloads/apache-drill-1.15.0.tar.gz.asc apache-drill-1.15.0.tar.gz > > >> - Ran Drill in embedded mode and ran a few TPC-H queries. Checked > query > > >> profiles through Web UI > > >> > > >> LGTM. +1 > > >> > > >> Aman > > >> > > >> On Thu, Dec 27, 2018 at 6:17 AM Denys Ordynskiy > > >> wrote: > > >> > > >>> - downloaded source code, successfully built Drill with mapr profile; > > >>> - run Drill in distributed mode on Ubuntu on JDK8; > > >>> - connected from Drill Explorer, explored data on S3 and MapRFS > > storage; > > >>> - submitted some tests for Drill Web UI and Drill Rest API.
> > >>> > > >>> +1 > > >>> > > >>> On Wed, Dec 26, 2018 at 8:40 PM Arina Ielchiieva > > >> wrote: > > Build from source on Linux, started in embedded mode, ran random > > >>> queries. > > Downloaded tarball on Windows, started Drill in embedded mode, run > > >> random > > queries. > > Check Web UI: Profiles, Options, Plugins sections. > > > > Additionally checked: > > - information_schema files table; > > - new SqlLine version; > > - JDBC using Squirrel; > > - ODBC using Drill Explorer; > > - return result set option. > > > > +1 (binding) > > > > Kind regards, > > Arina > > > > On Wed, Dec 26, 2018 at 8:32 PM Volodymyr Vysotskyi > > >>> volody...@apache.org> > > wrote: > > > > > - Downloaded built tar, checked signatures and hashes for built and > > source > > > tars > > > and for jars; > > > - run Drill in embedded mode on both Ubuntu and Windows on JDK8 and > > JDK11; > > > - created views, submitted random TPCH queries from UI and
[jira] [Created] (DRILL-6844) Query with ORDER BY DESC on indexed column does not pick secondary index
Hanumath Rao Maduri created DRILL-6844: -- Summary: Query with ORDER BY DESC on indexed column does not pick secondary index Key: DRILL-6844 URL: https://issues.apache.org/jira/browse/DRILL-6844 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.14.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri A query with ORDER BY DESC on an indexed column does not pick the secondary index. {noformat} // Query that uses the secondary index defined on ts. 0: jdbc:drill:schema=dfs.tmp> explain plan for . . . . . . . . . . . . . . > select ts from dfs.`/c8/test3` order by ts limit 1; +--+--+ | text | json | +--+--+ | 00-00 Screen 00-01 Project(ts=[$0]) 00-02 SelectionVectorRemover 00-03 Limit(fetch=[1]) 00-04 Scan(table=[[dfs, /c8/test3]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///c8/test3, condition=null, indexName=ts], columns=[`ts`], limit=1, maxwidth=125]]) {noformat} The same query with ORDER BY ts DESC does not use the secondary index defined on ts: {noformat} 0: jdbc:drill:schema=dfs.tmp> explain plan for . . . . . . . . . . . . . . > select ts from dfs.`/c8/test3` order by ts desc limit 1; +--+--+ | text | json | +--+--+ | 00-00 Screen 00-01 Project(ts=[$0]) 00-02 SelectionVectorRemover 00-03 Limit(fetch=[1]) 00-04 SingleMergeExchange(sort0=[0 DESC]) 01-01 OrderedMuxExchange(sort0=[0 DESC]) 02-01 SelectionVectorRemover 02-02 Limit(fetch=[1]) 02-03 SelectionVectorRemover 02-04 TopN(limit=[1]) 02-05 HashToRandomExchange(dist0=[[$0]]) 03-01 Scan(table=[[dfs, /c8/test3]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///c8/test3, condition=null], columns=[`ts`], maxwidth=8554]]) {noformat} The index definition is: {noformat} maprcli table index list -path /c8/test3 -json { "timestamp":1538066303932, "timeofday":"2018-09-27 04:38:23.932 GMT+ PM", "status":"OK", "total":2, "data":[ { "cluster":"c8", "type":"maprdb.si", "indexFid":"2176.68.131294", "indexName":"ts", "hashed":false, "indexState":"REPLICA_STATE_REPLICATING", "idx":1, "indexedFields":"ts:ASC", "isUptodate":false, "minPendingTS":1538066077, "maxPendingTS":1538066077, "bytesPending":0, "putsPending":0, "bucketsPending":1, "copyTableCompletionPercentage":100, "numTablets":32, "numRows":80574368, "totalSize":4854052160 }, { "cluster":"c8", "type":"maprdb.si", "indexFid":"2176.72.131302", "indexName":"ts_desc", "hashed":false, "indexState":"REPLICA_STATE_REPLICATING", "idx":2, "indexedFields":"ts:DESC", "isUptodate":false, "minPendingTS":1538066077, "maxPendingTS":1538066077, "bytesPending":0, "putsPending":0, "bucketsPending":1, "copyTableCompletionPercentage":100, "numTablets":32, "numRows":80081344, "totalSize":4937154560 } ] } {noformat}
Re: [ANNOUNCE] New Committer: Hanumath Rao Maduri
Thank you all for the wishes! Thanks, -Hanu On Thu, Nov 1, 2018 at 1:28 PM Chunhui Shi wrote: > Congratulations Hanu! > -- > From:Arina Ielchiieva > Send Time:2018 Nov 1 (Thu) 06:05 > To:dev ; user > Subject:[ANNOUNCE] New Committer: Hanumath Rao Maduri > > The Project Management Committee (PMC) for Apache Drill has invited > Hanumath > Rao Maduri to become a committer, and we are pleased to announce that he > has accepted. > > Hanumath became a contributor in 2017, making changes mostly in the Drill > planning side, including lateral / unnest support. He is also one of the > contributors of index based planning and execution support. > > Welcome Hanumath, and thank you for your contributions! > > - Arina > (on behalf of Drill PMC) >
[Agenda] Drill developer meetup 2019
Drill Developers, I am quite excited to announce the details of the Drill developers day 2019. I have consolidated the topics from our earlier discussions and prioritized them according to the votes. MapR has offered to host it on Nov 14th in the training room downstairs. Here is the exact location: Training Room at 4555 Great America Pkwy, Suite 201, Santa Clara, CA, 95054. Please find the agenda for the meetup below.

*Lunch starts at 12:00PM.*

*[12:25 - 12:40] Welcome*
- Recap of last year's activities
- Preview of this year's focus

*[12:40 - 1:00] Storage plugins*
- Adding new storage plugins for the following: Netflix Iceberg, Kudu (some code already exists), Cassandra, Elasticsearch, Carbondata, ORC/XML file formats, Spark RDD/DataFrames/Datasets, graph databases & more
- Improving documentation related to storage plugins

*[1:00 - 1:45] Schema discovery & evolution*
- Creation and management of schema
- Handling schema changes in certain common cases
- Handling NULL values elegantly
- Schema learning (similar to the MSGpack plugin)
- Query hints

*[1:45 - 2:30] Metadata management*
- Defining an abstraction layer for various types of metadata: views, schema, statistics, security
- Underlying storage for metadata: what are the options and their trade-offs?
- Hive metastore
- Parquet metadata cache (Parquet-specific, for row group metadata)
- Ease of using the Parquet files generated by other engines (like Spark)

*[2:30 - 2:45] Break*

*[2:45 - 4:00] Resource management*
- Resource limits per query
- Optimal memory assignment for blocking operators based on stats
- Enhancing the blocking and exchange operators to live within memory limits
- Aligning with admission control/queueing (YARN concepts)
- Query scheduling based on queues using tagging and costing
- Drill on Kubernetes

*[4:00 - 4:20] Apache Arrow*
- Benefits of integrating Apache Drill with Apache Arrow
- Possible trade-offs & implementation hurdles

*[4:20 - 4:40] Performance Improvements*
- Efficient handling of Broadcast/Semi/Anti-Semi join
- Drill statistics handling
- Optimizing the complex Parquet reader

Thanks,
-Hanu
Re: [ANNOUNCE] New Committer: Gautam Parai
Congratulations Gautam! On Mon, Oct 22, 2018 at 8:46 AM salim achouche wrote: > Congrats Gautam! > > On Mon, Oct 22, 2018 at 7:25 AM Arina Ielchiieva wrote: > > > The Project Management Committee (PMC) for Apache Drill has invited > Gautam > > Parai to become a committer, and we are pleased to announce that he has > > accepted. > > > > Gautam has become a contributor since 2016, making changes in various > Drill > > areas including planning side. He is also one of the contributors of the > > upcoming feature to support index based planning and execution. > > > > Welcome Gautam, and thank you for your contributions! > > > > - Arina > > (on behalf of Drill PMC) > > > > > -- > Regards, > Salim >
Re: Topics for Drill Hackathon/Drill Developers Day - 2018!
Hello All, Please vote for the list of the topics which you would be interested in. This will be very helpful to prioritize the topics on Developer Day. https://docs.google.com/forms/d/1C8nNIznllct_zY68R-XZkWtb3VHWBMFSkMnDs42isLs/edit On Wed, Oct 17, 2018 at 9:45 PM Hanumath Rao Maduri wrote: > Hello Charles, > > Thank you for your interest to volunteer. We are planning to host a remote > session as well. > I have added your name as a volunteer to Storage plugins, REST APIs > related enhancements discussion. > > > On Tue, Oct 16, 2018 at 3:53 PM Charles Givre wrote: > >> @All, >> I don’t know if remote folks can host a session, but if so, I’d volunteer. >> — C >> >> > On Oct 16, 2018, at 17:13, Vitalii Diravka wrote: >> > >> > Yes, I can edit and post suggestions in the document. >> > Thank you! >> > >> > On Tue, Oct 16, 2018 at 11:50 PM Hanumath Rao Maduri < >> hanu@gmail.com> >> > wrote: >> > >> >> Hello Vitalli, >> >> >> >> I have given permissions to edit the document. Please let me know if >> it is >> >> fine. >> >> >> >> Regards, >> >> -Hanu >> >> >> >> On Tue, Oct 16, 2018 at 11:10 AM Vitalii Diravka >> >> wrote: >> >> >> >>> Could you provide the possibility of commenting for the document? >> >>> It will allow to make suggestions for the topics. >> >>> >> >>> On Tue, Oct 16, 2018 at 6:22 AM Hanumath Rao Maduri < >> hanu@gmail.com> >> >>> wrote: >> >>> >> >>>> Hello Drill Development Team, >> >>>> >> >>>> Thank you all for the interest in attending the Drill Developers Day. >> >>>> I have curated a list of topics that can be discussed at the >> up-coming >> >>>> Drill Developers Day. Please feel free to suggest any other topics >> >> which >> >>>> you are interested in. Here is the link for the topics. >> >>>> >> >>>> >> >>>> >> >>> >> >> >> https://docs.google.com/document/d/1x9v_3UdENotONSuLm93hQJ-pDu1GS5tAhbXOaJrelsw/edit?usp=sharing >> >>>> >> >>>> Volunteers to lead the discussions are welcome. 
Please pick any topic >> >> of >> >>>> your interest to volunteer the discussion. >> >>>> >> >>>> Agenda and format for the discussions will be shared as we get closer >> >> to >> >>>> the event. >> >>>> >> >>>> We all are quite excited to meet you at the event. >> >>>> >> >>>> Thanks, >> >>>> -Hanu >> >>>> >> >>> >> >> >> >>
Re: Topics for Drill Hackathon/Drill Developers Day - 2018!
Hello Charles, Thank you for your interest to volunteer. We are planning to host a remote session as well. I have added your name as a volunteer to Storage plugins, REST APIs related enhancements discussion. On Tue, Oct 16, 2018 at 3:53 PM Charles Givre wrote: > @All, > I don’t know if remote folks can host a session, but if so, I’d volunteer. > — C > > > On Oct 16, 2018, at 17:13, Vitalii Diravka wrote: > > > > Yes, I can edit and post suggestions in the document. > > Thank you! > > > > On Tue, Oct 16, 2018 at 11:50 PM Hanumath Rao Maduri > > > wrote: > > > >> Hello Vitalli, > >> > >> I have given permissions to edit the document. Please let me know if it > is > >> fine. > >> > >> Regards, > >> -Hanu > >> > >> On Tue, Oct 16, 2018 at 11:10 AM Vitalii Diravka > >> wrote: > >> > >>> Could you provide the possibility of commenting for the document? > >>> It will allow to make suggestions for the topics. > >>> > >>> On Tue, Oct 16, 2018 at 6:22 AM Hanumath Rao Maduri < > hanu@gmail.com> > >>> wrote: > >>> > >>>> Hello Drill Development Team, > >>>> > >>>> Thank you all for the interest in attending the Drill Developers Day. > >>>> I have curated a list of topics that can be discussed at the up-coming > >>>> Drill Developers Day. Please feel free to suggest any other topics > >> which > >>>> you are interested in. Here is the link for the topics. > >>>> > >>>> > >>>> > >>> > >> > https://docs.google.com/document/d/1x9v_3UdENotONSuLm93hQJ-pDu1GS5tAhbXOaJrelsw/edit?usp=sharing > >>>> > >>>> Volunteers to lead the discussions are welcome. Please pick any topic > >> of > >>>> your interest to volunteer the discussion. > >>>> > >>>> Agenda and format for the discussions will be shared as we get closer > >> to > >>>> the event. > >>>> > >>>> We all are quite excited to meet you at the event. > >>>> > >>>> Thanks, > >>>> -Hanu > >>>> > >>> > >> > >
Re: Topics for Drill Hackathon/Drill Developers Day - 2018!
Hello Vitalli, I have given permissions to edit the document. Please let me know if it is fine. Regards, -Hanu On Tue, Oct 16, 2018 at 11:10 AM Vitalii Diravka wrote: > Could you provide the possibility of commenting for the document? > It will allow to make suggestions for the topics. > > On Tue, Oct 16, 2018 at 6:22 AM Hanumath Rao Maduri > wrote: > > > Hello Drill Development Team, > > > > Thank you all for the interest in attending the Drill Developers Day. > > I have curated a list of topics that can be discussed at the up-coming > > Drill Developers Day. Please feel free to suggest any other topics which > > you are interested in. Here is the link for the topics. > > > > > > > https://docs.google.com/document/d/1x9v_3UdENotONSuLm93hQJ-pDu1GS5tAhbXOaJrelsw/edit?usp=sharing > > > > Volunteers to lead the discussions are welcome. Please pick any topic of > > your interest to volunteer the discussion. > > > > Agenda and format for the discussions will be shared as we get closer to > > the event. > > > > We all are quite excited to meet you at the event. > > > > Thanks, > > -Hanu > > >
Topics for Drill Hackathon/Drill Developers Day - 2018!
Hello Drill Development Team, Thank you all for the interest in attending the Drill Developers Day. I have curated a list of topics that can be discussed at the up-coming Drill Developers Day. Please feel free to suggest any other topics which you are interested in. Here is the link for the topics. https://docs.google.com/document/d/1x9v_3UdENotONSuLm93hQJ-pDu1GS5tAhbXOaJrelsw/edit?usp=sharing Volunteers to lead the discussions are welcome. Please pick any topic of your interest to volunteer the discussion. Agenda and format for the discussions will be shared as we get closer to the event. We all are quite excited to meet you at the event. Thanks, -Hanu
Re: [ANNOUNCE] New Committer: Chunhui Shi
Congratulations Chunhui. On Fri, Sep 28, 2018 at 9:26 AM Padma Penumarthy wrote: > Congratulations Chunhui. > > Thanks > Padma > > > On Fri, Sep 28, 2018 at 2:17 AM Arina Ielchiieva wrote: > > > The Project Management Committee (PMC) for Apache Drill has invited > Chunhui > > Shi to become a committer, and we are pleased to announce that he has > > accepted. > > > > Chunhui Shi has become a contributor since 2016, making changes in > various > > Drill areas. He has shown profound knowledge in Drill planning side > during > > his work to support lateral join. He is also one of the contributors of > the > > upcoming feature to support index based planning and execution. > > > > Welcome Chunhui, and thank you for your contributions! > > > > - Arina > > (on behalf of Drill PMC) > > >
Re: Problem of adding support for CROSS JOIN syntax
Hello Ihor, I am not clear on the stated goal of this JIRA. Can you please clarify it with some examples? "But main goal of this task is to allow explicit cross joins in queries when option is enabled and at the same time disallow other ways to execute cross joins (for example, list tables via comma in FROM section of query without condition) while option is enabled. " At least as I understand it, supporting cross joins means enabling the CROSS JOIN syntax whenever the comma-separated syntax without join conditions is also enabled. Currently CROSS JOIN is not supported at all; its support should be similar to that of the comma-separated query syntax without conditions. Thanks, -Hanu On Thu, Sep 27, 2018 at 10:14 AM Ihor Huzenko wrote: > Dear Drillers, > > I'm trying to implement support for CROSS JOIN syntax in Apache Drill. > But after a long investigation I have finally run out of ideas and don't see a > proper way > this could be implemented without changes to Calcite. I'm new to Drill > and > Calcite and I would appreciate any help. Please, take a look at my comment > under the issue https://issues.apache.org/jira/browse/DRILL-786. > > Thank you in advance, Igor Guzenko >
Re: Some questions about sorting pushdown to custom plugins
If I understand it correctly, you may need to write new storage plugin rules (as similar to that of other storage plugin rules) to support projection pushdown and limit pushdown for your custom storage plugin. This might help in reading only the required fields from the storage. Thanks, On Tue, Aug 28, 2018 at 10:19 AM yang zhang wrote: > Hi: > I have two questions to ask. > > There is already a custom plugin. Query sql: > Select * from indexr.face_image_mess where device_id = > '3E04846B-6B69-1A4D-0569-ED0813853348' and org_id in (11,12) order by > short_time desc limit 4 offset 9; > plan: > { > "head" : { > "version" : 1, > "generator" : { > "type" : "DefaultSqlHandler", > "info" : "" > }, > "type" : "APACHE_DRILL_PHYSICAL", > "options" : [ { > "kind" : "LONG", > "accessibleScopes" : "ALL", > "name" : "planner.width.max_per_node", > "num_val" : 1, > "scope" : "SESSION" > } ], > "queue" : 0, > "hasResourcePlan" : false, > "resultMode" : "EXEC" > }, > "graph" : [ { > "pop" : "indexr-scan", > "@id" : 7, > "userName" : "conn", > "indexrScanSpec" : { > "tableName" : "face_image_mess", > "rsFilter" : { > "type" : "and", > "children" : [ { > "type" : "equal", > "attr" : { > "name" : "device_id", > "type" : "VARCHAR" > }, > "numValue" : 0, > "strValue" : "3E04846B-6B69-1A4D-0569-ED0813853348", > "type" : "equal" > }, { > "type" : "or", > "children" : [ { > "type" : "equal", > "attr" : { > "name" : "org_id", > "type" : "VARCHAR" > }, > "numValue" : 0, > "strValue" : "11", > "type" : "equal" > }, { > "type" : "equal", > "attr" : { > "name" : "org_id", > "type" : "VARCHAR" > }, > "numValue" : 0, > "strValue" : "12", > "type" : "equal" > } ], > "type" : "or" > } ], > "type" : "and" > } > }, > "storage" : { > "type" : "indexr", > "enabled" : true > }, > "columns" : [ "`**`" ], > "limitScanRows" : 9223372036854775807, > "scanId" : "dc889ee0-6675-42bb-b7a5-2bc413d6372d", > "cost" : 0.0 > }, { > "pop" : "top-n", > "@id" : 6, > "child" : 7, > "orderings" : [ { > "order" : "DESC", 
> "expr" : "`short_time`", > "nullDirection" : "FIRST" > } ], > "reverse" : false, > "limit" : 13, > "initialAllocation" : 100, > "maxAllocation" : 100, > "cost" : 1.0 > }, { > "pop" : "selection-vector-remover", > "@id" : 5, > "child" : 6, > "initialAllocation" : 100, > "maxAllocation" : 100, > "cost" : 1.0 > }, { > "pop" : "limit", > "@id" : 4, > "child" : 5, > "first" : 9, > "last" : 13, > "initialAllocation" : 100, > "maxAllocation" : 100, > "cost" : 13.0 > }, { > "pop" : "limit", > "@id" : 3, > "child" : 4, > "first" : 9, > "last" : 13, > "initialAllocation" : 100, > "maxAllocation" : 100, > "cost" : 13.0 > }, { > "pop" : "selection-vector-remover", > "@id" : 2, > "child" : 3, > "initialAllocation" : 100, > "maxAllocation" : 100, > "cost" : 13.0 > }, { > "pop" : "project", > "@id" : 1, > "exprs" : [ { > "ref" : "`id`", > "expr" : "`id`" > } > . > . > . > ], > "child" : 2, > "outputProj" : true, > "initialAllocation" : 100, > "maxAllocation" : 100, > "cost" : 13.0 > }, { > "pop" : "screen", > "@id" : 0, > "child" : 1, > "initialAllocation" : 100, > "maxAllocation" : 100, > "cost" : 13.0 > } ] > } > > > The version of drill-1.13.0 is used. > 1. Is order by pushdown supported? > Does the drill execution plan push down to the plugin when it > supports getting the order by field 'short_time' and the select field (all > fields)? > This table (face_image_mess) has 70 fields, each row is about 1KB; this > query hits about 200 million rows; currently I can't get the order by field and select > fields pushed down to the plugin, and can only scan all the field values of the hits; > the data the drill engine must process is 200,000,000 * 1KB = 190.7GB. > In my custom plugin I designed a Long-type rowid, which points to the physical > address.
> If the plugin could receive the pushed-down order-by field and the select > fields, it could filter out the fields that do not need to participate in the > calculation; only four fields ('rowid', 'short_time', 'device_id', 'org_id') > actually participate in the calculation, which would minimize the work the > drill engine has to do. The actual
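The question above is ultimately arithmetic, and a small sketch makes the payoff of projection pushdown concrete. This assumes the garbled row count in the question is 200 million (the only value that reproduces the quoted 190.7 GB total) and that the 70 fields are roughly equal in size, which is a simplification rather than a measurement:

```python
# Back-of-the-envelope scan cost with and without projection pushdown,
# using the figures from the question above. ROWS_HIT is an assumption
# inferred from the quoted 190.7 GB total; the even split across fields
# is a simplification.

FULL_ROW_BYTES = 1024        # ~1 KB per row across all 70 fields
ROWS_HIT = 200_000_000       # rows matched by the filter (assumed)
FIELDS_TOTAL = 70
FIELDS_NEEDED = 4            # rowid, short_time, device_id, org_id

full_scan_gb = ROWS_HIT * FULL_ROW_BYTES / 1024**3
projected_gb = full_scan_gb * FIELDS_NEEDED / FIELDS_TOTAL

print(f"full scan:     {full_scan_gb:.1f} GB")   # ~190.7 GB
print(f"with pushdown: {projected_gb:.1f} GB")   # ~10.9 GB
```

Under these assumptions, pushing the projection down cuts the engine's input by roughly a factor of 17, which is why the reply recommends projection and limit pushdown rules for the custom plugin.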
Re: Apache drill High availability
Unlike other databases or data engines, Drill doesn't store data in its own storage engine, so high availability of the data when using Drill means the underlying storage engine needs to support high availability. High availability can also mean that a running query should succeed even if some of its fragments or nodes fail. In this scenario Drill reports an error to the user and expects the user to re-run the query. On re-running the query, it might succeed (of course, depending on the load on the system). If a node has crashed or is not responding, then ZooKeeper will exclude that node during the planning stage itself. Hence, no fragments are executed on the node (though query performance might degrade in this scenario). Thanks, On Tue, Aug 28, 2018 at 9:34 AM salim achouche wrote: > You need to clarify your definition of HA as there can be multiple faults > at play: > - A Drill cluster can handle nodes going down (and new ones joining the > cluster) > - Though, running queries (which are executed in a distributed manner) > might fail if they had minor-fragments running on a faulty node > - Similarly, Drill has some built-in resilience to network disconnects > albeit it is not always transparent (I believe, queries might fail if a > network disconnect happened during a connection exchange) > > Regards, > > On Mon, Aug 27, 2018 at 10:48 PM pujari Satish > wrote: > > > Hi Team, > > > > Good Morning. I am trying to set up Drill high availability using the HAProxy load > > balancer. > > Does Drill support high availability? > > > > > > Please let me know about this. > > > > > > -Thanks, > > Satish > > >
Re: [ANNOUNCE] New PMC member: Volodymyr Vysotskyi
Congratulations Volodymyr! Thanks, -Hanu On Fri, Aug 24, 2018 at 10:22 AM Paul Rogers wrote: > Congratulations Volodymyr! > Thanks, > - Paul > > > > On Friday, August 24, 2018, 5:53:25 AM PDT, Arina Ielchiieva < > ar...@apache.org> wrote: > > I am pleased to announce that Drill PMC invited Volodymyr Vysotskyi to the > PMC and he has accepted the invitation. > > Congratulations Vova and thanks for your contributions! > > - Arina > (on behalf of Drill PMC) >
Drill Hangout tomorrow 08/21
The Apache Drill Hangout will be held tomorrow at 10:00am PST; please let us know should you have a topic for tomorrow's hangout. We will also ask for topics at the beginning of the hangout. Hangout Link - https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc Regards, Hanu
Re: [ANNOUNCE] New PMC member: Boaz Ben-Zvi
Congratulations, Boaz! On Fri, Aug 17, 2018 at 10:22 AM Kunal Khatua wrote: > Congratulations, Boaz!! > On 8/17/2018 10:11:32 AM, Paul Rogers wrote: > Congratulations Boaz! > - Paul > > > > On Friday, August 17, 2018, 2:56:27 AM PDT, Vitalii Diravka wrote: > > Congrats Boaz! > > Kind regards > Vitalii > > > On Fri, Aug 17, 2018 at 12:51 PM Arina Ielchiieva wrote: > > > I am pleased to announce that Drill PMC invited Boaz Ben-Zvi to the PMC > and > > he has accepted the invitation. > > > > Congratulations Boaz and thanks for your contributions! > > > > - Arina > > (on behalf of Drill PMC) > > >
[jira] [Created] (DRILL-6671) Multi level lateral unnest join is throwing an exception during materializing the plan.
Hanumath Rao Maduri created DRILL-6671: -- Summary: Multi level lateral unnest join is throwing an exception during materializing the plan. Key: DRILL-6671 URL: https://issues.apache.org/jira/browse/DRILL-6671 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.15.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri testMultiUnnestAtSameLevel in TestE2EUnnestAndLateral is throwing an exception in Materializer.java. This is due to incorrect matching of Unnest and Lateral join. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6645) Transform TopN in Lateral Unnest pipeline to Sort and Limit.
Hanumath Rao Maduri created DRILL-6645: -- Summary: Transform TopN in Lateral Unnest pipeline to Sort and Limit. Key: DRILL-6645 URL: https://issues.apache.org/jira/browse/DRILL-6645 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.14.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Fix For: 1.15.0 TopN operator is not supported in Lateral Unnest pipeline. Hence transform the TopN to use Sort and Limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
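The transformation described in DRILL-6645 relies on a simple plan equivalence: TopN(k) produces the same rows as a full Sort followed by Limit(k). A toy model of that equivalence (an illustration only, not Drill's operator implementation):

```python
def top_n(rows, key, n):
    """Single-operator TopN: the n smallest rows by key."""
    return sorted(rows, key=key)[:n]

def sort_then_limit(rows, key, n):
    """The transformed plan: a full Sort operator, then a Limit operator."""
    sorted_rows = sorted(rows, key=key)  # Sort operator
    return sorted_rows[:n]               # Limit operator

rows = [5, 1, 4, 2, 3]
assert top_n(rows, key=lambda r: r, n=2) == \
       sort_then_limit(rows, key=lambda r: r, n=2) == [1, 2]
```

The rewrite trades the memory-bounded TopN operator for a full sort, but it keeps the Lateral/Unnest pipeline intact, which is the constraint this JIRA is working around.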
[jira] [Created] (DRILL-6545) Projection Push down into Lateral Join operator.
Hanumath Rao Maduri created DRILL-6545: -- Summary: Projection Push down into Lateral Join operator. Key: DRILL-6545 URL: https://issues.apache.org/jira/browse/DRILL-6545 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.13.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Fix For: 1.14.0 For the Lateral’s logical and physical plan node, we would need to add an output RowType such that a Projection can be pushed down to Lateral. Currently, Lateral will produce all columns from left and right and it depends on a subsequent Project to eliminate unneeded columns. However, this will blow up the memory use of Lateral since each column from the left will be replicated N times based on N rows coming from UNNEST. We can have a ProjectLateralPushdownRule that pushes only the plain columns onto LATERAL but keeps the expression evaluations as part of the Project above the Lateral. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [ANNOUNCE] New PMC member: Vitalii Diravka
Congratulations Vitalii! On Tue, Jun 26, 2018 at 12:27 PM, Gautam Parai wrote: > Congratulations Vitalii! > > Gautam > > On Tue, Jun 26, 2018 at 11:48 AM, Volodymyr Vysotskyi < > volody...@apache.org> > wrote: > > > Congratulations, Vitalii! > > > > Kind regards, > > Volodymyr Vysotskyi > > > > > > вт, 26 черв. 2018 о 21:38 Robert Wu пише: > > > > > Congratulations, Vitalii! > > > > > > Best regards, > > > > > > Rob > > > > > > -Original Message- > > > From: Sorabh Hamirwasia > > > Sent: Tuesday, June 26, 2018 11:30 AM > > > To: dev@drill.apache.org > > > Subject: Re: [ANNOUNCE] New PMC member: Vitalii Diravka > > > > > > Congratulations Vitalii! > > > > > > Thanks, > > > Sorabh > > > > > > On Tue, Jun 26, 2018 at 11:18 AM, Arina Yelchiyeva < > > > arina.yelchiy...@gmail.com> wrote: > > > > > > > Congratulations, Vitalii! Well deserved! > > > > > > > > Kind regards, > > > > Arina > > > > > > > > On Tue, Jun 26, 2018 at 9:16 PM Bridget Bevens > > wrote: > > > > > > > > > Congratulations, Vitalii! > > > > > > > > > > On Tue, Jun 26, 2018 at 11:14 AM, Abhishek Girish > > > > > > > > > > wrote: > > > > > > > > > > > Congratulations, Vitalii! > > > > > > > > > > > > On Tue, Jun 26, 2018 at 11:12 AM Aman Sinha < > amansi...@apache.org> > > > > > wrote: > > > > > > > > > > > > > I am pleased to announce that Drill PMC invited Vitalii Diravka > > > > > > > to > > > > the > > > > > > PMC > > > > > > > and he has accepted the invitation. > > > > > > > > > > > > > > Congratulations Vitalii and thanks for your contributions ! > > > > > > > > > > > > > > -Aman > > > > > > > (on behalf of Drill PMC) > > > > > > > > > > > > > > > > > > > > > > > > > > > >
[jira] [Created] (DRILL-6502) Rename CorrelatePrel to LateralJoinPrel as currently correlatePrel is physical relation for LateralJoin
Hanumath Rao Maduri created DRILL-6502: -- Summary: Rename CorrelatePrel to LateralJoinPrel as currently correlatePrel is physical relation for LateralJoin Key: DRILL-6502 URL: https://issues.apache.org/jira/browse/DRILL-6502 Project: Apache Drill Issue Type: Bug Affects Versions: 1.14.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Fix For: 1.14.0 Currently in Drill correlatePrel is a physical relation operator for LateralJoin implementation. Explain plan shows CorrelatePrel which can be confusing. Hence it is good to rename this operator to LateralJoinPrel. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [ANNOUNCE] New Committer: Padma Penumarthy
Congratulations Padma! On Fri, Jun 15, 2018 at 12:04 PM, Gautam Parai wrote: > Congratulations Padma!! > > > Gautam > > > From: Vlad Rozov > Sent: Friday, June 15, 2018 11:56:37 AM > To: dev@drill.apache.org > Subject: Re: [ANNOUNCE] New Committer: Padma Penumarthy > > Congrats Padma! > > Thank you, > > Vlad > > On 6/15/18 11:38, Charles Givre wrote: > > Congrats Padma!! > > > >> On Jun 15, 2018, at 13:57, Bridget Bevens wrote: > >> > >> Congratulations, Padma!!! > >> > >> > >> From: Prasad Nagaraj Subramanya > >> Sent: Friday, June 15, 2018 10:32:04 AM > >> To: dev@drill.apache.org > >> Subject: Re: [ANNOUNCE] New Committer: Padma Penumarthy > >> > >> Congratulations Padma! > >> > >> Thanks, > >> Prasad > >> > >> On Fri, Jun 15, 2018 at 9:59 AM Vitalii Diravka < > vitalii.dira...@gmail.com> > >> wrote: > >> > >>> Congrats Padma! > >>> > >>> Kind regards > >>> Vitalii > >>> > >>> > >>> On Fri, Jun 15, 2018 at 7:40 PM Arina Ielchiieva > wrote: > >>> > Padma, congratulations and welcome! > > Kind regards, > Arina > > On Fri, Jun 15, 2018 at 7:36 PM Aman Sinha > wrote: > > > The Project Management Committee (PMC) for Apache Drill has invited > >>> Padma > > Penumarthy to become a committer, and we are pleased to announce that > >>> she > > has > > accepted. > > > > Padma has been contributing to Drill for about 1 1/2 years. She has > >>> made > > improvements for work-unit assignment in the parallelizer, > performance > >>> of > > filter operator for pattern matching and (more recently) on the batch > > sizing for several operators: Flatten, MergeJoin, HashJoin, UnionAll. > > > > Welcome Padma, and thank you for your contributions. Keep up the > good > work > > ! > > > > -Aman > > (on behalf of Drill PMC) > > > >
[jira] [Created] (DRILL-6476) Generate explain plan which shows relation between Lateral and the corresponding Unnest.
Hanumath Rao Maduri created DRILL-6476: -- Summary: Generate explain plan which shows relation between Lateral and the corresponding Unnest. Key: DRILL-6476 URL: https://issues.apache.org/jira/browse/DRILL-6476 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.14.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Currently, the explain plan doesn't show which Lateral and Unnest nodes are related. This information is good to have so that the visualized plan can use it to show the relation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6456) Planner shouldn't create any exchanges on the right side of Lateral Join.
Hanumath Rao Maduri created DRILL-6456: -- Summary: Planner shouldn't create any exchanges on the right side of Lateral Join. Key: DRILL-6456 URL: https://issues.apache.org/jira/browse/DRILL-6456 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.14.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Fix For: 1.14.0 Currently, there is no restriction placed on right side of the LateralJoin. This is causing planner to generate an Exchange when there are operators like (Agg, Limit, Sort etc). Due to this unnest operator cannot retrieve the row from lateral's left side to process the pipeline further. Enhance the planner to not generate exchanges on the right side of the LateralJoin. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [ANNOUNCE] New Committer: Timothy Farkas
Congratulations Tim! Thanks, -Hanu On Fri, May 25, 2018 at 1:16 PM, Gautam Paraiwrote: > Congratulations Tim! > > > Gautam > > > From: Sorabh Hamirwasia > Sent: Friday, May 25, 2018 12:44:47 PM > To: dev@drill.apache.org > Subject: Re: [ANNOUNCE] New Committer: Timothy Farkas > > Congratulations Tim! > > > Thanks, > Sorabh > > > From: Vova Vysotskyi > Sent: Friday, May 25, 2018 12:43:04 PM > To: dev@drill.apache.org > Subject: Re: [ANNOUNCE] New Committer: Timothy Farkas > > Congratulations, Tim! > > Kind regards, > Volodymyr Vysotskyi > > > пт, 25 трав. 2018 о 22:17 Padma Penumarthy пише: > > > Congrats Tim. > > > > Thanks > > Padma > > > > > > > On May 25, 2018, at 12:15 PM, Vitalii Diravka < > vitalii.dira...@gmail.com> > > wrote: > > > > > > Good news! Congratulations, Timothy! > > > > > > Kind regards > > > Vitalii > > > > > > > > > On Fri, May 25, 2018 at 10:04 PM Arina Yelchiyeva < > > > arina.yelchiy...@gmail.com> wrote: > > > > > >> Congrats, Tim! > > >> > > >> Kind regards, > > >> Arina > > >> > > >>> On May 25, 2018, at 9:59 PM, Kunal Khatua wrote: > > >>> > > >>> Congratulations, Timothy ! > > >>> > > >>> On 5/25/2018 11:58:31 AM, Aman Sinha wrote: > > >>> The Project Management Committee (PMC) for Apache Drill has invited > > >> Timothy > > >>> Farkas to become a committer, and we are pleased to announce that he > > >>> has accepted. > > >>> > > >>> Tim has become an active contributor to Drill in less than a year. > > During > > >>> this time he has contributed to addressing flaky unit tests, fixing > > >> memory > > >>> leaks in certain operators, enhancing the system options framework to > > be > > >>> more extensible and setting up the Travis CI tests. More recently, he > > >>> worked on the memory sizing calculations for hash join. > > >>> > > >>> Welcome Tim, and thank you for your contributions. Keep up the good > > work > > >> ! > > >>> > > >>> -Aman > > >>> (on behalf of Drill PMC) > > >> > > > > >
[jira] [Created] (DRILL-6431) Unnest operator requires table and a single column alias to be specified.
Hanumath Rao Maduri created DRILL-6431: -- Summary: Unnest operator requires table and a single column alias to be specified. Key: DRILL-6431 URL: https://issues.apache.org/jira/browse/DRILL-6431 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization, SQL Parser Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Fix For: 1.14.0 Currently, the unnest operator does not require an alias for either the table name or the column name. This has implications for what name the unnest operator's output column should use. One could use a common name like "unnest" as the output name, but then customers would need to be educated on what to expect from the unnest operator. This might confuse some customers and is also prone to introducing errors in the query. The design decision for Drill is that unnest always produces either a scalar column or a map (depending upon its input schema), but it is always a single column. Given this, it is better to enforce the requirement that the unnest operator take a table alias and a column alias (single column). This disambiguates the column, which can then easily be referenced in the query. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [ANNOUNCE] New Committer: Sorabh Hamirwasia
Congrats Sorabh! Thanks, -Hanu On Mon, Apr 30, 2018 at 12:13 PM, salim achouchewrote: > Congrats Sorabh! well deserved. > > Regards, > Salim > > On Mon, Apr 30, 2018 at 3:35 PM, Aman Sinha wrote: > > > The Project Management Committee (PMC) for Apache Drill has invited > Sorabh > > Hamirwasia to become a committer, and we are pleased to announce that he > > has accepted. > > > > Over the last 1 1/2 years Sorabh's contributions have been in a few > > different areas. He took > > the lead in designing and implementing network encryption support for > > Drill. He has contributed > > to the web server and UI side. More recently, he is involved in design > and > > implementation of the lateral join operator. > > > > Welcome Sorabh, and thank you for your contributions. Keep up the good > > work ! > > > > -Aman > > (on behalf of Drill PMC) > > >
Re: Non-column filters in Drill
Hello Ryan, Thank you for trying out Drill. Drill/Calcite expects "notColumn" to be supplied by the underlying scan. However, I expect that this column will be present in the scan but not past the filter (notColumn = 'value') in the plan. In that case you may need to push down the filter to the groupScan and then remove the column projections from your custom groupscan. It would be easier for us to guess what the issue could be if you can post the logical and physical query plans for this query. Hope this helps. Please do let us know if you have any further issues. Thanks, On Sat, Apr 7, 2018 at 2:08 PM, Ryan Shanks wrote: > Hi Drill Dev Team! > > I am writing a custom storage plugin and I am curious if it is possible in > Drill to pass a filter value, in the form of a where clause, that is not > related to a column. What I would like to accomplish is something like: > > select * from myTable where notColumn = 'value'; > > In the example, notColumn is not a column in myTable, or any other table, > it is just a specific parameter that the storage plugin will use in the > filtering process. Additionally, notColumn would not be returned as a > column so Drill needs to not expect it as a part of the 'select *'. I > created a rule that will push down and remove these non-column filter > calls, but I need to somehow tell drill/calcite that the filter name is > valid, without actually registering it as a column. The following error > occurs prior to submitting any rules: > > org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: > From line 1, column 35 to line 1, column 39: Column 'notColumn' not found > in any table > > > Alternatively, can I manipulate star queries to only return a subset of > all the columns for a table? > > Any insight would be greatly appreciated! > > Thanks, > Ryan >
Re: "Death of Schema-on-Read"
Hello All, I have created a JIRA to track this approach. https://issues.apache.org/jira/browse/DRILL-6312 Thanks, -Hanu On Fri, Apr 6, 2018 at 7:38 PM, Paul Rogerswrote: > Hi Aman, > > As we get into details, I suggested to Hanu that we move the discussion > into a JIRA ticket. > > >On the subject of CAST pushdown to Scans, there are potential drawbacks > > > - In general, the planner will see a Scan-Project where the Project > has CAST functions. But the Project can have arbitrary expressions, e.g > CAST(a as INT) * 5 > > Suggestion: push the CAST(a AS INT) down to the scan, do the a * 5 in the > Project operator. > > > or a combination of 2 CAST functions > > If the user does a two-stage cast, CAST(CAST(a AS INT) AS BIGINT), then > one simple rule is to push only the innermost cast downwards. > > > or non-CAST functions etc. > > Just keep it in Project. > > >It would be quite expensive to examine each expression (there could > be hundreds) to determine whether it is eligible to be pushed to the Scan. > > Just push CAST( AS ). Even that would be a huge win. > Note, for CSV, it might have to be CAST(columns[2] AS INT), since "columns" > is special for CSV. > > > - Expressing Nullability is not possible with CAST. If a column > should be tagged as (not)nullable, CAST syntax does not allow that. > > Can we just add keywords: CAST(a AS INT NULL), CAST(b AS VARCHAR NOT NULL) > ? > > > - Drill currently supports CASTing to a SQL data type, but not to > the complex types such as arrays and maps. We would have to add support > for that from a language perspective as well as the run-time. This would > be non-trivial effort. > > The term "complex type" is always confusing. Consider a map. The rules > would apply recursively to the members of the map. (Problem: today, if I > reference a map member, Drill pulls it to the top level: SELECT m.a creates > a new top-level field, it does not select "a" within "m". We need to fix > that anyway. 
So, CAST(m.a AS INT) should imply the type of column "a" > within map "m". > > For arrays, the problem is more complex. Perhaps more syntax: CAST(a[] AS > INT) to force array elements to INT. Maybe use CAST(a[][] AS INT) for a > repeated list (2D array). > > Unions don't need a solution as they are their own solution (they can hold > multiple types.) Same for (non-repeated) lists. > > To resolve runs of nulls, maybe allow CAST(m AS MAP). Or we can imply that > "m" is a Map from the expression CAST(m.a AS INT). For arrays, the > previously suggested CAST(a[] AS INT). If columns "a" or "m" turn out to be > a non-null scalar, then we have no good answer. > > CAST cannot solve the nasty cases of JSON in which some fields are > complex, some scalar. E.g. {a: 10} {a: [20]} or {m: "foo"} {m: {value: > "foo"}}. I suppose no solution is perfect... > > I'm sure that, if someone gets a chance to design this feature, they'll > find lots more issues. Maybe cast push-down is only a partial solution. > But, it seems to solve so many of the JSON and CSV cases that I've seen > that it seems too good to pass up. > > Thanks, > > > - Paul
[jira] [Created] (DRILL-6312) Enable pushing of cast expressions to the scanner for better schema discovery.
Hanumath Rao Maduri created DRILL-6312: -- Summary: Enable pushing of cast expressions to the scanner for better schema discovery. Key: DRILL-6312 URL: https://issues.apache.org/jira/browse/DRILL-6312 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators, Query Planning Optimization Affects Versions: 1.13.0 Reporter: Hanumath Rao Maduri Drill is a schema-less engine which tries to infer the schema from disparate sources at read time. Currently the scanners infer the schema for each batch depending upon the data for that column in the corresponding batch. This covers many use cases but can error out when the data differs too much between batches, e.g. int vs. array[int] (one example among several). There is also a mechanism to create a view that casts the columns to appropriate types. This solves the issue in some cases but fails in many others, because the cast expression is not pushed down to the scanner but stays at the Project or Filter operators higher in the query plan. This JIRA is to fix this by propagating the type information embedded in the cast function to the scanners, so that the scanners can cast the incoming data appropriately. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
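A minimal sketch of the idea in DRILL-6312: the target type from a CAST is handed to the scanner, so every batch materializes the column with that type instead of each batch inferring its own (possibly conflicting) type. Names and structure here are illustrative only, not Drill's actual reader API:

```python
def scan_batch(raw_batch, schema_hints):
    """Read one batch, applying pushed-down cast types per column.

    schema_hints maps column name -> conversion callable (derived from a
    CAST in the query); columns without a hint keep their batch-local type.
    """
    return [
        {col: schema_hints.get(col, lambda v: v)(val)
         for col, val in record.items()}
        for record in raw_batch
    ]

# Two batches whose inferred types for column "a" would disagree
# (str in the first, int in the second) -- the situation that can
# error out when types are inferred per batch.
batch1 = [{"a": "1"}, {"a": "2"}]
batch2 = [{"a": 3}, {"a": 4}]

hints = {"a": int}  # hypothetical result of pushing CAST(a AS INT) to the scan
rows = scan_batch(batch1, hints) + scan_batch(batch2, hints)
assert [r["a"] for r in rows] == [1, 2, 3, 4]
```

With the hint applied at scan time, downstream operators see one consistent type for the column, which is exactly what keeping the CAST up in a Project cannot guarantee across batches.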
Re: "Death of Schema-on-Read"
Hello, Thanks for Ted & Paul for clarifying my questions. Sorry for not being clear in my previous post, When I said create view I was under the impression for simple views where we use cast expressions currently to cast them to types. In this case planner can use this information to force the scans to use this as the schema. If the query fails then it fails at the scan and not after inferring the schema by the scanner. I know that views can get complicated with joins and expressions. For schema hinting through views I assume they should be created on single tables with corresponding columns one wants to project from the table. Regarding the same question, today we had a discussion with Aman. Here view can be considered as a "view" of the table with schema in place. We can change some syntax to suite it for specifying schema. something like this. create schema[optional] view(/virtual table ) v1 as (a: int, b : int) select a, b from t1 with some other rules as to conversion of scalar to complex types. Then the queries when used on this view (below) should enable the scanner to use this type information and then use it to convert the data into the appropriate types. select * from v1 For the possibility of schema information not being known by the user, may be use something like this. create schema[optional] view(/virtual table) v1 as select a, b from t1 infer schema. This view when used to query the table should trigger the logic of inferring and consolidating the schema and attaching that inferred schema to the view. In future when we use the same view, we should be using the inferred schema. This view either can be local view pertaining to the session or a global view so that other queries across sessions can use them. By default we can apply certain rules such as converting simple scalar values to other scalar values (like int to double etc). 
But we should be also able to give option to the customer to enable rules such as scalar int to array[int] when creating the view itself. Thanks, -Hanu On Fri, Apr 6, 2018 at 3:10 PM, Paul Rogerswrote: > Ted, this is why your participation in Drill is such a gift: cast > push-down is an elegant, simple solution that even works in views. > Beautiful. > > Thanks, > - Paul > > > > On Friday, April 6, 2018, 11:35:37 AM PDT, Ted Dunning < > ted.dunn...@gmail.com> wrote: > > On Thu, Apr 5, 2018 at 9:43 PM, Paul Rogers > wrote: > > > Great discussion. Really appreciate the insight from the Drill users! > > > > To Ted's points: the simplest possible solution is to allow a table > > function to express types. Just making stuff up: > > > > SELECT a FROM schema(myTable, (a: INT)) > > > > Why not just allow cast to be pushed down to the reader? > > Why invent new language features? > > Or, really ugly, a session option: > > > > ALTER SESSION SET schema.myTable="a: INT" > > > > These are a big problem. > >
Re: "Death of Schema-on-Read"
Hello, Thank you Paul for starting this discussion. However, I was not clear on the latest point as to how providing hints and creating a view(mechanism which already exists in DRILL) is different. I do think that creating a view can be cumbersome (in terms of syntax). Providing hints are ephemeral and hence it can be used for quick validation of the schema for a query execution. But if the user absolutely knows the schema, then I think creating a view and using it might be a better option. Can you please share your thoughts on this. Thank you Ted for your valuable suggestions, as regards to your comment on "metastore is good but centralized is bad" can you please share your view point on what all design issues it can cause. I know that it can be bottleneck but just want to know about other issues. Put in other terms if centralized metastore engineered in a good way to avoid most of the bottleneck, then do you think it can be good to use for metadata? Thanks, -Hanu On Thu, Apr 5, 2018 at 9:43 PM, Paul Rogerswrote: > Great discussion. Really appreciate the insight from the Drill users! > > To Ted's points: the simplest possible solution is to allow a table > function to express types. Just making stuff up: > > SELECT a FROM schema(myTable, (a: INT)) > > Or, a SQL extension: > > SELECT a FROM myTable(a: INT) > > Or, really ugly, a session option: > > ALTER SESSION SET schema.myTable="a: INT" > > All these are ephemeral and not compatible with, say, Tableau. > > Building on Ted's suggestion of using the (distributed) file system we can > toss out a few half-baked ideas. Maybe use a directory to represent a name > space, with files representing tables. If I have "weblogs" as my directory, > I might have a file called "jsonlog" to describe the (messy) format of my > JSON-formatted log files. And "csvlog" to describe my CSV-format logs. > Different directories represent different SQL databases (schemas), > different files represent tables within the schema. 
> The table files can store column hints. But, it could do more. Maybe
> define the partitioning scheme (by year, month, day, say) so that it can be
> mapped to a column. Wouldn't it be great if Drill could figure out the
> partitioning itself if we gave a date range?
>
> The file could also define the format plugin to use, and its options, to
> avoid the need to define this format separately from the data, and to reduce
> the need for table functions.
>
> Today, Drill matches files to format plugins using only extensions. The
> table file could provide a regex for those old-style files (such as real
> web logs) that don't use suffixes. Or, to differentiate between "sales.csv"
> and "returns.csv" in the same data directory.
>
> While we're at it, the file might as well contain a standard view to apply
> to the table to define computed columns, do data conversions and so on.
>
> If Drill does automatic scans (to detect schema, to gather stats), maybe
> store that alongside the table file: "csvlogs.drill" for the
> Drill-generated info.
>
> Voila! A nice schema definition with no formal metastore. Because the info
> is in files, it is easy to version using git, etc. (especially if the
> directory can be mounted using NFS as a normal directory). Atomic updates
> can be done via the rename trick (which, sadly, does not work on S3...)
>
> Or, maybe store all information in ZK in JSON as we do for plugin
> configurations. (Hard to version and modify though...)
>
> Lots of ways to skin this cat once we agree that hints are, in fact,
> useful additions to Drill's automatic schema detection.
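To make the per-directory table-file idea concrete, a hypothetical "csvlog" descriptor might look like the following. Every file name, key, and value here is invented purely to illustrate the proposal; Drill has no such format as of this discussion.

```json
{
  "table": "csvlog",
  "format": {
    "plugin": "text",
    "extractHeader": true,
    "fileMatch": "sales.*\\.csv"
  },
  "columns": [
    { "name": "ts",     "type": "TIMESTAMP" },
    { "name": "status", "type": "INT" },
    { "name": "path",   "type": "VARCHAR" }
  ],
  "partitionBy": [ "year", "month", "day" ],
  "view": "SELECT *, CAST(status AS INT) AS status_code FROM csvlog"
}
```

Such a file would bundle the column hints, the format-plugin options, the filename pattern, and the standard view into one versionable artifact, which is exactly what makes the directory-of-files approach attractive compared to a central metastore.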
> Thanks,
> - Paul
>
> On Thursday, April 5, 2018, 3:22:07 PM PDT, Ted Dunning <
> ted.dunn...@gmail.com> wrote:
>
> On Thu, Apr 5, 2018 at 7:24 AM, Joel Pfaff wrote:
>
> > Hello,
> >
> > A lot of versioning problems arise when trying to share data through Kafka
> > between multiple applications with different lifecycles and maintainers,
> > since by default, a single message in Kafka is just a blob.
> > One way to solve that is to agree on a single serialization format,
> > friendly with record-per-record storage (like Avro), and in order to not
> > have to serialize the schema in use for every message, just reference an
> > entry in the Avro Schema Registry (this flow is described here:
> > https://medium.com/@stephane.maarek/introduction-to-schemas-in-apache-kafka-with-the-confluent-schema-registry-3bf55e401321
> > ).
> > On top of the schema registry, specific client libs allow validating the
> > message structure prior to injection into Kafka.
> > So while Comcast mentions the usage of an Avro schema to describe its
> > feeds, it does not mention directly the usage of Avro files (to describe
> > the schema).
>
> This is all good except for the assumption of a single schema for all time.
> You can mutate schemas in Avro (or JSON) in a future-proof manner, but it
> is important to
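Ted's (truncated) point is that Avro or JSON schemas can evolve in a future-proof way only if each change stays compatible with older records. A minimal sketch of the usual backward-compatibility rule follows; this is my own simplification for illustration, not the Schema Registry's actual compatibility checker: a reader using the new schema can decode old records only if every newly added field declares a default.

```python
# Simplified backward-compatibility check for Avro-style record schemas.
# Illustrative only; not Confluent's real compatibility algorithm.

def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """A reader on new_schema can decode records written with old_schema
    if every field added in new_schema carries a default value."""
    old_fields = {f["name"] for f in old_schema["fields"]}
    return all(
        f["name"] in old_fields or "default" in f
        for f in new_schema["fields"]
    )

v1 = {"type": "record", "name": "Log",
      "fields": [{"name": "msg", "type": "string"}]}

# Adding a field WITH a default keeps old records readable.
v2_ok = {"type": "record", "name": "Log",
         "fields": [{"name": "msg", "type": "string"},
                    {"name": "level", "type": "string", "default": "INFO"}]}

# Adding a required field (no default) breaks old records.
v2_bad = {"type": "record", "name": "Log",
          "fields": [{"name": "msg", "type": "string"},
                     {"name": "level", "type": "string"}]}
```

This is the property the Schema Registry enforces centrally so that producers cannot publish a schema change that breaks existing consumers.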
Re: [DISCUSS] 1.13.0 release
On my machine I couldn't repro the issue related to TestDrillbitResilience.cancelAfterAllResultsProduced. I used Vladimir's branch (i.e. DRILL-1491) and ran it with the Maven test command. Output of the test run:

...
4 common frames omitted
Tests run: 20, Failures: 0, Errors: 0, Skipped: 6, Time elapsed: 124.187 sec - in org.apache.drill.exec.server.TestDrillbitResilience

On Wed, Mar 7, 2018 at 11:00 AM, Parth Chandra wrote:
> Yes I agree. JDBC would be a new feature that we can defer to 1.14.0.
> I'm hoping we can resolve the other three in the next few days. Target date
> for starting the release process: Friday Mar 9th.
>
> Once these are resolved, I will create a branch for the release so that
> Apache master remains open for commits. If any issues are found in the
> release branch, we will fix them in master and I will cherry-pick them into
> the release branch. Once the release is finalized I will add a release tag
> and remove the branch.
>
> Also note if QA folks want to get started on testing the release, the
> current head of Apache master is close to final. Javadoc generation is only
> a release build issue, and the other issues are localized to specific
> cases.
>
> Note: to reproduce the javadoc issues:
>   # set JAVA_HOME to JDK 8
>   mvn javadoc:javadoc -Papache-release
>
> On Wed, Mar 7, 2018 at 11:23 PM, Aman Sinha wrote:
>
> > It seems to me the main blockers are:
> >
> > 1. DRILL-4547  Javadoc fails with Java 8  <-- Can we split up the work
> >    among a few people to resolve these?
> > 2. DRILL-6216  Metadata mismatch..  <-- Agreement was to revert
> >    one small piece of code and it appears Sorabh is looking into it
> > 3. TestDrillbitResilience.cancelAfterAllResultsProduced  <-- need someone
> >    to look into this
> >
> > Regarding the JDBC issues that Parth mentioned, looking at the JIRAs, it
> > seems they are not showstoppers... Parth, do you agree?
> > Since we are close to the finish line for JDK 8, IMO we should try and see
> > if in another day or two we can get over these hurdles.
> >
> > -Aman
> >
> > On Wed, Mar 7, 2018 at 7:17 AM, Pritesh Maker wrote:
> >
> > > The JDK 8 issues will likely require more time to harden for it to be
> > > included in the 1.13 release. My recommendation would be to move ahead
> > > with the 1.13 release now and address these issues right.
> > >
> > > Pritesh
> > >
> > > -----Original Message-----
> > > From: Parth Chandra
> > > Sent: March 7, 2018 3:34 AM
> > > To: dev
> > > Subject: Re: [DISCUSS] 1.13.0 release
> > >
> > > My mistake Volodymyr.
> > >
> > > Found some other JDK 8 issues in JIRA not tracked in DRILL-1491:
> > >
> > > DRILL-4547  Javadoc fails with Java 8
> > > DRILL-6163  Switch Travis to Java 8
> > >
> > > The following are tracked in DRILL-1491, but it doesn't look like we're
> > > addressing these. Are we?
> > >
> > > DRILL-4329  13 unit tests are failing with JDK 8
> > > DRILL-4333  DRILL-4329 tests in
> > >             Drill2489CallsAfterCloseThrowExceptionsTest fail in Java 8
> > > DRILL-5120  Upgrade JDBC driver for new Java 8 methods
> > > DRILL-5680  BasicPhysicalOpUnitTest can't run in Eclipse with Java 8
> > >
> > > *DRILL-4547 is a showstopper*. The release build (-Papache-release) fails
> > > with far too many Javadoc errors even with doclint turned off.
> > >
> > > DRILL-4333, DRILL-4329, DRILL-5120 are JDBC related, which is a project
> > > by itself.
> > >
> > > Note that fixing the JDBC related issues and adding the command line
> > > option to turn doclint off will likely break Java 7 builds.
> > >
> > > Folks who voted to get JDK 8 into this release, what is the consensus on
> > > JDBC/Java 8?
> > > Also, any volunteers to help debug
> > > TestDrillbitResilience.cancelAfterAllResultsProduced?
> > > On Wed, Mar 7, 2018 at 3:20 PM, Volodymyr Tkach wrote:
> > >
> > > > Addition to my last message:
> > > > The link to the PR for DRILL-1491 is
> > > > https://github.com/apache/drill/pull/1143
> > > > on which we can see the
> > > > TestDrillbitResilience.cancelAfterAllResultsProduced failure.
> > > >
> > > > 2018-03-07 11:45 GMT+02:00 Volodymyr Tkach:
> > > >
> > > > > *To Parth:*
> > > > > The failure can only be seen if run on the DRILL-1491 branch, because
> > > > > it uses JDK 1.8 in pom.xml:
> > > > >
> > > > > 1.8
> > > > > 1.8
> > > > >
> > > > > 2018-03-07 6:03 GMT+02:00 Sorabh Hamirwasia:
> > > > >
> > > > >> Just sent an email on the RCA of DRILL-6216 to discuss next steps.
> > > > >>
> > > > >> Thanks,
> >
[jira] [Created] (DRILL-6212) A simple join is recursing too deep in planning and eventually throwing stack overflow.
Hanumath Rao Maduri created DRILL-6212: -- Summary: A simple join is recursing too deep in planning and eventually throwing stack overflow. Key: DRILL-6212 URL: https://issues.apache.org/jira/browse/DRILL-6212 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.12.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Fix For: 1.14.0

Create two views using the following statements:
{code}
create view v1 as select cast(greeting as int) f from dfs.`/home/mapr/data/json/temp.json`;
create view v2 as select cast(greeting as int) f from dfs.`/home/mapr/data/json/temp.json`;
{code}
Executing the following join query produces a stack overflow during the planning phase.
{code}
select t1.f from dfs.tmp.v1 as t inner join dfs.tmp.v1 as t1 on cast(t.f as int) = cast(t1.f as int) and cast(t.f as int) = 10 and cast(t1.f as int) = 10;
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6159) No need to offset rows if order by is not specified in the query.
Hanumath Rao Maduri created DRILL-6159: -- Summary: No need to offset rows if order by is not specified in the query. Key: DRILL-6159 URL: https://issues.apache.org/jira/browse/DRILL-6159 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.12.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Fix For: Future

For queries that have OFFSET and LIMIT but no ORDER BY, there is no need to add the offset to the limit when pushing the limit down. SQL does not guarantee output order if no ORDER BY is specified in the query. It is observed that for such queries the current optimizer adds the offset to the limit and limits that many rows, which prevents the query from exiting early. Here is an example query:
{code}
select zz1,zz2,a11 from dfs.tmp.viewtmp limit 10 offset 1000

00-00 Screen : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {1.06048844E8 rows, 5.54015404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 787
00-01 Project(zz1=[$0], zz2=[$1], a11=[$2]) : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 rows, 5.53005404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 786
00-02 SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 rows, 5.53005404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 785
00-03 Limit(offset=[1000], fetch=[10]) : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {9.4938844E7 rows, 5.42905404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 784
00-04 UnionExchange : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {8.4838844E7 rows, 5.02505404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 783
01-01 SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {7.4738844E7 rows, 4.21705404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 782
01-02 Limit(fetch=[1010]) : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {6.4638844E7 rows, 4.11605404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 781
01-03 Project(zz1=[$0], zz2=[$2], a11=[$1]) : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 2.3306983E7, cumulative cost = {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 780
01-04 HashJoin(condition=[=($0, $2)], joinType=[left]) : rowType = RecordType(ANY ZZ1, ANY A, ANY ZZ2): rowcount = 2.3306983E7, cumulative cost = {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 779
01-06 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/csvd1, numFiles=3, columns=[`ZZ1`, `A`], files=[maprfs:/tmp/csvd1/Daamulti11random2.csv, maprfs:/tmp/csvd1/Daamulti11random21.csv, maprfs:/tmp/csvd1/Daamulti11random211.csv]]]) : rowType = RecordType(ANY ZZ1, ANY A): rowcount = 2.3306983E7, cumulative cost = {2.3306983E7 rows, 4.6613966E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 776
01-05 BroadcastExchange : rowType = RecordType(ANY ZZ2): rowcount = 2641626.0, cumulative cost = {5283252.0 rows, 2.3774634E7 cpu, 0.0 io, 3.2460300288E10 network, 0.0 memory}, id = 778
02-01 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/csvd2, numFiles=1, columns=[`ZZ2`], files=[maprfs:/tmp/csvd2/D222random2.csv]]]) : rowType = RecordType(ANY ZZ2): rowcount = 2641626.0, cumulative cost = {2641626.0 rows, 2641626.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 777
{code}
The limit pushed down is Limit(fetch=[1010]); instead it should be Limit(fetch=[10]).
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
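The plan above shows the pushdown arithmetic at issue: the fragment-level fetch today is offset + limit (1000 + 10 = 1010, visible as Limit(fetch=[1010])), while the ticket argues that without an ORDER BY a fetch of just the limit would suffice. A toy sketch of the two formulas, not Drill code:

```python
def current_pushed_fetch(offset: int, limit: int) -> int:
    # Today each fragment is asked for offset + limit rows,
    # e.g. Limit(fetch=[1010]) in the plan above.
    return offset + limit

def proposed_pushed_fetch(offset: int, limit: int) -> int:
    # Without ORDER BY, the ticket proposes pushing only the limit,
    # letting fragments exit early: Limit(fetch=[10]).
    return limit
```

With a large offset and a small limit the difference in rows each fragment must produce (1010 versus 10 here) is what prevents the early exit the ticket describes.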
[jira] [Created] (DRILL-6158) Create a mux operator for union exchange to enable two phase merging instead of foreman merging all the batches.
Hanumath Rao Maduri created DRILL-6158: -- Summary: Create a mux operator for union exchange to enable two phase merging instead of the foreman merging all the batches. Key: DRILL-6158 URL: https://issues.apache.org/jira/browse/DRILL-6158 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.12.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Fix For: Future

Consider the following simple query:
{code}
select zz1,zz2,a11 from dfs.tmp.viewtmp limit 10 offset 1000
{code}
The following plan is generated for this query:
{code}
00-00 Screen : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {1.06048844E8 rows, 5.54015404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 787
00-01 Project(zz1=[$0], zz2=[$1], a11=[$2]) : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 rows, 5.53005404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 786
00-02 SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 rows, 5.53005404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 785
00-03 Limit(offset=[1000], fetch=[10]) : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {9.4938844E7 rows, 5.42905404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 784
00-04 UnionExchange : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {8.4838844E7 rows, 5.02505404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 783
01-01 SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {7.4738844E7 rows, 4.21705404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 782
01-02 Limit(fetch=[1010]) : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {6.4638844E7 rows, 4.11605404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 781
01-03 Project(zz1=[$0], zz2=[$2], a11=[$1]) : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 2.3306983E7, cumulative cost = {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 780
01-04 HashJoin(condition=[=($0, $2)], joinType=[left]) : rowType = RecordType(ANY ZZ1, ANY A, ANY ZZ2): rowcount = 2.3306983E7, cumulative cost = {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 779
01-06 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/csvd1, numFiles=3, columns=[`ZZ1`, `A`], files=[maprfs:/tmp/csvd1/Daamulti11random2.csv, maprfs:/tmp/csvd1/Daamulti11random21.csv, maprfs:/tmp/csvd1/Daamulti11random211.csv]]]) : rowType = RecordType(ANY ZZ1, ANY A): rowcount = 2.3306983E7, cumulative cost = {2.3306983E7 rows, 4.6613966E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 776
01-05 BroadcastExchange : rowType = RecordType(ANY ZZ2): rowcount = 2641626.0, cumulative cost = {5283252.0 rows, 2.3774634E7 cpu, 0.0 io, 3.2460300288E10 network, 0.0 memory}, id = 778
02-01 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/csvd2, numFiles=1, columns=[`ZZ2`], files=[maprfs:/tmp/csvd2/D222random2.csv]]]) : rowType = RecordType(ANY ZZ2): rowcount = 2641626.0, cumulative cost = {2641626.0 rows, 2641626.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 777
{code}
With many minor fragments on a large cluster, all the minor fragments feeding into the UnionExchange are merged only at the foreman. Even though the UnionExchange is not a CPU bottleneck, it creates huge memory pressure at the foreman; it is observed that on a large cluster with many minor fragments the query often runs out of memory there. In this scenario it is better to locally combine the minor fragments belonging to one Drillbit and send a single stream to the foreman. This spreads the memory consumption across all the Drillbits and reduces the memory pressure at the foreman.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
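The per-Drillbit mux described above can be sketched as follows: instead of every minor fragment sending its batches straight to the foreman, the fragments local to a node are chained into a single stream, so the foreman sees one stream per node rather than one per fragment. This is an illustrative pure-Python sketch, not Drill's operator code; all names are mine.

```python
from itertools import chain

def node_mux(fragment_streams):
    """Union (unordered) mux: concatenate the minor-fragment streams
    local to one Drillbit into a single outgoing stream."""
    return chain.from_iterable(fragment_streams)

def foreman_union(per_node_streams):
    """The foreman now combines one stream per node instead of one
    stream per minor fragment."""
    return list(chain.from_iterable(per_node_streams))

# 2 nodes x 3 minor fragments each: the foreman's fan-in drops from 6 to 2.
node1 = [[1, 2], [3], [4]]
node2 = [[5], [6, 7], []]
result = foreman_union([node_mux(node1), node_mux(node2)])
```

Because a union exchange imposes no ordering, local combining is just concatenation; the benefit is purely the reduced fan-in (and buffering) at the foreman.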
[jira] [Resolved] (DRILL-6148) TestSortSpillWithException is sometimes failing.
[ https://issues.apache.org/jira/browse/DRILL-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanumath Rao Maduri resolved DRILL-6148.
Resolution: Fixed

> TestSortSpillWithException is sometimes failing.
>
> Key: DRILL-6148
> URL: https://issues.apache.org/jira/browse/DRILL-6148
> Project: Apache Drill
> Issue Type: Bug
> Components: Tools, Build  Test
> Affects Versions: 1.12.0
> Reporter: Hanumath Rao Maduri
> Assignee: Hanumath Rao Maduri
> Priority: Minor
> Fix For: 1.13.0
>
> TestSortSpillWithException#testSpillLeakManaged is sometimes failing;
> for some reason this is observed only in one of my branches.
> TestSpillLeakManaged tests for a leak when an exception is thrown while
> spilling rows in ExternalSort. In the failing case, ExternalSort is able
> to sort the data within the given memory and does not spill at all, so
> the injected interruption path is never hit and no exception is thrown.
> The test case should use drill.exec.sort.external.mem_limit to force the
> sort to use as little memory as possible so that the spill path is
> exercised.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6148) TestSortSpillWithException is sometimes failing.
Hanumath Rao Maduri created DRILL-6148: -- Summary: TestSortSpillWithException is sometimes failing. Key: DRILL-6148 URL: https://issues.apache.org/jira/browse/DRILL-6148 Project: Apache Drill Issue Type: Bug Components: Tools, Build Test Affects Versions: 1.12.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Fix For: 1.12.0

TestSortSpillWithException#testSpillLeakManaged is sometimes failing; for some reason this is observed only in one of my branches. TestSpillLeakManaged tests for a leak when an exception is thrown while spilling rows in ExternalSort. In the failing case, ExternalSort is able to sort the data within the given memory and does not spill at all, so the injected interruption path is never hit and no exception is thrown. The test case should use drill.exec.sort.external.mem_limit to force the sort to use as little memory as possible so that the spill path is exercised.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
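The report names drill.exec.sort.external.mem_limit as the knob for forcing the spill path. A hedged sketch of what setting it might look like in a boot-time config such as drill-override.conf follows; the property name comes from the ticket, but the placement and the example value are my assumptions, not verified syntax:

```
# Assumed: cap ExternalSort memory so even small test inputs must spill.
drill.exec.sort.external.mem_limit: 33554432
```

With a small enough limit, the sort in the test can no longer complete in memory, so the injected exception during spilling is actually reached.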
[jira] [Created] (DRILL-6115) SingleMergeExchange is not scaling up when many minor fragments are allocated for a query.
Hanumath Rao Maduri created DRILL-6115: -- Summary: SingleMergeExchange is not scaling up when many minor fragments are allocated for a query. Key: DRILL-6115 URL: https://issues.apache.org/jira/browse/DRILL-6115 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.12.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Attachments: Enhancing Drill to multiplex ordered merge exchanges.docx

A SingleMergeExchange is created when a global order is required in the output. The following query produces a SingleMergeExchange:
{code:java}
0: jdbc:drill:zk=local> explain plan for select L_LINENUMBER from dfs.`/drill/tables/lineitem` order by L_LINENUMBER;
+--+--+
| text | json |
+--+--+
| 00-00 Screen
  00-01 Project(L_LINENUMBER=[$0])
  00-02 SingleMergeExchange(sort0=[0])
  01-01 SelectionVectorRemover
  01-02 Sort(sort0=[$0], dir0=[ASC])
  01-03 HashToRandomExchange(dist0=[[$0]])
  02-01 Scan(table=[[dfs, /drill/tables/lineitem]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///drill/tables/lineitem, condition=null], columns=[`L_LINENUMBER`], maxwidth=15]])
{code}
On a 10-node cluster, if the table is huge, Drill can spawn many minor fragments, which are all merged on a single node with one merge receiver. This creates a lot of memory pressure on the receiver node and an execution bottleneck. To address this, the merge receiver should be a multiphase merge receiver. Ideally, for a large cluster, one could introduce tree merges so that merging can be done in parallel. But as a first step I think it is better to use the existing infrastructure for multiplexing operators to generate an OrderedMux, so that all the minor fragments belonging to one Drillbit are merged locally and the merged data is sent on to the receiver operator. For example, on a 10-node cluster where each node processes 14 minor fragments, the current code merges 140 minor fragments at the receiver; the proposed version has two merge levels: a 14-way merge in each Drillbit (done in parallel) followed by a 10-way merge of the per-node streams at the receiver node.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
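The proposed OrderedMux amounts to a two-level k-way merge: each Drillbit first merges its own sorted minor-fragment streams, and the merge receiver then merges only one sorted stream per node (10 instead of 140 in the example above). An illustrative Python sketch follows, not Drill's implementation; the function names are mine.

```python
import heapq

def ordered_mux(fragment_streams):
    """Level 1: k-way merge of the sorted streams local to one Drillbit.
    heapq.merge preserves order across already-sorted inputs."""
    return heapq.merge(*fragment_streams)

def merge_receiver(per_node_streams):
    """Level 2: the receiver merges one pre-merged stream per node."""
    return list(heapq.merge(*per_node_streams))

# 2 nodes, 3 sorted minor fragments each: receiver fan-in is 2, not 6.
node1 = [[1, 4], [2, 9], [5]]
node2 = [[0, 3], [6], [7, 8]]
result = merge_receiver([ordered_mux(node1), ordered_mux(node2)])
```

Since each level only ever compares heads of sorted streams, global order is preserved while the receiver's memory and fan-in scale with the node count rather than the minor-fragment count.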
[jira] [Created] (DRILL-5878) TableNotFound exception is being reported for a wrong storage plugin.
Hanumath Rao Maduri created DRILL-5878: -- Summary: TableNotFound exception is being reported for a wrong storage plugin. Key: DRILL-5878 URL: https://issues.apache.org/jira/browse/DRILL-5878 Project: Apache Drill Issue Type: Bug Components: SQL Parser Affects Versions: 1.11.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Priority: Minor Fix For: 1.12.0 Drill is reporting TableNotFound exception for a wrong storage plugin. Consider the following query where employee.json is queried using cp plugin. {code} 0: jdbc:drill:zk=local> select * from cp.`employee.json` limit 10; +--++-++--+-+---++-++--++---+-+-++ | employee_id | full_name | first_name | last_name | position_id | position_title | store_id | department_id | birth_date | hire_date| salary | supervisor_id | education_level | marital_status | gender | management_role | +--++-++--+-+---++-++--++---+-+-++ | 1| Sheri Nowmer | Sheri | Nowmer | 1| President | 0 | 1 | 1961-08-26 | 1994-12-01 00:00:00.0 | 8.0 | 0 | Graduate Degree | S | F | Senior Management | | 2| Derrick Whelply| Derrick | Whelply| 2| VP Country Manager | 0 | 1 | 1915-07-03 | 1994-12-01 00:00:00.0 | 4.0 | 1 | Graduate Degree | M | M | Senior Management | | 4| Michael Spence | Michael | Spence | 2| VP Country Manager | 0 | 1 | 1969-06-20 | 1998-01-01 00:00:00.0 | 4.0 | 1 | Graduate Degree | S | M | Senior Management | | 5| Maya Gutierrez | Maya| Gutierrez | 2| VP Country Manager | 0 | 1 | 1951-05-10 | 1998-01-01 00:00:00.0 | 35000.0 | 1 | Bachelors Degree | M | F | Senior Management | | 6| Roberta Damstra| Roberta | Damstra| 3| VP Information Systems | 0 | 2 | 1942-10-08 | 1994-12-01 00:00:00.0 | 25000.0 | 1 | Bachelors Degree | M | F | Senior Management | | 7| Rebecca Kanagaki | Rebecca | Kanagaki | 4| VP Human Resources | 0 | 3 | 1949-03-27 | 1994-12-01 00:00:00.0 | 15000.0 | 1 | Bachelors Degree | M | F | Senior Management | | 8| Kim Brunner| Kim | Brunner| 11 | Store Manager | 9 | 11 | 1922-08-10 | 1998-01-01 00:00:00.0 | 1.0 | 5 | 
Bachelors Degree | S | F | Store Management | | 9| Brenda Blumberg| Brenda | Blumberg | 11 | Store Manager | 21| 11 | 1979-06-23 | 1998-01-01 00:00:00.0 | 17000.0 | 5 | Graduate Degree | M | F | Store Management | | 10 | Darren Stanz | Darren | Stanz | 5| VP Finance | 0 | 5 | 1949-08-26 | 1994-12-01 00:00:00.0 | 5.0 | 1 | Partial College | M | M | Senior Management | | 11 | Jonathan Murraiin | Jonathan| Murraiin | 11 | Store Manager | 1 | 11 | 1967-06-20 | 1998-01-01 00:00:00.0 | 15000.0 | 5 | Graduate Degree | S | M | Store Management | +--++-++--+-+---++-++--++---+-+-++ {code} However if cp1 is used instead of cp then Drill reports TableNotFound exception. {code} 0: jdbc:drill:zk=local> select * from cp1.`employee.json` limit 10; Oct 16, 2017 1:40:02 PM org.apache.calcite.sql.validate.SqlValidatorException SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 'cp1.employee.json' not found Oct 16, 2017 1:40
[jira] [Created] (DRILL-5851) Empty table during a join operation with a non empty table produces cast exception
Hanumath Rao Maduri created DRILL-5851: -- Summary: Empty table during a join operation with a non empty table produces cast exception Key: DRILL-5851 URL: https://issues.apache.org/jira/browse/DRILL-5851 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.11.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri

A hash join between an empty table and a non-empty table throws an exception:
{code}
Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between 1. Numeric data 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARCHAR, Right type: INT. Add explicit casts to avoid this error
{code}
Here is an example query with which it is reproducible:
{code}
select * from cp.`sample-data/nation.parquet` nation left outer join dfs.tmp.`2.csv` as two on two.a = nation.`N_COMMENT`;
{code}
The file 2.csv is empty (it does not even contain header info).
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
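The error text above enumerates the three type families between which Drill's join allows implicit casts. A minimal sketch of that rule follows; the grouping is taken from the message itself, while the function and the exact set of numeric type names are my own illustration:

```python
# Implicit-cast families quoted in the join error message:
# 1. numeric data  2. varchar/varbinary  3. date/timestamp
CAST_FAMILIES = [
    {"INT", "BIGINT", "FLOAT4", "FLOAT8", "DECIMAL"},  # numeric (illustrative list)
    {"VARCHAR", "VARBINARY"},
    {"DATE", "TIMESTAMP"},
]

def join_can_implicit_cast(left: str, right: str) -> bool:
    """True if a join-key type pair falls inside one of the families
    listed in the DRILL-5851 error message."""
    return any(left in fam and right in fam for fam in CAST_FAMILIES)
```

The reported pair (VARCHAR vs INT) straddles two families, which is why the join fails once the empty CSV side contributes a type that does not match the parquet column.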
Assign a JIRA
Hello All,

I would like to work on DRILL-5773. Can you please assign this JIRA to me?
My username: hanu.ncr

Thanks,
-Hanu