Re: Aggregate queries in drill
Also, I would strongly encourage you to attend the weekly developer hangouts on Tuesdays: https://drill.apache.org/community-resources/

On Mon, Aug 10, 2015 at 10:17 AM, rahul challapalli <challapallira...@gmail.com> wrote:
> Sudip, I will take a look when I get some time. [...]
Re: Aggregate queries in drill
Sudip,

I will take a look when I get some time. I am not sure whether you already have test cases for the parts of the plugin that already work. If not, it would be very helpful if you added a few, so that I can walk through your code in the debugger.

- Rahul

On Mon, Aug 10, 2015 at 6:31 AM, Sudip Mukherjee <smukher...@commvault.com> wrote:
> Hi Rahul, [...]
RE: Aggregate queries in drill
Hi Rahul,

I tried the rule linked below to see what is in the SQL query, but it doesn't seem to get the aggregate functions:
https://github.com/sudipmukherjee/drill/blob/master/contrib/storage-solr/src/main/java/org/apache/drill/exec/store/solr/SolrQueryFilterRule.java

Could you please have a look if you get a chance?

Example physical plan for a query (select count(*) from solr.`bootstrap_5`;), where bootstrap_5 is one of the cores in my Solr engine:

2015-08-10 18:04:04,007 [2a3765c5-0e91-1f6e-5462-b134759bc9b7:foreman] DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - Drill Physical :
00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {60.1 rows, 340.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 147
00-01      Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {60.0 rows, 340.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 146
00-02        StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {60.0 rows, 340.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 145
00-03          Project($f0=[0]) : rowType = RecordType(INTEGER $f0): rowcount = 20.0, cumulative cost = {40.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 144
00-04            Scan(groupscan=[SolrGroupScan [SolrScanSpec=SolrScanSpec [solrCoreName=bootstrap_5, filter=null], columns=[`*`]]]) : rowType = (DrillRecordRow[*]): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 143

Excerpt of the plan:

graph : [ {
    pop : solr-scan,
    @id : 4,
    solrPluginConfig : {
      type : solr,
      solrServer : http://localhost:2/solr/,
      enabled : true
    },
    solrScanSpec : {
      solrCoreName : bootstrap_5,
      filter : null
    },
    columns : [ `*` ],
    userName : smukherjee,
    cost : 20.0
  }, {
    pop : project,
    @id : 3,
    exprs : [ {
      ref : `$f0`,
      expr : 0
    } ],
    child : 4,
    initialAllocation : 100,
    maxAllocation : 100,
    cost : 20.0
  }, {
    pop : streaming-aggregate,
    @id : 2,
    child : 3,
    keys : [ ],
    exprs : [ {
      ref : `EXPR$0`,
      expr : count(1)
    } ],
    initialAllocation : 100,
    maxAllocation : 100,
    cost : 1.0
  }

Thanks,
Sudip

-----Original Message-----
From: rahul challapalli [mailto:challapallira...@gmail.com]
Sent: 07 August 2015 PM 01:23
To: dev@drill.apache.org
Subject: Re: Aggregate queries in drill

> Sudip, In your case, I would assume that you would construct something similar to the below: [...]
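In the plan above, a Project($f0=[0]) sits between the StreamAgg and the Scan, so a rule pattern that expects the aggregate directly on top of the scan may never match this query. The Java below is a self-contained toy model, not Drill or Calcite code: Rel, Scan, Project, Aggregate and matchGlobalCount are hypothetical names chosen for illustration. It only shows the shape check such a rule would plausibly need (a COUNT() with no grouping keys, optionally looking through one Project, down to a scan); a real rule would inspect the RelNodes handed to it through RelOptRuleCall instead.

```java
import java.util.List;
import java.util.Optional;

/**
 * Toy model only. Scan, Project and Aggregate are hypothetical stand-ins for
 * plan nodes; they are NOT the real DrillScanRel / LogicalAggregate classes.
 */
public class CountPushdownSketch {

    interface Rel {}

    record Scan(String coreName) implements Rel {}

    record Project(List<String> exprs, Rel input) implements Rel {}

    record Aggregate(List<String> groupKeys, List<String> aggCalls, Rel input) implements Rel {}

    /**
     * If rel is a global COUNT() (no GROUP BY keys) over a scan, possibly through
     * one Project, returns the scanned core name; otherwise returns empty.
     */
    static Optional<String> matchGlobalCount(Rel rel) {
        if (!(rel instanceof Aggregate agg)) {
            return Optional.empty();
        }
        // Only a single COUNT() with no grouping keys is treated as pushable here.
        if (!agg.groupKeys().isEmpty() || !agg.aggCalls().equals(List.of("COUNT()"))) {
            return Optional.empty();
        }
        Rel below = agg.input();
        if (below instanceof Project project) {
            // Step over a Project such as Project($f0=[0]) in the posted plan.
            below = project.input();
        }
        if (below instanceof Scan scan) {
            return Optional.of(scan.coreName());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Same shape as the posted plan for: select count(*) from solr.`bootstrap_5`
        Rel countStar = new Aggregate(List.of(), List.of("COUNT()"),
                new Project(List.of("0"), new Scan("bootstrap_5")));
        // A grouped aggregate must not be answered with a single total count.
        Rel grouped = new Aggregate(List.of("$0"), List.of("COUNT()"), new Scan("bootstrap_5"));

        System.out.println(matchGlobalCount(countStar)); // Optional[bootstrap_5]
        System.out.println(matchGlobalCount(grouped));   // Optional.empty
    }
}
```

Matching only a global COUNT() keeps the sketch conservative: grouped aggregates and other functions fall through and are left to Drill's own operators.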
Re: Aggregate queries in drill
Sudip,

In your case, I would assume that you would construct something similar to the following:

1. Create your own optimizer rule (SolrPushAggIntoScan). Take a look at PruneScanRule. You should gather the LogicalAggregate and DrillScanRel objects from the RelOptRuleCall. At a high level, you then need to re-create the group scan with the aggregate information. Most likely you will need to use an expression visitor in your SolrPushAggIntoScan class to figure out which aggregate functions you want to push into the scan.
2. Add your new rule(s) to the StoragePlugin.getOptimizerRules() method.

- Rahul

On Thu, Aug 6, 2015 at 10:00 PM, Sudip Mukherjee <smukher...@commvault.com> wrote:
> Hi,
> I am trying to build a basic storage plugin for Solr with Drill. Is there a way I could get the aggregate function information via an expression visitor in the plugin code, so that I can optimize the Solr query as much as possible? For example, for a count query I would just return numFound from the Solr response with rows=0.
> Source code: https://github.com/apache/drill/pull/100
> Could someone please help me with this?
> Thanks,
> Sudip Mukherjee
> ***Legal Disclaimer*** This communication may contain confidential and privileged material for the sole use of the intended recipient. Any unauthorized review, use or distribution by others is strictly prohibited. If you have received the message by mistake, please advise the sender by reply email and delete the message. Thank you.
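The count idea from the original question (return numFound from a request with rows=0) can be sketched without any Drill classes. This is a minimal sketch assuming a standard Solr /select handler that returns JSON (wt=json); buildCountUrl and parseNumFound are hypothetical helper names, the base URL is a placeholder, and the regex is a crude stand-in for a proper JSON parser or SolrJ client.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SolrCountSketch {

    // Matches e.g. "numFound":42 in a Solr JSON response (whitespace tolerated).
    // Crude: it takes the first occurrence in the body.
    private static final Pattern NUM_FOUND = Pattern.compile("\"numFound\"\\s*:\\s*(\\d+)");

    /** Builds a /select URL asking only for the hit count (rows=0). baseUrl must end in '/'. */
    static String buildCountUrl(String baseUrl, String core, String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + core + "/select?q=" + q + "&rows=0&wt=json";
    }

    /** Extracts numFound from a JSON response body, or empty if it is absent. */
    static OptionalLong parseNumFound(String json) {
        Matcher m = NUM_FOUND.matcher(json);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        // Placeholder host and port; substitute your own Solr server.
        String url = buildCountUrl("http://solr-host:8983/solr/", "bootstrap_5", "*:*");
        System.out.println(url); // http://solr-host:8983/solr/bootstrap_5/select?q=*%3A*&rows=0&wt=json

        // Made-up response body showing the expected shape only, not real output.
        String body = "{\"responseHeader\":{\"status\":0},"
                + "\"response\":{\"numFound\":42,\"start\":0,\"docs\":[]}}";
        System.out.println(parseNumFound(body).orElse(-1)); // 42
    }
}
```

In a plugin, the re-created group scan would presumably carry a flag telling the record reader to issue this kind of request and emit a single row, instead of fetching documents; that wiring is plugin-specific and is not shown here.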