Re: Aggregate queries in drill

2015-08-10 Thread rahul challapalli
Also, I would strongly encourage you to attend the weekly developer
hangouts on Tuesdays.

https://drill.apache.org/community-resources/


Re: Aggregate queries in drill

2015-08-10 Thread rahul challapalli
Sudip,

I will take a look when I get some time. I am not sure whether you already
have test cases for the part of the plugin that is working. If not, it
would be very helpful if you added a few, so that I can walk through your
code in the debugger.

- Rahul


RE: Aggregate queries in drill

2015-08-10 Thread Sudip Mukherjee
Hi Rahul,

I tried the rule below to inspect what is in the SQL query, but it does not
seem to receive the aggregate functions:
https://github.com/sudipmukherjee/drill/blob/master/contrib/storage-solr/src/main/java/org/apache/drill/exec/store/solr/SolrQueryFilterRule.java
Could you please have a look if you get a chance?

Example physical plan for the query select count(*) from solr.`bootstrap_5`;
(bootstrap_5 is one of the cores in my Solr engine):

2015-08-10 18:04:04,007 [2a3765c5-0e91-1f6e-5462-b134759bc9b7:foreman] DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - Drill Physical :
00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {60.1 rows, 340.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 147
00-01      Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {60.0 rows, 340.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 146
00-02        StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {60.0 rows, 340.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 145
00-03          Project($f0=[0]) : rowType = RecordType(INTEGER $f0): rowcount = 20.0, cumulative cost = {40.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 144
00-04            Scan(groupscan=[SolrGroupScan [SolrScanSpec=SolrScanSpec [solrCoreName=bootstrap_5, filter=null], columns=[`*`]]]) : rowType = (DrillRecordRow[*]): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 143

Excerpt of the plan :

graph : [ {
    pop : solr-scan,
    @id : 4,
    solrPluginConfig : {
      type : solr,
      solrServer : http://localhost:2/solr/,
      enabled : true
    },
    solrScanSpec : {
      solrCoreName : bootstrap_5,
      filter : null
    },
    columns : [ `*` ],
    userName : smukherjee,
    cost : 20.0
  }, {
    pop : project,
    @id : 3,
    exprs : [ {
      ref : `$f0`,
      expr : 0
    } ],
    child : 4,
    initialAllocation : 100,
    maxAllocation : 100,
    cost : 20.0
  }, {
    pop : streaming-aggregate,
    @id : 2,
    child : 3,
    keys : [ ],
    exprs : [ {
      ref : `EXPR$0`,
      expr : count(1)
    } ],
    initialAllocation : 100,
    maxAllocation : 100,
    cost : 1.0
  }

Thanks,
Sudip

***Legal Disclaimer***
This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you.
**

Re: Aggregate queries in drill

2015-08-07 Thread rahul challapalli
Sudip,

In your case, I would expect you to build something like the following:

1. Create your own optimizer rule (SolrPushAggIntoScan). Take a look at
   PruneScanRule. You should gather the LogicalAggregate and DrillScanRel
   objects from the RelOptRuleCall. At a high level, you then need to
   re-create the group scan with the aggregate information. Most likely you
   will need an expression visitor in your SolrPushAggIntoScan class to
   figure out which aggregate functions you want to push into the scan.
2. Add your new rule(s) to the StoragePlugin.getOptimizerRules() method.
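
The first step above can be sketched without any Drill or Calcite dependency. What follows is a toy model under stated assumptions, not Drill's API: Node, Scan, Aggregate, and PushAggIntoScanSketch are hypothetical stand-ins for DrillScanRel, LogicalAggregate, and a rule such as SolrPushAggIntoScan, and the rule is assumed to handle only ungrouped COUNT. A real rule would read the matched objects from the RelOptRuleCall instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: every type here is a hypothetical stand-in, not a Drill or Calcite class.
class PushAggIntoScanSketch {

    /** A node in a simplified logical plan. */
    interface Node {}

    /** Stand-in for a scan over one Solr core, optionally carrying pushed-down aggregates. */
    static final class Scan implements Node {
        final String coreName;
        final List<String> pushedAggregates;

        Scan(String coreName, List<String> pushedAggregates) {
            this.coreName = coreName;
            this.pushedAggregates = Collections.unmodifiableList(new ArrayList<>(pushedAggregates));
        }
    }

    /** Stand-in for an aggregate node: group keys, aggregate function names, one input. */
    static final class Aggregate implements Node {
        final List<String> groupKeys;
        final List<String> functions;
        final Node input;

        Aggregate(List<String> groupKeys, List<String> functions, Node input) {
            this.groupKeys = groupKeys;
            this.functions = functions;
            this.input = input;
        }
    }

    /** Assumption: this sketch only knows how to answer COUNT inside Solr. */
    static boolean isPushable(String function) {
        return "COUNT".equals(function);
    }

    /**
     * Matches Aggregate-over-Scan, inspects the aggregate functions, and re-creates
     * the scan with the aggregate information attached. Returns the node unchanged
     * whenever the pattern does not match.
     */
    static Node apply(Node node) {
        if (!(node instanceof Aggregate)) {
            return node;
        }
        Aggregate agg = (Aggregate) node;
        if (!(agg.input instanceof Scan) || !agg.groupKeys.isEmpty() || agg.functions.isEmpty()) {
            return node;
        }
        for (String function : agg.functions) {
            if (!isPushable(function)) {
                return node;
            }
        }
        Scan scan = (Scan) agg.input;
        return new Scan(scan.coreName, agg.functions);
    }

    public static void main(String[] args) {
        Node plan = new Aggregate(
                Collections.<String>emptyList(),
                Collections.singletonList("COUNT"),
                new Scan("bootstrap_5", Collections.<String>emptyList()));
        Node rewritten = apply(plan);
        if (rewritten instanceof Scan) {
            Scan scan = (Scan) rewritten;
            System.out.println("pushed " + scan.pushedAggregates + " into scan of " + scan.coreName);
        } else {
            System.out.println("plan unchanged");
        }
    }
}
```

In the physical plan posted in this thread, a Project sits between the aggregate and the scan, so a real rule would probably have to match that shape as well; the toy model skips it to keep the idea visible.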

- Rahul


On Thu, Aug 6, 2015 at 10:00 PM, Sudip Mukherjee smukher...@commvault.com
wrote:

 Hi,

 I am trying to build a basic storage plugin for Solr with Drill. Is there a
 way to get the aggregate function information via an expression visitor in
 the plugin code, so that I can optimize the Solr query as much as possible?
 For example, for a count query I would just return numFound from the Solr
 response with rows=0.
 Source code : https://github.com/apache/drill/pull/100

 Could someone please help me on this?

 Thanks,
 Sudip Mukherjee
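
The count shortcut described in the question can be illustrated as follows. This is a minimal sketch, assuming Solr's standard select parameters (q, rows, wt) and a JSON response that carries the hit count under response.numFound. The class and method names are hypothetical, the sample response is made up, and no HTTP request is sent. A real plugin would more likely go through a Solr client library than a regular expression.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: shows the rows=0 / numFound idea only, without contacting Solr.
class SolrCountSketch {

    private static final Pattern NUM_FOUND = Pattern.compile("\"numFound\"\\s*:\\s*(\\d+)");

    /** Query string for a count-only request: match every document, return none of them. */
    static String countQueryString() {
        try {
            return "q=" + URLEncoder.encode("*:*", "UTF-8") + "&rows=0&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    /** Extracts numFound from a JSON response body; throws if the field is missing. */
    static long parseNumFound(String responseBody) {
        Matcher matcher = NUM_FOUND.matcher(responseBody);
        if (!matcher.find()) {
            throw new IllegalArgumentException("numFound not present in response");
        }
        return Long.parseLong(matcher.group(1));
    }

    public static void main(String[] args) {
        // Made-up response body for illustration; a real one would come from the Solr core.
        String sample = "{\"responseHeader\":{\"status\":0},"
                + "\"response\":{\"numFound\":20,\"start\":0,\"docs\":[]}}";
        System.out.println("query string: " + countQueryString());
        System.out.println("count: " + parseNumFound(sample));
    }
}
```

With rows=0 Solr returns no documents, so the reader would only need to emit a single row holding the count.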



