Re: Optimizing SUM(1) query

Jacques Nadeau Fri, 18 Mar 2016 22:52:43 -0700

I don't think Julian is saying it does this, I think he is saying it
should. I agree. (This actually is very common Tableau query pattern among
other things.)


Sudip, do you want to open an enhancement JIRA where we rewrite SUM(1) to
COUNT(1). Then our existing count optimizations can take over.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Wed, Mar 16, 2016 at 8:38 AM, Sudip Mukherjee <[email protected]>
wrote:

> I don't see DRILL is transforming the query. Tried with a CSV file.
> Please let me know if I am missing something.
>
> 00-00    Screen : rowType = RecordType(INTEGER EXPR$0): rowcount = 1.0,
> cumulative cost = {3.1 rows, 17.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id
> = 260
> 00-01      Project(EXPR$0=[$0]) : rowType = RecordType(INTEGER EXPR$0):
> rowcount = 1.0, cumulative cost = {3.0 rows, 17.0 cpu, 0.0 io, 0.0 network,
> 0.0 memory}, id = 259
> 00-02        StreamAgg(group=[{}], EXPR$0=[SUM($0)]) : rowType =
> RecordType(INTEGER EXPR$0): rowcount = 1.0, cumulative cost = {3.0 rows,
> 17.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 258
> 00-03          Project($f0=[1]) : rowType = RecordType(INTEGER $f0):
> rowcount = 1.0, cumulative cost = {2.0 rows, 5.0 cpu, 0.0 io, 0.0 network,
> 0.0 memory}, id = 257
> 00-04            Scan(groupscan=[EasyGroupScan
> [selectionRoot=file:/C:/data/company.csv, numFiles=1, columns=[`*`],
> files=[file:/C:/data/company.csv]]]) : rowType = RecordType(): rowcount =
> 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0
> memory}, id = 256
>
> Thanks,
> Sudip
>
> -----Original Message-----
> From: Julian Hyde [mailto:[email protected]]
> Sent: 16 March 2016 AM 12:50
> To: [email protected]
> Subject: Re: Optimizing SUM(1) query
>
> Is there any reason why Drill cannot transform SUM(1) to COUNT(*) at an
> early stage (i.e. using a logical optimization rule) so that this
> optimization does not need to be done for each engine?
>
> > On Mar 15, 2016, at 5:29 AM, Sudip Mukherjee <[email protected]>
> wrote:
> >
> > I was trying to have an Optimizer rule for the solr storage plugin that
> I'm working on for this query. Trying to use SOLR field stats for this , so
> that the query is faster..
> > Getting the below exception while transforming project to scan. Could
> you please advise?
> >
> >
> > 2016-03-15 08:20:35,149 [291801ee-33fc-064d-7aff-18391f15ae0e:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - Drill Logical :
> > DrillScreenRel: rowcount = 1.0, cumulative cost = {60.1 rows, 320.1
> > cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 49
> >  DrillProjectRel(EXPR$0=[$0]): rowcount = 1.0, cumulative cost = {60.0
> rows, 320.0 cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 48
> >    DrillAggregateRel(group=[{}], EXPR$0=[SUM($0)]): rowcount = 1.0,
> cumulative cost = {60.0 rows, 320.0 cpu, 0.0 io, 0.0 network, 176.0
> memory}, id = 46
> >      DrillProjectRel($f0=[1]): rowcount = 20.0, cumulative cost = {40.0
> rows, 80.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44
> >        DrillScanRel(table=[[solr, ANalert_494]],
> > groupscan=[SolrGroupScan [SolrScanSpec=SolrScanSpec
> > [solrCoreName=ANalert_494, solrUrl=http://localhost:20000/solr/
> > filter=[], solrDocFetchCount=-1, aggreegation=[]], columns=[`*`]]]):
> > rowcount = 20.0, cumulative cost = {20.0 rows, 0.0 cpu, 0.0 io, 0.0
> > network, 0.0 memory}, id = 26
> >
> > 2016-03-15 08:20:35,201 [291801ee-33fc-064d-7aff-18391f15ae0e:foreman]
> > DEBUG o.a.drill.exec.work.foreman.Foreman -
> > 291801ee-33fc-064d-7aff-18391f15ae0e: State change requested PENDING
> > --> FAILED
> > org.apache.drill.exec.work.foreman.ForemanException: Unexpected
> exception during fragment initialization: index (0) must be less than size
> (0)
> >       at
> org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255)
> [drill-java-exec.jar:1.4.0]
> >       at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source) [na:1.8.0_65]
> >       at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source) [na:1.8.0_65]
> >       at java.lang.Thread.run(Unknown Source) [na:1.8.0_65] Caused by:
> > java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
> >       at
> com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305)
> ~[com.google.guava-guava.jar:na]
> >       at
> com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284)
> ~[com.google.guava-guava.jar:na]
> >       at
> com.google.common.collect.EmptyImmutableList.get(EmptyImmutableList.java:80)
> ~[com.google.guava-guava.jar:na]
> >       at org.apache.calcite.util.Pair$6.get(Pair.java:335)
> ~[org.apache.calcite-calcite-core.jar:1.4.0-drill-r10]
> >       at
> org.apache.drill.exec.planner.StarColumnHelper.containsStarColumnInProject(StarColumnHelper.java:60)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitProject(StarColumnConverter.java:138)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitProject(StarColumnConverter.java:45)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.ProjectPrel.accept(ProjectPrel.java:69)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitScreen(StarColumnConverter.java:77)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitScreen(StarColumnConverter.java:45)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.ScreenPrel.accept(ScreenPrel.java:64)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.insertRenameProject(StarColumnConverter.java:72)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:326)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:175)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> > org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorke
> > r.java:197) ~[drill-java-exec.jar:1.4.0]
> >
> >
> > Thanks,
> > Sudip
> >
> > -----Original Message-----
> > From: Sudip Mukherjee [mailto:[email protected]]
> > Sent: 07 March 2016 PM 04:45
> > To: [email protected]
> > Subject: RE: Optimizing SUM(1) query
> >
> > Hi,
> >
> > Extremely sorry for the delayed response!
> >
> > Basically , I am experimenting an Apache Solr plugin for Drill and
> > tried to use Tableau. On intial selection of a "Table" , I see these
> > queries from drill webui profile
> >
> > SELECT * FROM (SELECT * FROM `solr`.`CSVStringData`) T LIMIT 0
> >
> > SELECT * FROM (SELECT SUM(1) AS `COL` FROM `solr`.`CSVStringData`
> > `CSVStringData` HAVING COUNT(1)>0) T LIMIT 0
> >
> > I have a dataset of 2,297,451,8 and when I try access that,the tableau
> screen in kind of stuck in "Processing Request" for 5 minutes.
> >
> > Thanks,
> > Sudip
> >
> > -----Original Message-----
> > From: Andries Engelbrecht [mailto:[email protected]]
> > Sent: 20 February 2016 AM 03:42
> > To: [email protected]
> > Subject: Re: Optimizing SUM(1) query
> >
> > What are you trying to do in Tableau that generates these queries?
> >
> > Do you have the actual full query that is being generated?
> >
> > Tableau has a number of customization features, wonder if it is possible
> to optimize those settings to avoid this issue.
> >
> > --Andries
> >
> >
> >> On Feb 19, 2016, at 10:16 AM, Sudip Mukherjee <[email protected]>
> wrote:
> >>
> >> Hi,
> >>
> >> Have anyone tried optimizing SUM(1) query in drill? Or is it
> implemented?
> >> Getting these query while using Tableau. Mostly probably it is trying
> to figure out NUMBER_OF_RECORDS.
> >>
> >> Thanks,
> >> Sudip
> >>
> >>
> >>
> >> ***************************Legal
> >> Disclaimer***************************
> >> "This communication may contain confidential and privileged material
> >> for the sole use of the intended recipient. Any unauthorized review,
> >> use or distribution by others is strictly prohibited. If you have
> >> received the message by mistake, please advise the sender by reply
> email and delete the message. Thank you."
> >> *********************************************************************
> >> *
> >
> >
> >
> >
> >
> > ***************************Legal Disclaimer***************************
> > "This communication may contain confidential and privileged material for
> the sole use of the intended recipient. Any unauthorized review, use or
> distribution by others is strictly prohibited. If you have received the
> message by mistake, please advise the sender by reply email and delete the
> message. Thank you."
> > **********************************************************************
> >
> >
> >
> >
> > ***************************Legal Disclaimer***************************
> > "This communication may contain confidential and privileged material
> > for the sole use of the intended recipient. Any unauthorized review,
> > use or distribution by others is strictly prohibited. If you have
> > received the message by mistake, please advise the sender by reply email
> and delete the message. Thank you."
> > **********************************************************************
>
>
>
>
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material for
> the
> sole use of the intended recipient. Any unauthorized review, use or
> distribution
> by others is strictly prohibited. If you have received the message by
> mistake,
> please advise the sender by reply email and delete the message. Thank you."
> **********************************************************************
>

Re: Optimizing SUM(1) query

Reply via email to