Re: ProjectableFilterableTable.scan() and expensive columns

Alexey Roytman Tue, 31 Oct 2017 08:32:34 -0700

Sorry for the delay, Luis Fernando.

Please see below, as there are a number of answers.



On 10/26/2017 09:37 PM, Luis Fernando Kauer wrote:

  I'm sorry, but I have no idea what you are talking about.
Cassandra Adapter has code to translate the plan to run the query in Cassandra 
Server.
If you are only interested in querying CSV files I don't see how copying that 
code without understanding it will help you.

[Alexey] I need neither the Cassandra nor the CSV adapter. Cassandra was mentioned by Julian, so I started investigating it. The CSV files were used because this way I can create a working test to share with the community to ask questions.

First of all, you need to decide whether you will use 
ProjectableFilterableTable or TranslatableTable.

[Alexey] I would like to use ProjectableFilterableTable, but it does a poor job for me: starting from a certain level of nesting, it wants all projections. And for my task that is too heavy a query: my numeric columns are remotely calculated, with a different (and unpredictable) amount of work for each column.
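
To make the cost concern concrete, here is a minimal sketch, not Calcite code. It assumes a scan that receives the projected column ordinals in the same shape as the `int[] projects` argument of ProjectableFilterableTable.scan(), with null taken to mean "all columns". The class LazyColumnScan, its per-column calculators, and the evaluation counter are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Toy model (not Calcite code) of a scan that computes only the projected columns. */
class LazyColumnScan {
    /** Hypothetical per-column calculators: column ordinal -> function of row number. */
    private final List<IntFunction<Object>> columns;
    /** How many times each column was computed, to make the cost visible. */
    final int[] evaluations;

    LazyColumnScan(List<IntFunction<Object>> columns) {
        this.columns = columns;
        this.evaluations = new int[columns.size()];
    }

    /**
     * Assumption: a null projects array means "all columns"; otherwise only
     * the listed ordinals are computed, in the order given.
     */
    List<Object[]> scan(int rowCount, int[] projects) {
        int[] wanted = projects;
        if (wanted == null) {
            wanted = new int[columns.size()];
            for (int i = 0; i < wanted.length; i++) {
                wanted[i] = i;
            }
        }
        List<Object[]> rows = new ArrayList<>();
        for (int r = 0; r < rowCount; r++) {
            Object[] row = new Object[wanted.length];
            for (int i = 0; i < wanted.length; i++) {
                int ordinal = wanted[i];
                evaluations[ordinal]++;          // one (possibly expensive) computation
                row[i] = columns.get(ordinal).apply(r);
            }
            rows.add(row);
        }
        return rows;
    }
}
```

If the planner passes null instead of the narrow projection, every calculator runs for every row, which is exactly the expensive case described above.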

[Alexey] So, as Julian mentioned Cassandra's interface, I started with it. There was a complication: it has both Translatable and Queryable interfaces. For the Queryable interface, the RexNodes are translated to List<String> (all these translateBinary2() functions etc.) and passed via a reflection call to the query. But for me, at the query level, I need the RexNodes themselves, and I don't want to parse these List<String> back!

[Alexey] So, I've started again, but with the Druid adapter, which uses only the Translatable interface.

You must try to understand how the rules work and how to check which rules are 
being fired and which ones are being chosen.

[Alexey] I do try. And when I reach a certain understanding, then obviously I won't ask such newbie questions as I ask now :-))

Did you follow the tutorial for creating CSV Adapter? It creates a rule to push 
the used projects to the table scan. That is a great start.

[Alexey] I did. But even with flavor=translatable it is very simplistic, and it does not do the job of having both filters and projections at the lowest level...

It's a good idea to take a look at the built in rules available in Calcite too.
You should take a look into FilterTableScanRule and ProjectTableScanRule, which 
are the rules that push the projects and filters used with 
ProjectableFilterableTable into a BindableTableScan, and the other rules in 
Bindables.java.

[Alexey] I totally agree with you. But when I look at the code there, I understand even less than now. This will improve with time, but for now this is what I have...

The rules work fine when there is no aggregate function, pushing both filter 
and projects into BindableTableScan.  The problem seems to be with 
AggregateProjectMergeRule which removes the Project from the plan.
If you remove the filter from your test cases you'll see that the projects are 
pushed to the BindableTableScan.
I was able to simulate your problem using ScannableTableTest.testProjectableFilterable2WithProject changing the query into "select \"k\", 
count(*) from (select \"k\",\"j\" from \"s\".\"beatles\" where \"i\" = 4) x group by \"k\"".
The plan:
LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
   LogicalProject(k=[$1])
     LogicalFilter(condition=[=($0, 4)])
       LogicalProject(i=[$0], k=[$2])
         LogicalTableScan(table=[[s, beatles]])

PhysicalPlan:
EnumerableAggregate(group=[{2}], EXPR$1=[COUNT()]): rowcount = 10.0, cumulative cost = {61.25 rows, 50.0 cpu, 0.0 io}, id = 112
   EnumerableInterpreter: rowcount = 100.0, cumulative cost = {50.0 rows, 50.0 cpu, 0.0 io}, id = 110
     BindableTableScan(table=[[s, beatles]], filters=[[=($0, 4)]]): rowcount = 100.0, cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id = 62


If I disable AggregateProjectMergeRule, the physical plan is:
EnumerableAggregate(group=[{0}], EXPR$1=[COUNT()]): rowcount = 10.0, cumulative cost = {61.25 rows, 50.0 cpu, 0.0 io}, id = 102
   EnumerableInterpreter: rowcount = 100.0, cumulative cost = {50.0 rows, 50.0 cpu, 0.0 io}, id = 100
     BindableTableScan(table=[[s, beatles]], filters=[[=($0, 4)]], projects=[[2]]): rowcount = 100.0, cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id = 78

[Alexey] Well, but this does not advance me in the direction of fixing ProjectableFilterable losing projections...
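
The two physical plans above differ only in where column k is addressed: group=[{0}] over projects=[[2]] versus group=[{2}] over the unprojected scan. A toy model of that index remapping follows. It is only a sketch of the observable effect, assuming the project is a plain list of input column ordinals; it is not the code of AggregateProjectMergeRule, and GroupKeyRemap and mergeThroughProject are hypothetical names.

```java
/** Toy model (not Calcite code) of merging an aggregate's group keys through a project. */
class GroupKeyRemap {
    /**
     * Each group key is an index into the project's output; the project's
     * output i is input column projectedOrdinals[i]. After the merge the
     * aggregate addresses the input columns directly and the project is gone.
     */
    static int[] mergeThroughProject(int[] groupKeys, int[] projectedOrdinals) {
        int[] remapped = new int[groupKeys.length];
        for (int i = 0; i < groupKeys.length; i++) {
            remapped[i] = projectedOrdinals[groupKeys[i]];
        }
        return remapped;
    }
}
```

With groupKeys = {0} and projectedOrdinals = {2}, the result is {2}: the aggregate now reads column 2 of the full row, so nothing is left to tell the scan which columns are actually needed.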


Thank you very much for your hints and patience!

- Alexey.



Regards,

Luis Fernando



     On Thursday, 26 October 2017 13:19:46 BRST, Alexey Roytman 
<[email protected]> wrote:

Thanks for the hints.


I've tried to use [i.e. copy-pasted a lot of] Cassandra*.java for my
CSV-files example. It's really too wordy! So much code I need to
understand later!..

But what bothers me most for now is the fact that I just cannot pass
List<RexNode> to [my modification of] CassandraTable.query(); I need to
convert it to some string form within List<String> using
CassandraFilter.Translator, and then, when passed to [my modification
of] CassandraTable.query(), I need to parse these List<String> back...
Is there a way to eliminate this back-and-forth serialization-deserialization?

- Alexey.

(P.S. Sorry for not keeping the email thread for now...)

Julian Hyde wrote:

By "write a rule" I mean write a class that extends RelOptRule. An
example is CassandraRules.CassandraFilterRule.
ProjectableFilterableTable was "only" designed for the case that
occurs 80% of the time but requires 20% of the functionality. Rules
run in a richer environment so have more power and flexibility.
