Re: ProjectableFilterableTable.scan() and expensive columns

Alexey Roytman Mon, 06 Nov 2017 06:09:17 -0800

Hello, Julian.

The changes that belong to the Calcite itself -- definitely, I willcontribute them. For now I have only javadoc-related in pull request.

I understand that my approach shall be flexible and universal enough,and aligned with Calcite's ideology (that I'm not acquainted with).

So, for now my only idea is to have an interfaceProjectableFilterableTableScanCostEstimator that has a:

@Override public RelOptCost computeScanCost(RelOptPlanner planner,RelMetadataQuery mq, final List<RexNode> list, final int[] projected);

And at some moment during VolcanoPlanner.findBestExp() flow, we need tocheck whether a (table.table instanceofProjectableFilterableTableScanCostEstimator) and if yes, then multiplyby computeScanCost(). Maybe, the right place is somewhere nearBindableTableScan.computeSelfCost(), but I don't remember whether I havean access to table.table there...

Regarding your approach(https://issues.apache.org/jira/browse/CALCITE-1933) ofExpressionComplexity -- I don't get where shall I have an instance ofthis, and how do I push such a beast to ProjectableFilterableTableinstance. I.e. I don't understand the idea of data composition here...


- Alexey.

On 11/01/2017 10:37 PM, Julian Hyde wrote:

Alexey,

Do you intend to contribute your changes back to Calcite? If not, feel free to 
disregard the following (but also downgrade the amount of free advice that you 
expect from this list; we all have other priorities).

If you do intend to contribute back, you should keep in mind that some changes 
will be more acceptable than others. It’s not sufficient that the changes solve 
your problem; your  changes must not cause problems for other people.

For instance, adding a field to SqlTypeName is probably not good because it 
doesn’t allow people to customize the cost. Also, when extending a class or 
interface by adding methods and fields, ask yourself whether it was intended to 
be extended.

The absolute most useful code you can contribute is a test case that covers all 
of your requirements and is simple for everyone else to run. Then we can safely 
refactor your implementation and know that you are still getting what you need.

Are you aware of 
https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_CALCITE-2D1933&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=LhjCnSmzU2hPgGoBHx67hhcdFfWo4OSoQob6udt6lpA&m=wxCJy0-IChQRB53y7qCSONbcgJj24vGDvV1P8DylhMg&s=LBpaGzJP6rHVxjUmLc_MXY_w4P_jkJSIx5QhtdXGuiw&e=
 
<https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_CALCITE-2D1933&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=LhjCnSmzU2hPgGoBHx67hhcdFfWo4OSoQob6udt6lpA&m=wxCJy0-IChQRB53y7qCSONbcgJj24vGDvV1P8DylhMg&s=LBpaGzJP6rHVxjUmLc_MXY_w4P_jkJSIx5QhtdXGuiw&e=>
 ?  There I suggested adding a new kind of metadata called ExpressionComplexity. You could write your own handler that takes into 
account SqlTypeName or whatever you like.

Julian

On Oct 31, 2017, at 10:13 AM, Luis Fernando Kauer 
<[email protected]> wrote:

Hi,
I'm also newbie and I'm just trying to help with I'm learned so far.
It's all about the rules.  I strongly recommend taking a look into:
https://urldefense.proofpoint.com/v2/url?u=https-3A__calcite.apache.org_docs_howto.html-23tracing&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=LhjCnSmzU2hPgGoBHx67hhcdFfWo4OSoQob6udt6lpA&m=wxCJy0-IChQRB53y7qCSONbcgJj24vGDvV1P8DylhMg&s=8gp7l0GOuDYTZ5Ur5JDPNADLkOckZM6W53FnSfWbCXk&e=

Specially about setting the logger lever to DEBUG.

By doing this you can check which rules are being fired, the costs and chosen 
plan.
As I mentioned in my last reply, the problem seems to be that 
BindableTableScan.computeSelfCost does not use the information about used 
projects and filters to compute the cost of the BindableTableScan.
Many rules are applied that convert the nodes and at the end the plan with the 
lowest cost is chosen.  Among the generated plans are plans that push the 
projects to the BindableTableScan and others that do not.  The total cost of 
the plan that pushes the projects and the filter is the probably the same or 
higher than the cost of the plan that pushes only the filter and removes the 
projects (through AggregateProjectMergeRule) and the planner ends up choosing 
the latter.
So maybe just by fixing computeSelfCost it solves the problem.
Just for the purpose of testing this, I changed computeSelfCost to:
@Override public RelOptCost computeSelfCost(RelOptPlanner planner,
         RelMetadataQuery mq) {
       return super.computeSelfCost(planner, mq).multiplyBy((0.01 + 
projects.size())/identity().size());
     }
And only by doing this, the planner chose a plan that pushes the projects and 
filters to BindableTableScan when used with aggregation.
You can also override getStatistic() for your implementation of 
ProjectableFilterableTable to provide some statistics to be used in computing 
costs.
Calcite also allows other ways to provide information and statistics but that's 
beyond my knowledge.
Regards,
Luis Fernando


    Em terça-feira, 31 de outubro de 2017 13:38:05 BRST, Alexey Roytman 
<[email protected]> escreveu:

Sorry for the delay, Luis Fernando.

Please see below, as there are a number of answers.


On 10/26/2017 09:37 PM, Luis Fernando Kauer wrote:

   I'm sorry, but I have no idea what you are talking about.
Cassandra Adapter has code to translate the plan to run the query in Cassandra 
Server.
If you are only interested in querying CSV files I don't see how copying that 
code without understanding it will help you.

[Alexey]I need neither Cassandra nor CSV adapter. Cassandra was
mentioned by Julian, so I started investigated it. The CVS files were
used because this way I can create a working test to share with
community to ask questions.

First of all, you need to decide whether you will use 
ProjectableFilterableTable or TranslatableTable.

[Alexey] I would like to use ProjectableFilterableTable, but it does a
poor work for me: starting from a certain level of nesting, it wants all
projections. And for my task it's a too heavy query: my numeric columns
are remotely calculated, with different (and unpredictable) amount of
work for each column.

[Alexey] So, as Julian mentioned Cassandra's interface, I started with
it. There was as complication: it has both Translatable and both
Queriable interfaces. For Queriable interfaces, the RexNode are
translated to List<String> (all these translateBinary2() functions etc)
and passed via reflection call to the query. But for me, at the query
level, I need the RexNodes themselves, and I don't want to parse these
List<String> back!

[Alexey] So, I've started again, but with the Druid adapter, which uses
only Translatable interface.

You must try to understand how the rules work and how to check which rules are 
being fired and which ones are being chosen.

[Alexey] I do try. And when I reach certain understanding, then
obviously I won't ask such newbie questions that I ask for now :-))

Did you follow the tutorial for creating CSV Adapter? It creates a rule to push 
the used projects to the table scan. That is a great start.

[Alexey] I did. But even in flavor=translatable it is very simplistic,
and did not do the job of having both filters and projections at the
lowest level...

It's a good idea to take a look at the built in rules available in Calcite too.
You should take a look into FilterTableScanRule and ProjectTableScanRule, which 
are the rules that push the projects and filters used with 
ProjectableFilterableTable into a BindableTableScan, and the other rules int 
Bindables.java.

[Alexey] I totally agree with you. But when I look at the code there, I
understand even less than now. This will improve with time, but for now
this is what I have...

The rules work fine when there is no aggregate function, pushing both filter 
and projects into BindableTableScan.  The problem seems to be with 
AggregateProjectMergeRule which removes the Project from the plan.
If you remove the filter from your test cases you'll see that the projects are 
pushed to the BindableTableScan.
I was able to simulate your problem using ScannableTableTest.testProjectableFilterable2WithProject changing the query into "select \"k\", 
count(*) from (select \"k\",\"j\" from \"s\".\"beatles\" where \"i\" = 4) x group by \"k\"".
The plan:
LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
     LogicalProject(k=[$1])
       LogicalFilter(condition=[=($0, 4)])
         LogicalProject(i=[$0], k=[$2])
           LogicalTableScan(table=[[s, beatles]])

PhysicalPlan:
EnumerableAggregate(group=[{2}], EXPR$1=[COUNT()]): rowcount = 10.0, cumulative 
cost = {61.25 rows, 50.0 cpu, 0.0 io}, id = 112
     EnumerableInterpreter: rowcount = 100.0, cumulative cost = {50.0 rows, 
50.0 cpu, 0.0 io}, id = 110
       BindableTableScan(table=[[s, beatles]], filters=[[=($0, 4)]]): rowcount 
= 100.0, cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id = 62


If I disable AggregateProjectMergeRule, the physical plan is:
EnumerableAggregate(group=[{0}], EXPR$1=[COUNT()]): rowcount = 10.0, cumulative 
cost = {61.25 rows, 50.0 cpu, 0.0 io}, id = 102
     EnumerableInterpreter: rowcount = 100.0, cumulative cost = {50.0 rows, 
50.0 cpu, 0.0 io}, id = 100
       BindableTableScan(table=[[s, beatles]], filters=[[=($0, 4)]], 
projects=[[2]]): rowcount = 100.0, cumulative cost = {1.0 rows, 1.01 cpu, 0.0 
io}, id = 78

[Alexey] Well, but this does not advances me in the direction of fixing
ProjectableFilterable loosing projections...

Thank you very much for your hints and patience!

- Alexey.


Regards,

Luis Fernando



       Em quinta-feira, 26 de outubro de 2017 13:19:46 BRST, Alexey Roytman 
<[email protected]> escreveu:

Thanks for the hints.


I've tried to use [i.e. copy-pasted a lot of] Cassandra*.java for my
CSV-files example. It's really too wordy! So lot of code I need to
understand later!..

But what bothers me most for now is the fact that I just cannot pass
List<RexNode> to [my modification of] CassandraTable.query(); I need to
convert it to some string form within List<String> using
CassandraFilter.Translator, and then, when passed to [my modification
of] CassandraTable.query(), I need to parse these List<String> back...
Is there way to eliminate this back-and-forth serialization-deserialization?

- Alexey.

(P.S. Sorry for not keeping the email thread for now...)

Julian Hyde wrote wrote:

By "write a rule" I mean write a class that extends RelOptRule. An
example is CassandraRules.CassandraFilterRule.
ProjectableFilterableTable was "only" designed for the case that
occurs 80% of the time but requires 20% of the functionality. Rules
run in a richer environment so have more power and flexibility.

Re: ProjectableFilterableTable.scan() and expensive columns

Reply via email to