Re: Conversion from Bindable to Enumerable

2022-09-27 Thread Thomas Rebele
Hello Pranav,

here some untested pseudo-code that could give you an idea how to achieve
this. It probably needs several adaptations to make it work (marked with
"<- here", possibly others).

private static final Method BIND = Types.lookupMethod(Bindable.class,
> "bind", Bindable.class, DataContext.class); // <- here
>
> private static class EnumerableBindableRel extends AbstractRelNode
> implements EnumerableRel {
> private final BindableRel bindable;
>
> public EnumerableBindableRel(BindableRel bindable) {
> super(bindable.getCluster(), bindable.getTraitSet());
> this.bindable = bindable;
> }
>
> @Override
> public Result implement(EnumerableRelImplementor implementor, Prefer
> pref) {
> final PhysType physType = PhysTypeImpl.of(
> implementor.getTypeFactory(),
> this.getRowType(),
> JavaRowFormat.ARRAY); // <- here
>
> final BlockBuilder builder = new BlockBuilder();
> MethodCallExpression call = Expressions.call( // <- here, correct
> arguments?
> implementor.stash(this.bindable, Bindable.class),
> BIND,
> implementor.getRootExpression());
>
> ParameterExpression resultVar =
> Expressions.parameter(Enumerable.class, "_result");
> builder.add(Expressions.declare(Modifier.FINAL, resultVar, call));
> builder.add(Expressions.return_(null, resultVar));
> return implementor.result(physType, builder.toBlock());
> }
> }
>
> private static EnumerableRel getEnumerableRel(BindableRel bindable) {
> return new EnumerableBindableRel(bindable);
> }
>

Cordialement / Best Regards,
Dr. Thomas Rebele | R&D Developer | Germany | E treb...@tibco.com | W
www.tibco.com

TIBCO Software GmbH |  St.-Martin-Str. 106, 81669 München, Deutschland |
Registergericht: Amtsgericht München, HRB 123355 | Geschäftsführer: William
Hughes; Alexander E. Kolar

Cordialement / Best Regards,
*Dr. Thomas Rebele* | R&D Developer | Germany | *E* *treb...@tibco.com
* | *W* *www.tibco.com <http://www.tibco.com/>*

*TIBCO Software GmbH* |  St.-Martin-Str. 106, 81669 München, Deutschland |
Registergericht: Amtsgericht München, HRB 123355 | Geschäftsführer: William
Hughes; Alexander E. Kolar



On Fri, Sep 23, 2022 at 10:50 PM Pranav Deshpande <
deshpande.v.pra...@gmail.com> wrote:

> Dear Apache Calcite Team,
> Is there a way to convert a BindableRel into an EnumerableRel?
>
> There is a rule to convert an EnumerableRel into a BindableRel, but not the
> other way around.
>
> I am asking this because I wish to have my TableScans use the
> BindableConvention and the rest of my execution using the Enumerable
> Convention [as it has more operators!].
>
> Regards,
> Pranav
>


Re: Push unparsed SQL select statement into RelBuilder

2023-01-13 Thread Thomas Rebele
If you just need this for rel to sql translation: Maybe you could create a
mock TableScan, without content, that contains a sqlQuery attribute.
However, as Julian stated, you would need to provide a type for the mock
table. Not sure why Julian said the SQL would need to be adapted. Maybe
related to TableFunctionScan?

Cordialement / Best Regards,
*Dr. Thomas Rebele* | R&D Developer | Germany | *E* *treb...@tibco.com
* | *W* *www.tibco.com <http://www.tibco.com/>*

*TIBCO Software GmbH* |  St.-Martin-Str. 106, 81669 München, Deutschland |
Registergericht: Amtsgericht München, HRB 123355 | Geschäftsführer: William
Hughes; Alexander E. Kolar



On Tue, Jan 10, 2023 at 2:00 AM Julian Hyde  wrote:

> You can go two ways with this.
>
> If the SQL string gets parsed (and validated and translated to
> RelNode/RexNode) then I can see that’s convenient for you. It becomes a
> code maintenance problem for RelBuilder because its dependencies just got
> much larger and possibly cyclic.
>
> If the SQL string remains unparsed then you will need to wrap it in a
> special function. One company I am aware of that uses Calcite has a
> function called ‘asIs’ for this purpose. The problem with that function is
> that it needs to deduce the true type of the expression (Calcite is
> strongly typed). Another is that if you rename tables, columns, or table or
> column aliases, or if you just push the function call to a different part
> of the RelNode tree, then you will need to revise the SQL string in order
> to keep it valid.
>
> Julian
>
>
> > On Jan 9, 2023, at 4:51 PM, Heiko Heijenga 
> wrote:
> >
> > I want to use the algebra builder to construct SQL queries, but use an
> > unparsed/unparseable subquery as the root (as just a string). Does that
> > make sense?
> >
> > (obviously this subquery would be in same sql dialect as output of
> RelToSql)
> >
> > On Mon, Jan 9, 2023 at 4:41 PM Julian Hyde  wrote:
> >
> >> What are you trying to achieve?
> >>
> >> If you want Calcite to understand an unparsed SQL expression then you
> >> need a SQL parser. RelBuilder is not a SQL parser. SqlParser is a SQL
> >> parser.
> >>
> >> On Mon, Jan 9, 2023 at 4:37 PM Heiko  wrote:
> >>>
> >>> Say from some legacy system I get some valid select stmt, e.g. "select
> >> a, b
> >>> from foobar" (and known field names/types, how do I push this as an
> >>> unparsed subquery into a relbuilder so I can do something like
> >>>
> >>>RelNode node = builder
> >>>.projectPlus(builder.call(MINUS, builder.field("b"),
> >>> builder.field("a")))
> >>>.build();
> >>>
> >>>Result result = new RelToSqlConverter(dialect).visitRoot(node);
> >>>
> >>>String sqlString = result
> >>>.asSelect()
> >>>.toSqlString(dialect)
> >>>.getSql();
> >>>
> >>> where "sqlString" would then become something like
> >>>
> >>> SELECT `a` - `b` FROM (select a, b from foobar) AS `$f1`
> >>>
> >>> (note; I could of course parse the SELECT statement but I'm not 100%
> >>> convinced all the possible SQL select statements would be parseable by
> >>> Calcite, so when they're not I want to use above approach to treat them
> >>> basically as black-box subqueries)
> >>>
> >>> Thanks,
> >>> -Heiko
> >>
>
>


Re: [Question] Unknown cost calculation/propagation in RelSubsets

2024-01-16 Thread Thomas Rebele
Hello,

The RuleMatchVisualizer uses the planner to get the cost [1], and the
Volcano planner uses the bestCost attribute for RelSubset [2].

The color depends on the steps:
* For intermediate steps, the purple color shows which nodes have been
matched by the rule. Light blue shows added nodes.
* For the final step, the purple and light blue colors show the chosen
nodes of the final plan. Light blue for the RelSubset nodes.

If I remember correctly, the cost for a subset shown at a step should be
the same as the best cost of all children for that particular step.

It would be helpful to share at least the generated files (especially the
.js), to understand what's going on.

[1]
https://github.com/apache/calcite/blob/c4042a34ef054b89cec1c47fefcbc8689bad55be/core/src/main/java/org/apache/calcite/plan/visualizer/RuleMatchVisualizer.java#L300C34-L300C34
[2]
https://github.com/apache/calcite/blob/c4042a34ef054b89cec1c47fefcbc8689bad55be/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L722

Cordialement / Best Regards,
*Dr. Thomas Rebele* | R&D Developer | Germany | *E* *treb...@tibco.com
* | *W* *www.cloud.com <http://www.cloud.com/>*

*Cloud SG Germany GmbH* |  ℅ Citrix Systems GmbH, Erika-Mann-Straße 67-69,
80636 München Deutschland | Registergericht: Amtsgericht München, HRB
123355 | Geschäftsführer: Antonio Gomes, Alexander E. Kolar, Ganesh
Vaidyanathan, Brian Lee Shytle



On Fri, Jan 12, 2024 at 11:35 PM Tony Fiedler <
tony.fied...@mailbox.tu-dresden.de> wrote:

> Dear Calcite devs,
>
> First of all I really appreciate having a mature framework like Calcite.
> Please continue your great work on this project!
>
> My use case is feeding Calcite (v1.35.0) with an SQL query and doing
> some optimizations by providing metadata and selected planner rules. I
> initialize the Volcano planner and convert the logical plan resulting
> from the sql to a physical plan (using bindable convention).
> After the optimization, I convert the physical plan back to sql --
> hoping its execution time is faster (running the query by a PostgreSQL
> server) than the original query.
>
> There are some aspects I don't understand regarding both the cost
> calculation and cost propagation of (Rel) Subsets in the tree-based plan
> representation generated by RuleMatchVisualizer.
>
> AFAIK Subsets don't have any costs [1], so I'm really confused why
> (cumulative) `cpu` is higher in the subset than it is in its child
> elements (BindableJoin and BindableFilter), see [2]. In addition to that
> the cost metric `rows` is smaller(!) than the values provided by the
> children.
>
> What I expect is that Subset has exactly the same `rows`, `cpu` (and
> `io`) of the selected (purple) child element.
> Having a look at this sub tree [3] the cost propagation works like
> expected.
>
> Besides that I already noticed that Calcite costs seem to have an upper
> bound (9.223372036854775807E18) where costs can't get any higher in sub
> trees where this value is reached in an (physical operator) element.
>
> I know it's hard to tell what Calcite actually does just using
> screenshots. Please let me know if I should provide e.g., my code for
> giving better insights.
>
> Thank you in advance for your reply!
>
> [1]:
>
> https://github.com/apache/calcite/blob/c4042a34ef054b89cec1c47fefcbc8689bad55be/core/src/main/java/org/apache/calcite/plan/volcano/RelSubset.java#L254
> [2]: https://ibb.co/7jtXKH3
> [3]: https://ibb.co/5BZZyLz
>
> Best regards,
> Tony
>


Re: Re: [Question] Unknown cost calculation/propagation in RelSubsets

2024-02-06 Thread Thomas Rebele
Hello,

Thank you for sharing the files. I assume the intermediate costs were
included, as the cost is updated on each step for many nodes.
So in step 89, #197-BindableJoin has a CPU cost of 6.787504586323E12, and
that cost gets updated to 5.6541574E7 in step 90.
However, its parent subset#170 does not get updated, contrary to the
subsets on the path from the new node in step 90, #318. (See [1], [2])
It seems that there's indeed a problem with updating the costs in that
scenario.

Could you debug VolcanoPlanner.propagateCostImprovements(RelNode) for step
90, to see why the propagation does not apply to subset#170?

[1]
https://raw.githack.com/thomasrebele/calcite-relsubset-foo/main/job_query_26b/planner-viz/planner-vizrule-match-viz.html?step=89&dir=0
[2]
https://raw.githack.com/thomasrebele/calcite-relsubset-foo/main/job_query_26b/planner-viz/planner-vizrule-match-viz.html?step=90&dir=0

Cordialement / Best Regards,
*Dr. Thomas Rebele* | R&D Developer | Germany | *E* *treb...@tibco.com
* | *W* *www.cloud.com <http://www.cloud.com/>*

*Cloud SG Germany GmbH* |  ℅ Citrix Systems GmbH, Erika-Mann-Straße 67-69,
80636 München Deutschland | Registergericht: Amtsgericht München, HRB
123355 | Geschäftsführer: Antonio Gomes, Oliver Ebel, Ganesh Vaidyanathan,
Brian Lee Shytle



On Wed, Jan 24, 2024 at 9:51 PM Tony Fiedler <
tony.fied...@mailbox.tu-dresden.de> wrote:

> Hello,
>
> Right, I guess I understand what the color encoding means. Thanks for
> confirming my thoughts.
>
> >> If I remember correctly, the cost for a subset shown at a step should be
> >> the same as the best cost of all children for that particular step.
>
> Right and this is what confuses me since this doesn't always be to seem
> the case -- for simple queries this seem to work perfectly fine.
>
> >> It would be helpful to share at least the generated files (especially
> the
> >> .js), to understand what's going on.
>
> Makes total sense.. There you go [1]. But be warned, the operator tree
> is rather big since I throw the Join Order Benchmark queries [2] from
> Leis et al. *How Good Are Query Optimizers, Really?* against it. Those
> contain many (10+) join operations. In this case the tree originates
> from query 26b. Unfortunately, I wasn't able to reproduce this cost
> calculation behaviour for less complex queries.
>
> [1]:
>
> https://github.com/AES-256-GCM/calcite-relsubset-foo/tree/main/job_query_26b/planner-viz
> [2]: http://www-db.in.tum.de/~leis/qo/job.tgz
>
> Kind regards,
> Tony
>
> On 2024/01/16 16:49:13 Thomas Rebele wrote:
> > Hello,
> >
> > The RuleMatchVisualizer uses the planner to get the cost [1], and the
> > Volcano planner uses the bestCost attribute for RelSubset [2].
> >
> > The color depends on the steps:
> > * For intermediate steps, the purple color shows which nodes have been
> > matched by the rule. Light blue shows added nodes.
> > * For the final step, the purple and light blue colors show the chosen
> > nodes of the final plan. Light blue for the RelSubset nodes.
> >
> > If I remember correctly, the cost for a subset shown at a step should be
> > the same as the best cost of all children for that particular step.
> >
> > It would be helpful to share at least the generated files (especially the
> > .js), to understand what's going on.
> >
> > [1]
> >
> https://github.com/apache/calcite/blob/c4042a34ef054b89cec1c47fefcbc8689bad55be/core/src/main/java/org/apache/calcite/plan/visualizer/RuleMatchVisualizer.java#L300C34-L300C34
> > [2]
> >
> https://github.com/apache/calcite/blob/c4042a34ef054b89cec1c47fefcbc8689bad55be/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L722
> >
> > Cordialement / Best Regards,
> > *Dr. Thomas Rebele* | R&D Developer | Germany | *E* *treb...@tibco.com
> > * | *W* *www.cloud.com <http://www.cloud.com/>*
> >
> > *Cloud SG Germany GmbH* |  ℅ Citrix Systems GmbH, Erika-Mann-Straße
> 67-69,
> > 80636 München Deutschland | Registergericht: Amtsgericht München, HRB
> > 123355 | Geschäftsführer: Antonio Gomes, Alexander E. Kolar, Ganesh
> > Vaidyanathan, Brian Lee Shytle
> >
> >
> >
> > On Fri, Jan 12, 2024 at 11:35 PM Tony Fiedler <
> > tony.fied...@mailbox.tu-dresden.de> wrote:
> >
> > > Dear Calcite devs,
> > >
> > > First of all I really appreciate having a mature framework like
> Calcite.
> > > Please continue your great work on this project!
> > >
> > > My use case is feeding Calcite (v1.35.0) with an SQL query and doing
> > > some optimizations by providing metadata and selected planner rules. I
>

Gather real world statistics about row count, CPU, and IO cost

2020-08-07 Thread Thomas Rebele
Hi all,

I'm working on basic query optimization. I once stumbled on the case that
two operators had the same row count but one had a much higher CPU cost.
Unfortunately the default cost model only takes the row count into account
(see [1]). Stamatis had pointed out in another mail that the row count
might be much more important than the other costs [2]. However, if there
are two possible choices with the same row count, we should prefer the one
with the least CPU cost. I'm wondering whether the assumption that a
smaller row count is better in most cases is actually correct. Also, what
is "better" in this context? The query plan with the least execution time?
Maybe there's a plan that is just <10% slower, but consumes much less
CPU/memory/etc.

So I thought about the cost model in general, and how to improve it. I
assume the better the estimated cost corresponds to the real cost, the
better the optimized plans. So the first step would be to collect the real
world statistics and the second step to adapt the cost estimation so that
there's a better correspondence. For the beginning I would just measure how
many rows have been in the result and how much time has passed for each
RelNode during query execution. Is there already a way to do this in
Calcite? Does this make sense at all?

[1]
https://github.com/apache/calcite/blob/52a57078ba081b24b9d086ed363c715485d1a519/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoCost.java#L100
[2]
https://15721.courses.cs.cmu.edu/spring2019/papers/24-costmodels/p204-leis.pdf

Cordialement / Best Regards,
*Thomas Rebele* | R&D Developer | 18 rue du 4 septembre, 75002 Paris, France
| www.tibco.com


Re: Gather real world statistics about row count, CPU, and IO cost

2020-08-10 Thread Thomas Rebele
Thank you for your analysis and the references. I would be more interested
in sub-problem 2.
A kind of tuning tool would be nice. If a project using Calcite  introduces
custom operators, it could help to find a cost model for them.

I would start with a simple case, e.g., deciding whether a particular join
should be translated into a hash join, merge join, or correlate.
With a different cardinality of the inputs, one algorithm or the other
might be better. The real-world statistics might help to decide which one
to choose.

Cordialement / Best Regards,
*Thomas Rebele* | R&D Developer | 18 rue du 4 septembre, 75002 Paris, France
| www.tibco.com


On Fri, Aug 7, 2020 at 11:32 PM Julian Hyde  wrote:

> Also, check out the paper "Learning to Optimize Join Queries With Deep
> Reinforcement Learning" by Krishnan, Yang, Goldberg, Hellerstein,
> Stoica 2019 (which aims to improve the join-order benchmark and uses
> Calcite as one of its test platforms):
> https://arxiv.org/pdf/1808.03196.pdf
>
> On Fri, Aug 7, 2020 at 2:02 PM Julian Hyde  wrote:
> >
> > I consider there to be two fairly independent sub-problems:
> >
> > 1. Improve our cardinality estimates (especially for queries with
> > complex joins and aggregation).
> >
> > 2. Calibrate physical operators so that, given a good estimate of the
> > number of rows they will see, we can come up with a reasonable
> > estimate of the physical cost (e.g. how long the query will take to
> > execute, and how much memory).
> >
> > For 1, the paper you cite, and the join-order benchmark it introduces,
> > is an excellent contribution to the field. It inspired me to do work
> > on profiling [1]. I would encourage you to build on the work I have
> > already done.
> >
> > For 2, I have not done any work personally. An approach would be to
> > give each physical operator (e.g. EnumerableHashJoin) a cost model
> > parameterized by certain constants, and then run experiments to
> > determine the values of those constants empirically. Perhaps we could
> > write a "TuningTool" that generates an "operator constants file", and
> > thereby start to formalize the process.
> >
> > Julian
> >
> > [1]
> https://www.slideshare.net/julianhyde/data-profiling-in-apache-calcite
> >
> > On Fri, Aug 7, 2020 at 6:57 AM Thomas Rebele 
> wrote:
> > >
> > > Hi all,
> > >
> > > I'm working on basic query optimization. I once stumbled on the case
> that
> > > two operators had the same row count but one had a much higher CPU
> cost.
> > > Unfortunately the default cost model only takes the row count into
> account
> > > (see [1]). Stamatis had pointed out in another mail that the row count
> > > might be much more important than the other costs [2]. However, if
> there
> > > are two possible choices with the same row count, we should prefer the
> one
> > > with the least CPU cost. I'm wondering whether the assumption that a
> > > smaller row count is better in most cases is actually correct. Also,
> what
> > > is "better" in this context? The query plan with the least execution
> time?
> > > Maybe there's a plan that is just <10% slower, but consumes much less
> > > CPU/memory/etc.
> > >
> > > So I thought about the cost model in general, and how to improve it. I
> > > assume the better the estimated cost corresponds to the real cost, the
> > > better the optimized plans. So the first step would be to collect the
> real
> > > world statistics and the second step to adapt the cost estimation so
> that
> > > there's a better correspondence. For the beginning I would just
> measure how
> > > many rows have been in the result and how much time has passed for each
> > > RelNode during query execution. Is there already a way to do this in
> > > Calcite? Does this make sense at all?
> > >
> > > [1]
> > >
> https://github.com/apache/calcite/blob/52a57078ba081b24b9d086ed363c715485d1a519/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoCost.java#L100
> > > [2]
> > >
> https://15721.courses.cs.cmu.edu/spring2019/papers/24-costmodels/p204-leis.pdf
> > >
> > > Cordialement / Best Regards,
> > > *Thomas Rebele* | R&D Developer | 18 rue du 4 septembre, 75002 Paris,
> France
> > > | www.tibco.com
>


Re: [ANNOUNCE] Ruben Quesada Lopez joins Calcite PMC

2020-08-13 Thread Thomas Rebele
Congrats, Ruben!

Cordialement / Best Regards,
*Thomas Rebele* | R&D Developer | 18 rue du 4 septembre, 75002 Paris, France
| www.tibco.com


On Thu, Aug 13, 2020 at 12:54 AM Francis Chuang 
wrote:

> Congratulations, Ruben!
>
> Francis
>
> On 13/08/2020 3:43 am, Andrei Sereda wrote:
> > Congratulations, Ruben!
> >
> > On Wed, Aug 12, 2020 at 1:21 PM Rui Wang  wrote:
> >
> >> Congrats, Ruben! Well deserved!
> >>
> >>
> >>
> >> -Rui
> >>
> >> On Wed, Aug 12, 2020 at 9:24 AM Enrico Olivelli 
> >> wrote:
> >>
> >>> Congrats Ruben!
> >>>
> >>> Enrico
> >>>
> >>> Il Mer 12 Ago 2020, 18:05 Michael Mior  ha scritto:
> >>>
> >>>> Congrats Reuben!
> >>>>
> >>>> --
> >>>> Michael Mior
> >>>> mm...@apache.org
> >>>>
> >>>> Le mer. 12 août 2020 à 09:32, Ruben Q L  a écrit :
> >>>>>
> >>>>> Thanks everyone!
> >>>>> It is an honor to join the Calcite PMC.
> >>>>>
> >>>>> Best regards,
> >>>>> Ruben
> >>>>>
> >>>>>
> >>>>> Le mer. 12 août 2020 à 03:03, Forward Xu  a
> >>>> écrit :
> >>>>>
> >>>>>> Congrats, Ruben!
> >>>>>>
> >>>>>>
> >>>>>> Best
> >>>>>>
> >>>>>> Forward
> >>>>>>
> >>>>>> XING JIN  于2020年8月12日周三 上午8:58写道:
> >>>>>>
> >>>>>>> Congrats, Ruben!
> >>>>>>>
> >>>>>>> 953396112 <953396...@qq.com> 于2020年8月12日周三 上午7:47写道:
> >>>>>>>
> >>>>>>>> Congratulations, Ruben!
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> xzh
> >>>>>>>> -- 原始邮件 --
> >>>>>>>> 发件人:
> >>>>>>>>"dev"
> >>>>>>>>
> >>   <
> >>>>>>>> zabe...@gmail.com>;
> >>>>>>>> 发送时间: 2020年8月12日(星期三) 凌晨5:53
> >>>>>>>> 收件人: "dev" >>>>>>>>
> >>>>>>>> 主题: [ANNOUNCE] Ruben Quesada Lopez joins Calcite PMC
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I'm pleased to announce that Ruben has accepted an invitation
> >> to
> >>>>>>>> join the Calcite PMC. Ruben has been a consistent and helpful
> >>>>>>>> figure in the Calcite community for which we are very grateful.
> >>> We
> >>>>>>>> look forward to the continued contributions and support.
> >>>>>>>>
> >>>>>>>> Please join me in congratulating Ruben!
> >>>>>>>>
> >>>>>>>> - Stamatis (on behalf of the Calcite PMC)
> >>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> >
>


Visualization of Graphviz

2021-08-12 Thread Thomas Rebele
Hello,

the dot graphs generated by the planner get quite confusing even for medium
sized plans. I've added line breaks in the labels with a script and
displayed it with the xdot tool, which makes it a bit better, but there is
still a lot of room for improvement. I've looked around for other tools
(Gephi, Cytoscape), but no tool fulfills the requirements:

- algorithm(s) to avoid overlap of nodes
- line breaks in node labels
- provide a way to visualize RelSubset
- allow to move nodes, hide them (this should update the visualization of
the subsets)
- search nodes by substrings of the label
- find and highlight paths between nodes

Closest was Cytoscape, but I couldn't find a way to visualize the subsets.
Do you have any recommendations? One possibility would be to implement it
with d3js.org, which would allow the integration of the tool in the Calcite
repository. Would there be interest in such a tool?

Cordialement / Best Regards,
*Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com


Re: Visualization of Graphviz

2021-08-13 Thread Thomas Rebele
Thank you for pointing out the visualizer in the Tempura branch. It looks
quite promising.

Cordialement / Best Regards,
*Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com


On Thu, Aug 12, 2021 at 7:19 PM Julian Hyde  wrote:

> I don’t have a strong opinion on this, but I want to point out that there
> are some GraphViz improvements as part of Tempura (the case is
> https://issues.apache.org/jira/browse/CALCITE-4568 <
> https://issues.apache.org/jira/browse/CALCITE-4568>; the specific commit
> is
> https://github.com/hbtoo/calcite/commit/c1240ca7bd054830ebb3107c7509e09a998d4b55
> <
> https://github.com/hbtoo/calcite/commit/c1240ca7bd054830ebb3107c7509e09a998d4b55
> >).
>
>
>
> > On Aug 12, 2021, at 10:09 AM, Thomas Rebele 
> wrote:
> >
> > Hello,
> >
> > the dot graphs generated by the planner get quite confusing even for
> medium
> > sized plans. I've added line breaks in the labels with a script and
> > displayed it with the xdot tool, which makes it a bit better, but there
> is
> > still a lot of room for improvement. I've looked around for other tools
> > (Gephi, Cytoscape), but no tool fulfills the requirements:
> >
> > - algorithm(s) to avoid overlap of nodes
> > - line breaks in node labels
> > - provide a way to visualize RelSubset
> > - allow to move nodes, hide them (this should update the visualization of
> > the subsets)
> > - search nodes by substrings of the label
> > - find and highlight paths between nodes
> >
> > Closest was Cytoscape, but I couldn't find a way to visualize the
> subsets.
> > Do you have any recommendations? One possibility would be to implement it
> > with d3js.org, which would allow the integration of the tool in the
> Calcite
> > repository. Would there be interest in such a tool?
> >
> > Cordialement / Best Regards,
> > *Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com
>
>


Re: Runtime Exception while using SqlParse.

2021-08-17 Thread Thomas Rebele
Hi Jariv,

are you sure you want to join two columns of Purchase: *Purchase.Product_ID
= Purchase.ID*?
If that's the cause of the exception, the error message could be improved.

Cordialement / Best Regards,
*Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com


On Mon, Aug 16, 2021 at 10:58 PM Jariv Narup 
wrote:

> Hi Team,
>I am getting the following exception while trying to parse a SQL
> statement into its tokens. Is this expected behavior? If not, what would
> you suggest to work around this problem. The particulars are given below:
>
> *Issue:*
> *Method threw 'java.lang.RuntimeException' exception. Cannot evaluate
> org.apache.calcite.sql.SqlJoin.toString().*
>
> *Where:*
> This happens when I use the following API (it is visible on the "from"
> value):
>
> SqlParser.Config parserConfig = SqlParser.config();
> parserConfig
> .withCaseSensitive(false)
> .withLex(Lex.ORACLE);
> SqlParser parser = SqlParser.create(sqlString, parserConfig);
> >>> [Line with the issue]*SqlNode sqlNode = parser.parseStmt();*
>
>
> *Debugger Image:*
> [image: image.png]
> *Version In Use:*
> I am using the following maven GAVs
> -org.apache.calcite:calcite-babel:1.27.0, so my assumption is that I am at
> the latest.
>
> *SQL Statement with the issue:*
>
> *SELECT Purchase.Purchase_Date, Products.Product_Name FROM Purchase JOIN
> Products ON Purchase.Product_ID = Purchase.ID WHERE Purchase.Quantity > 10*
>
> I would appreciate any help on this.
>
> Thanks - Jariv
>


Re: Lots of warnings with latest gradle?

2021-10-13 Thread Thomas Rebele
Hello,

I have the same warnings when executing ./gradlew build.

$ ./gradlew --version


Gradle 7.2


Build time:   2021-08-17 09:59:03 UTC
Revision: a773786b58bb28710e3dc96c4d1a7063628952ad

Kotlin:   1.5.21
Groovy:   3.0.8
Ant:  Apache Ant(TM) version 1.10.9 compiled on September 27 2020
JVM:  15 (Oracle Corporation 15+36-1562)
OS:   Linux 5.10.0-9-amd64 amd64

$ java --version
openjdk 15 2020-09-15
OpenJDK Runtime Environment (build 15+36-1562)
OpenJDK 64-Bit Server VM (build 15+36-1562, mixed mode, sharing)

Cordialement / Best Regards,
*Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com


On Wed, Oct 6, 2021 at 4:23 AM Jacques Nadeau  wrote:

> Hey all, Calcite master has the latest gradle in it now. It seems like I'm
> seeing a lot of warnings like:
>
> https://gist.github.com/jacques-n/910b917f52cbf8dda22b2749ef0ea6b6
>
> Are others seeing that as well? Do we have a ticket to address (quick
> search didn't find one)? It makes it much harder to see relevant outputs.
>
> thanks,
> Jacques
>


Re: UNNEST in relational algebra

2021-11-16 Thread Thomas Rebele
Hello,

I don't understand what you want to do here. There's an UNNEST operator in
Calcite, see the comment at the bottom of this section:
https://calcite.apache.org/docs/reference.html#collection-functions.

Cordialement / Best Regards,
*Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com


On Tue, Nov 16, 2021 at 2:30 PM Daniel (Spotify) Ståhl
 wrote:

> Hi
> We are trying to generate relational algebra for SQL that uses UNNEST of
> arrays in BigQuery. Here are three examples of SQL (in Big Query standard
> query syntax) that we want to be able to handle and generate from
> relational algebra. Any tips on how you would do that in Calcite?
>
> SELECT id,
>   (SELECT h.is_active FROM UNNEST(history.all_of_history) h
> WHERE start_date <= "2021-06-01" AND (end_date >= "2021-06-01" OR
> end_date IS NULL))
> FROM `table`;
>
> SELECT id, h.is_active
> FROM `table`
> CROSS JOIN UNNEST(history.all_of_history) h;
>
> SELECT id, count(h) as h
> from `table`
> CROSS JOIN UNNEST(history.all_of_history) h
> GROUP BY id;
>
> The table "table" has two columns: "id" and "history". history is a record
> and "history.all_of_history"
> is a repeated record with three fields ("is_active", "start_date" and
> "end_date").
>
> Thanks,
> Daniel Ståhl
>


Failing task :linq4j:jandexMain when executing CheckerFramework

2021-12-16 Thread Thomas Rebele
Hello,

building Calcite has been a somewhat annoying experience. First some
annoying :core:checkstyleTest that interferes with files generated by
Intellij IDEA, with hundreds of error messages similar to the following:

[ant:checkstyle] [ERROR]
~/calcite-intellij/core/build/generated/sources/version/generated/org/apache/calcite/rel/rules/ImmutableFilterDateRangeRuleConfig.java:26:
Line is longer than 100 characters (found 107). [LineLength]
[ant:checkstyle] [ERROR]
~/calcite-intellij/core/build/javacc/javaCCTest/generated_tests/org/apache/calcite/tools/ImmutableMyProjectFilterRuleConfig.java:341:12:
Redundant 'final' modifier. [RedundantModifier]

These are easy to deal with temporarily by deleting the generated folders.
Any hint for a permanent fix is appreciated. Then I try to reproduce the *CI
/ CheckerFramework (JDK 11) (pull_request)* task locally:

$ JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64 ./gradlew clean
...
BUILD SUCCESSFUL in 5s
$ JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64 ./gradlew build -x test
-PenableCheckerframework
...
> Task :linq4j:jandexMain FAILED
...
Suppressed: com.github.vlsi.jandex.JandexFileParseException:
~/calcite-intellij/linq4j/build/classes/java/main/org/apache/calcite/linq4j/QueryableDefaults$ReplayableQueryable.class
...
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 0
out of bounds for length 0
at
org.jboss.jandex.Indexer.updateTypeTarget(Indexer.java:898)
at
org.jboss.jandex.Indexer.updateTypeTargets(Indexer.java:630)
at org.jboss.jandex.Indexer.index(Indexer.java:1698)
at
com.github.vlsi.jandex.JandexWork.execute(JandexWork.kt:58)
... 33 more

FAILURE: Build failed with an exception.
...

Is there an easy way to fix the problem? The fallback solution is to rely
on the Github CI, which has passed for the same commit.

Cordialement / Best Regards,
*Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com


Re: Failing task :linq4j:jandexMain when executing CheckerFramework

2021-12-17 Thread Thomas Rebele
Hello,

thank you, Vladimir Sitnikov, for the hint with the javaccGeneratedPatterns
in build.gradle.kts. I discovered the cause for the generated files. I
forgot that I had activated the option "build and run [tests] using
Intellij IDEA". Exclusion rules to be added by CALCITE-4947
<https://issues.apache.org/jira/browse/CALCITE-4947>.

About the failing :linq4j:jandexMain, I've additionally extended the PATH
env variable, still the same error. Here my java version:

$ PATH="/usr/lib/jvm/java-11-openjdk-amd64/bin/:$PATH"
JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64 java -version
openjdk version "11.0.13" 2021-10-19
OpenJDK Runtime Environment (build 11.0.13+8-post-Debian-1deb11u1)
OpenJDK 64-Bit Server VM (build 11.0.13+8-post-Debian-1deb11u1, mixed mode,
sharing)

Interestingly, there's no problem when executing the failing task on its
own:

$ PATH="/usr/lib/jvm/java-11-openjdk-amd64/bin/:$PATH"
JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64 ./gradlew
:linq4j:jandexMain
> Configure project :
Building Apache Calcite 1.29.0-SNAPSHOT

> Task :linq4j:compileJava
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

BUILD SUCCESSFUL in 13s
32 actionable tasks: 1 executed, 1 from cache, 30 up-to-date
S3 cache 762ms wasted on misses, reads: 1, elapsed: 762ms

But it fails when executed with other tasks:

$ PATH="/usr/lib/jvm/java-11-openjdk-amd64/bin/:$PATH"
JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64 ./gradlew build -x test
-PenableCheckerframework
> Configure project :
Building Apache Calcite 1.29.0-SNAPSHOT

> Task :linq4j:jandexMain FAILED
...

Cordialement / Best Regards,
*Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com


On Thu, Dec 16, 2021 at 5:48 PM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> >First some annoying :core:checkstyleTest that interferes with files
> generated by Intellij IDEA, with hundreds of error messages similar to the
> following:
>
> We exclude generated files from the style checks via the following
> patterns:
>
> https://github.com/apache/calcite/blob/a6293a37dd4166e6fe69ae0c5ef7a89e6677b6ab/build.gradle.kts#L259-L276
>
> Please add the relevant pattern and file the PR.
> However, it is really strange that you get the generated files into
> generated/sources/version/generated and javacc/javaCCTest/generated_tests
> folders.
> I don't understand why IDEA would generate files into those folders.
> It does not sound like your ImmutableMyProjectFilterRuleConfig is connected
> with javaCCTest.
> I would try to understand why do you get files into /javaCCTest/ and
> /version/ folders.
>
> A side note: Kotlin does have the official coding style, so there's a
> formatter that yields nice results and that is consistent with IDEs.
> So moving to Kotlin makes many Checkstyle rules obsolete. No joking.
>
> >Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds
> for length 0
>
> It is very likely you are using an outdated version of javac which produced
> invalid bytecode.
> Please upgrade javac (e.g. to the recent patchset).
> Note: GitHub CI uses recent patchet updates, so you might fail to reproduce
> the issue.
>
> We do know javac versions 1.8 before u202 are extremely buggy, so we fail
> the build for those versions.
> It might be time to add the same verification for 11, and require 11.0.5 or
> later (in case of 11).
>
> >Is there an easy way to fix the problem?
>
> The easy fix is to bump the javac version.
> The funny coincidence is that Kotlin solves this issue as well: null safety
> is integrated into the regular compilation,
> so there's no need for a separate (and time-consuming!) checker framework
> compilation.
>
> Vladimir
>


Re: Failed to import gradle

2022-01-27 Thread Thomas Rebele
Hello,

the Calcite mailing list removes pictures. Could you please post the error
message in text form?

Cordialement / Best Regards,
*Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com


On Thu, Jan 27, 2022 at 7:38 AM Aitozi  wrote:

> Hi community:
> I'm new to calcite community and gradle. When I followed the Howto.md
> to import the calcite project, It always failed with the following error.
> After some search in Google and calcite dev mail list, I stiil can't not
> figure it out. Is there anyone can help me out? thanks in advance
>
> [image: image.png]
> The version of the intellij IDEA is 2021.1.3.
>
> Atiozi.
>


Re: ARRAY_CONCAT does not work

2022-02-04 Thread Thomas Rebele
Hello,

SqlParserTest might help you. It has some checks related to BIG_QUERY.
Maybe a
.parserConfig(SqlParser.Config.DEFAULT.withConformance(SqlConformanceEnum.BIG_QUERY))
could fix the problem.

Cordialement / Best Regards,
*Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com


On Fri, Feb 4, 2022 at 12:57 PM Dmitry Sysolyatin 
wrote:

> Hi!
> I have a problem with ARRAY_CONCAT operator.
>
> The following code failed with the exception ''No match found for function
> signature array_concat(, )".
> ```
> val config = Frameworks.newConfigBuilder()
>  .parserConfig(parserConfig)
>  .defaultSchema(CalciteSchema.createRootSchema(false, false).plus())
>   .operatorTable(SqlOperatorTables.chain(
>   SqlStdOperatorTable.instance(),
>   SqlLibraryOperatorTableFactory.INSTANCE.getOperatorTable(
> SqlLibrary.BIG_QUERY
>)))
> .programs(Programs.standard())
> .build()
> val planner = Frameworks.getPlanner(config)
>
> val query = "SELECT ARRAY_CONCAT(ARRAY[1, 2], ARRAY[2, 3])"
> val parsedSqlNode = planner.parse(query)
> planner.validate(parsedSqlNode)
> ```
>
> Am I doing something wrong or it is a bug?
>


Re: [DISCUSS] Github PR link to JIRA issue

2022-03-30 Thread Thomas Rebele
When I visit https://issues.apache.org/jira/browse/CALCITE-6789, I get an
error message:

You can't view this issue
It may have been deleted or you don't have permission to view it.

I can view other tickets, though (e.g.,
https://issues.apache.org/jira/browse/CALCITE-5008).
Maybe the link has been removed because the ticket is not available?

Cordialement / Best Regards,
*Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com


On Wed, Mar 30, 2022 at 7:57 PM Julian Hyde  wrote:

> It seems to happen sporadically. Maybe the commit message of the PR needs
> to start with exactly “[CALCITE-]”.
>
> > On Mar 30, 2022, at 10:45 AM, Haisheng Yuan  wrote:
> >
> > Hi all,
> >
> > Previously, the JIRA issue e.g. [CALCITE-6789] in Calcite github PR and
> > commit message will be automatically linked to the JIRA site. Now there
> is
> > no link anymore.
> >
> > Does anyone know what happened? What can we do to add the link back?
> >
> > Thanks,
> > Haisheng Yuan
>
>


Re: Changes to the rule pattern interface

2022-04-12 Thread Thomas Rebele
Hello,

The reasons for the planner rules configuration can be found here:
CALCITE-3923 <https://issues.apache.org/jira/browse/CALCITE-3923>. See also
the email thread [DISCUSS] Refactor how planner rules are parameterized
<https://lists.apache.org/thread.html/rfdf6f9b7821988bdd92b0377e3d293443a6376f4773c4c658c891cf9%40%3Cdev.calcite.apache.org%3E>
.

Cordialement / Best Regards,
*Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com


On Tue, Apr 12, 2022 at 6:10 PM Gavin Ray  wrote:

> I don't have any weight behind my opinion or experience,
> but anything that lowers the barrier to entry to Calcite for newcomers is a
> huge win in my mind.
>
> I assume the reason for the changes was because codegen improved
> performance?
>
> Could it make sense to allow both options, the easy/less-performant way for
> people who want to experiment and learn the ropes,
> and the codegen path for productionizing the final rules you come up with?
>
> Or does this make matters worse, trying to support two API's
>
> On Tue, Apr 12, 2022 at 6:25 AM Vladimir Ozerov 
> wrote:
>
> > Hi folks,
> >
> > Rules are an essential part of the Calcite-based query optimizers. A
> > typical optimizer may require dozens of custom rules that are created by
> > extending some Apache Calcite interfaces.
> >
> > During the last two years, there were two major revisions of how rules
> are
> > created:
> >
> >1. In early 1.2x versions, the typical approach was to use
> >RelOptRuleOperand with a set of helper methods in a builder-like
> >pattern.
> >2. Then, we switched to the runtime code generation.
> >3. Finally, we switched to the compile-time code generation with the
> >Immutables framework.
> >
> > Every such change requires the downstream projects to rewrite all their
> > rules. Not only does this require time to understand the new approach,
> but
> > it may also compromise the correctness of the downstream optimizer
> because
> > the regression tracking in query optimizers is not trivial.
> >
> > I had the privilege to try all three approaches, and I cannot get rid of
> > the feeling that every new approach is more complicated than the previous
> > one. I understand that this is a highly subjective statement, but when I
> > just started using Apache Calcite and knew very little about it, I was
> able
> > to write rule patterns by simply looking at the IDE JavaDoc pop-ups and
> > code completion. When the RuleConfig was introduced, every new rule
> always
> > required me to look at some other rule as an example, yet it was doable.
> > Now we also need to configure the project build system to write a single
> > custom rule.
> >
> > At the same time, a significant fraction of the rules are pretty simple.
> > E.g., "operator A on top of operator B". If some additional configuration
> > is required, it could be added via plain rules fields, because at the end
> > of the day the rule instance is not more than a plain Java object.
> >
> > A good example is the FilterProjectTransposeRule. What now takes tens of
> > lines of code in the Config subclass [1] (that you hardly could write
> > without a reference example), and ~500 LOC in the generated code that you
> > get through additional plugin configuration [2] in your build system,
> could
> > have been expressed in a dozen lines of code [3] in Apache Calcite
> 1.22.0.
> >
> > My question is - are we sure we are going in the right direction in terms
> > of complexity and the entry bar for the newcomers? Wouldn't it be better
> to
> > follow the 80/20 rule, when simple rules could be easily created
> > programmatically with no external dependencies, while more advanced
> > facilities like Immutables are used only for the complex rules?
> >
> > Regards,
> > Vladimir.
> >
> > [1]
> >
> >
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L208-L260
> > [2]
> >
> >
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/build.gradle.kts#L215-L224
> > [3]
> >
> >
> https://github.com/apache/calcite/blob/calcite-1.22.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L99-L110
> >
>


[jira] [Created] (CALCITE-3977) RelDecorrelator produces incorrect plans if the input plan contains field accesses

2020-05-07 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-3977:
--

 Summary: RelDecorrelator produces incorrect plans if the input 
plan contains field accesses
 Key: CALCITE-3977
 URL: https://issues.apache.org/jira/browse/CALCITE-3977
 Project: Calcite
  Issue Type: Bug
Reporter: Thomas Rebele


The RelCorrelator seems to have problems with some plans that contain a field 
access (probably a RexFieldAccess, but I haven't looked further into it). In 
this ticket there's a filter on *$cor0.birthPlace.city*.

Here the complete plan:
{code:java}
before decorrelate
LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{}])
  LogicalTableScan(table=[[bookstore, authors]])
  LogicalFilter(condition=[=('Munich', $cor0.birthPlace.city)])
LogicalTableScan(table=[[bookstore, authors]])

after decorrelate
LogicalJoin(condition=[=($2, $8)], joinType=[left])
  LogicalTableScan(table=[[bookstore, authors]])
  LogicalJoin(condition=[true], joinType=[inner])
LogicalFilter(condition=[=('Munich', $cor0.birthPlace.city)])
  LogicalTableScan(table=[[bookstore, authors]])
LogicalAggregate(group=[{0}])
  LogicalProject(birthPlace=[$2])
LogicalTableScan(table=[[bookstore, authors]])
{code}
There seem to be two problems:
 * The LogicalCorrelate has been removed, but the $cor0.birthPlace.city is 
still in the filter condition.
 * The inner join does not seem to be necessary. If it is, could somebody 
explain, why?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-3978) RelDecorrelator on a plan with field accesses does not update field indices when adding a projection

2020-05-07 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-3978:
--

 Summary: RelDecorrelator on a plan with field accesses does not 
update field indices when adding a projection
 Key: CALCITE-3978
 URL: https://issues.apache.org/jira/browse/CALCITE-3978
 Project: Calcite
  Issue Type: Bug
Reporter: Thomas Rebele


RelDecorrelate introduces a projection (or a new field on an existing 
projection) within the first input of a join node, without shifting the field 
indices accessing the second input:

Schema:
 * table1
 ** F_1 _(varchar)_
 * table2
 ** F_2
 *** F_2_SUB _(varchar)_
 * table3
 ** F_3
 *** F_3_SUB _(varchar)_

{code:java}
before decorrelate
LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{}])
  LogicalTableScan(table=[[TABLE1]])
  LogicalJoin(condition=[=($0.F_2_SUB, $1.F_3_SUB)], joinType=[inner])  <--- (1)
LogicalFilter(condition=[=($cor0.F_1, $0.F_2_SUB)])
  LogicalTableScan(table=[[TABLE2]])
LogicalTableScan(table=[[TABLE3]])

after decorrelate
LogicalJoin(condition=[=($0, $2)], joinType=[inner])
  LogicalTableScan(table=[[TABLE1]])
  LogicalJoin(condition=[true], joinType=[inner])  <--+--- (2)
LogicalFilter(condition=[=($0.F_2_SUB, $1.F_3_SUB)])  <---/
  LogicalFilter(condition=[=($1, $0.F_2_SUB)])
LogicalProject(F_2=[$0], $f1=[$0.F_2_SUB])  <--- (3)
  LogicalTableScan(table=[[TABLE2]])
LogicalTableScan(table=[[TABLE3]])
{code}

A project has been introduced at (3), which adds a field to the first input of 
the join (1)/(2). However, the join condition needs to be fixed. The $1.F_3_SUB 
at (1) should be $2.F_3_SUB at (2). That's probably the reason why the 
condition has been moved to a filter on the first input, as it targets 
(wrongly) only the first input of the join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-4157) Use a different sort algorithm for EnumerableDefaults.orderBy(...)

2020-08-05 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-4157:
--

 Summary: Use a different sort algorithm for 
EnumerableDefaults.orderBy(...)
 Key: CALCITE-4157
 URL: https://issues.apache.org/jira/browse/CALCITE-4157
 Project: Calcite
  Issue Type: Improvement
Reporter: Thomas Rebele


As shown by [the 
benchmarks|https://github.com/thomasrebele/jmh-micro-benchmarks/blob/98abccad8801532b78a0778cd7be7bd751f90da6/core-java/doc/jmh_partial_sort_jdk1.8.0_241.txt#L6793]
 for CALCITE-3920, the sort with a TreeMap is slower than sorting with 
Arrays.sort(Object[]). The latter takes about 35% less time than sorting with 
TreeMap over a randomized input. While the implementation is not exactly the 
same, it should be close enough to be able to say something about the 
performance of EnumerableDefaults.orderBy(...). The relevant results for this 
issue are the benchmarks with limit=-1 and algorithms treeMap and 
collectionSort.

The speedup might be even better if the input is already sorted, as modern VMs 
use TimSort, which checks if the input is already sorted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-5470) VolcanoPlanner removes a necessary sort

2023-01-12 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-5470:
--

 Summary: VolcanoPlanner removes a necessary sort
 Key: CALCITE-5470
 URL: https://issues.apache.org/jira/browse/CALCITE-5470
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.32.0
Reporter: Thomas Rebele


A logical plan that roughly corresponds to SELECT a*a FROM 
(VALUES(-10),(2),(3)) T(a) ORDER BY a*a produces a physical plan that is not 
sorted:
{noformat}
logical plan:
LogicalSort(sort0=[$0], dir0=[ASC])
  LogicalProject($f0=[*($0, $0)])
    LogicalValues(tuples=[[{ -10 }, { 2 }, { 3 }]])
{noformat}
The result should be 4, 9, 100.
{noformat}
physical plan:
EnumerableProject($f0=[*($0, $0)])
  EnumerableValues(tuples=[[{ -10 }, { 2 }, { 3 }]]){noformat}
If I understand the physical plan correctly, its result would be 100, 4, 9.

Using sqlline gives the correct result (see attached log), so the problem could 
be in the test itself.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5471) RelSupplier.SqlRelSupplier#apply should use .project(), not .rel

2023-01-12 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-5471:
--

 Summary: RelSupplier.SqlRelSupplier#apply should use .project(), 
not .rel
 Key: CALCITE-5471
 URL: https://issues.apache.org/jira/browse/CALCITE-5471
 Project: Calcite
  Issue Type: Task
Reporter: Thomas Rebele


RelSupplier.SqlRelSupplier#apply should use .project(), not .rel, otherwise the 
result has columns that are not part of the SELECT clause.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5472) RexCallBinding#getOperandMonotonicity should return CONSTANT for RexLiterals

2023-01-12 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-5472:
--

 Summary: RexCallBinding#getOperandMonotonicity should return 
CONSTANT for RexLiterals
 Key: CALCITE-5472
 URL: https://issues.apache.org/jira/browse/CALCITE-5472
 Project: Calcite
  Issue Type: Task
Reporter: Thomas Rebele






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5473) Keep more collations in RelMdCollation#project for monotonic RexCalls

2023-01-12 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-5473:
--

 Summary: Keep more collations in RelMdCollation#project for 
monotonic RexCalls
 Key: CALCITE-5473
 URL: https://issues.apache.org/jira/browse/CALCITE-5473
 Project: Calcite
  Issue Type: Improvement
Reporter: Thomas Rebele


Some RexCalls do not affect the collation, i.e., $0+10 has the same collation 
as $0 (if there are now integer overflows). This is already implemented in 
{{{}SqlMonotonicBinaryOperator#getMonotonicity{}}}.

If we have an input with collation [0,1], and a {{{}Calc(a=$0+10, b=$1){}}}, 
then the output collation would be [0, 1] as well. This is the case for 
STRICTLY_INCREASING or STRICTLY_DECREASING calls (though the direction of the 
field collation might need to be adapted).

However, if the Calc was {{{}Calc(a=FLOOR($0), b=$1){}}}, then the output 
collation would be just [0], as the sortedness of b is not guaranteed. This 
happens for example with rows (0.2, 50) and (0.3, 20) and (0.4, 30), in that 
order. So for INCREASING, DECREASING, or CONSTANT the collation would be 
shortened.

The right place to implement this would probably be 
{{{}RelMdCollation#project{}}}. The {{fieldCollationsForRexCalls}} loop would 
need to stay, because some calls may introduce a collation, regardless of the 
input. E.g., {{SELECT CURRENT_DATE FROM some_random_table}} is always sorted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5686) Generalize return types of the linq4j expression shuttle

2023-05-04 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-5686:
--

 Summary: Generalize return types of the linq4j expression shuttle
 Key: CALCITE-5686
 URL: https://issues.apache.org/jira/browse/CALCITE-5686
 Project: Calcite
  Issue Type: Improvement
Reporter: Thomas Rebele
Assignee: Thomas Rebele
 Fix For: 1.35.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5965) Avoid unnecessary String concatenations in the RexFieldAccess constructor to improve the performance

2023-08-29 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-5965:
--

 Summary: Avoid unnecessary String concatenations in the 
RexFieldAccess constructor to improve the performance
 Key: CALCITE-5965
 URL: https://issues.apache.org/jira/browse/CALCITE-5965
 Project: Calcite
  Issue Type: Task
Reporter: Thomas Rebele
Assignee: Thomas Rebele
 Fix For: 1.36.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-4189) Simplify 'X OR IS NOT TRUE X' to 'TRUE'

2020-08-21 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-4189:
--

 Summary: Simplify 'X OR IS NOT TRUE X' to 'TRUE'
 Key: CALCITE-4189
 URL: https://issues.apache.org/jira/browse/CALCITE-4189
 Project: Calcite
  Issue Type: Improvement
        Reporter: Thomas Rebele


Statements of the form 'P OR IS NOT TRUE P' to 'TRUE' should always be true 
(please correct me if I am wrong):
{code:java}
> select x, X='A' as EQ, 
X='A' IS NOT TRUE as EQ_NOT_TRUE,
(X='A') OR ((X='A') IS NOT TRUE) as EQ_OR_EQ_NOT_TRUE
from (values 'A', 'B', null) as t(x);
+---+---+-+---+
| X |  EQ   | EQ_NOT_TRUE | EQ_OR_EQ_NOT_TRUE |
+---+---+-+---+
| A | true  | false   | true  |
| B | false | true| true  |
|   |   | true| true  |
+---+---+-+---+
{code}
Here a test case for the expected behavior:
{code:java}
@Test void testSimplifyPOrPNotTrue() {
  checkSimplify(
  and(
  vBool(),
  or(
  eq(vInt(), literal(1)),
  isNotTrue(eq(vInt(), literal(1)))
  )),
  "?0.bool0");
}{code}

There are some other, similar expressions, such as 'P IS NOT FALSE OR NOT P', 
which can be reduced to true. Maybe there's a way to handle all of them?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-4256) RexSimplify should not simplify P AND P to P, if it contains a call to RAND or RAND_INTEGER

2020-09-14 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-4256:
--

 Summary: RexSimplify should not simplify P AND P to P, if it 
contains a call to RAND or RAND_INTEGER
 Key: CALCITE-4256
 URL: https://issues.apache.org/jira/browse/CALCITE-4256
 Project: Calcite
  Issue Type: Bug
Reporter: Thomas Rebele


Example: RAND_INTEGER() = 1 AND RAND_INTEGER() = 1 is false with a higher 
probability than RAND_INTEGER() = 1

Here a test case for RexProgramTest:
{code}
@Test void testSimplifyRandomAnd() {
checkSimplifyUnchanged(
and(
eq(rexBuilder.makeCall(SqlStdOperatorTable.RAND_INTEGER), 
literal(1)),
eq(rexBuilder.makeCall(SqlStdOperatorTable.RAND_INTEGER), 
literal(1))
));
  }
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-4264) The query planner should take CPU cost into account

2020-09-18 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-4264:
--

 Summary: The query planner should take CPU cost into account
 Key: CALCITE-4264
 URL: https://issues.apache.org/jira/browse/CALCITE-4264
 Project: Calcite
  Issue Type: Improvement
Reporter: Thomas Rebele


Calcite only takes the row count into account when optimizing the queries. See 
[the relevant lines in 
VolcanoCost|https://github.com/apache/calcite/blob/52a57078ba081b24b9d086ed363c715485d1a519/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoCost.java#L98-L116].
 However, two plans might have the same row count, but differ greatly in CPU 
cost. This happens for example when the limit sort rule 
([CALCITE-3920|https://issues.apache.org/jira/browse/CALCITE-3920]) is 
activated. The row cost is the same, the EnumerableLimitSort only sorts the 
input partially, so has a lower CPU cost.

Low impact proposal: Compare first the row cost, and only if the row cost is 
equal, compare by CPU cost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-4404) Missing LogicalCorrelate when using FILTER_SUB_QUERY_TO_CORRELATE for nested subqueries

2020-11-17 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-4404:
--

 Summary: Missing LogicalCorrelate when using 
FILTER_SUB_QUERY_TO_CORRELATE for nested subqueries
 Key: CALCITE-4404
 URL: https://issues.apache.org/jira/browse/CALCITE-4404
 Project: Calcite
  Issue Type: Bug
Reporter: Thomas Rebele


The Programs#subQuery seem to not support nested subqueries.

When applying the rule to the following plan
{code}
LogicalFilter(condition=[EXISTS({
LogicalJoin(condition=[=($0, $5)], joinType=[left])
  LogicalFilter(condition=[=($cor0.empid, $0)])
LogicalTableScan(table=[[hr, emps]])
  LogicalFilter(condition=[EXISTS({
LogicalFilter(condition=[=($cor1.empid, $0)])
  LogicalTableScan(table=[[hr, emps]])
})])
LogicalTableScan(table=[[hr, emps]])
})])
  LogicalTableScan(table=[[hr, emps]])
{code}

it gets transformed into

{code}
LogicalProject(empid=[$0], deptno=[$1], name=[$2], salary=[$3], commission=[$4])
  LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[hr, emps]])
LogicalAggregate(group=[{0}])
  LogicalProject(i=[true])
LogicalJoin(condition=[=($0, $5)], joinType=[left])
  LogicalFilter(condition=[=($cor0.empid, $0)])
LogicalTableScan(table=[[hr, emps]])
  LogicalProject(empid=[$0], deptno=[$1], name=[$2], salary=[$3], 
commission=[$4])
LogicalCorrelate(correlation=[$cor1], joinType=[inner], 
requiredColumns=[{0}])
  LogicalTableScan(table=[[hr, emps]])
  LogicalAggregate(group=[{0}])
LogicalProject(i=[true])
  LogicalFilter(condition=[=($cor1.empid, $0)])
LogicalTableScan(table=[[hr, emps]])
{code}

The $cor0 has no associated LogicalCorrelate. My guess is that the first 
LogicalJoin should probably be a LogicalCorrelate.

Please note that this is a minimal example. The plan has been obtained by 
simplifying a larger plan and _has no meaning by itself._



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-4530) EnumerableWindow.copy(...) should adapt the rowType

2021-03-10 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-4530:
--

 Summary: EnumerableWindow.copy(...) should adapt the rowType
 Key: CALCITE-4530
 URL: https://issues.apache.org/jira/browse/CALCITE-4530
 Project: Calcite
  Issue Type: Bug
Affects Versions: 1.26.0
Reporter: Thomas Rebele


The method EnumerableWindow.copy(RelTraitSet, List) copies the row 
type. However, if some input fields have changed their type, then the row type 
of the new EnumerableWindow instance will be wrong.

The following snippet provides a workaround:
{code:java}
private RelNode fixEnumerableWindowType(EnumerableWindow win, List 
newInputs)
{
RelDataType winType = win.getRowType();
RelDataType newType = newInputs.get(0).getRowType();

RelDataTypeFactory.Builder typeBuilder = new 
RelDataTypeFactory.Builder(win.getCluster().getTypeFactory());
typeBuilder.addAll(newType.getFieldList());
for (int i = newType.getFieldCount(); i < 
win.getRowType().getFieldCount(); i++)
{
typeBuilder.add(winType.getFieldList().get(i));
}

return RelBridge.createWindow(win.getCluster(), win.getTraitSet(), 
newInputs.get(0), win.constants,
typeBuilder.build(), win.groups);
}
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-4540) The constructor of EnumerableWindow should be public

2021-03-17 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-4540:
--

 Summary: The constructor of EnumerableWindow should be public
 Key: CALCITE-4540
 URL: https://issues.apache.org/jira/browse/CALCITE-4540
 Project: Calcite
  Issue Type: Task
Reporter: Thomas Rebele


For some use cases it is necessary to change the row type of the window (see 
CALCITE-4530).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-4680) AssertionError: contains $cor1, while converting SQL query with UNNEST to logical plan

2021-07-06 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-4680:
--

 Summary: AssertionError: contains $cor1, while converting SQL 
query with UNNEST to logical plan
 Key: CALCITE-4680
 URL: https://issues.apache.org/jira/browse/CALCITE-4680
 Project: Calcite
  Issue Type: Bug
Affects Versions: 1.27.0
Reporter: Thomas Rebele


The exception happens with the following SQL query
{code:java}
SELECT field1 FROM tab t1
WHERE t1.field1 NOT IN (
SELECT l.subfield
FROM tab t2, UNNEST(t2.field2) AS l
WHERE t1.field1 = t2.field1)
{code}
and the schema
{code:java}
RecordType(VARCHAR NOT NULL field1, RecordType(VARCHAR subfield) ARRAY field2)
{code}
(where field2.subfield is nullable). Previous versions of Calcite threw a 
ClassCastException: LogicalCorrelate cannot be cast to Join.

However, it passes after removing the WHERE clause of the subquery.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-4689) EnumerableUncollectRule causes an IllegalArgumentException: Type mismatch: the field sizes are not equal

2021-07-09 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-4689:
--

 Summary: EnumerableUncollectRule causes an 
IllegalArgumentException: Type mismatch: the field sizes are not equal
 Key: CALCITE-4689
 URL: https://issues.apache.org/jira/browse/CALCITE-4689
 Project: Calcite
  Issue Type: Bug
Affects Versions: 1.27.0
Reporter: Thomas Rebele
 Attachments: Calcite4689.java

The EnumerableUncollectRule converts an Uncollect to an EnumerableUncollect. It 
does not set the itemAliases. This field is used to derive the row type in 
Uncollect#deriveUncollectRowType. Therefore the EnumerableUncollect has a 
different row type than the Uncollect.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-4691) Incorrect handling of NOT IN and NULL values

2021-07-12 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-4691:
--

 Summary: Incorrect handling of NOT IN and NULL values
 Key: CALCITE-4691
 URL: https://issues.apache.org/jira/browse/CALCITE-4691
 Project: Calcite
  Issue Type: Bug
Reporter: Thomas Rebele


It seems that Calcite does not handle WHERE foo NOT IN (values) correctly, if 
values contains NULLS. All DBMS on [SQL 
Fiddle|http://sqlfiddle.com/#!9/5e4c4d/1] produce an empty result in that case.
{code:java}
-- schema
create table tab(v varchar(20));
insert into tab(v) values('R');
-- queries
select 'R' from tab where 'R' in ('S'); -- empty
select 'R' from tab where 'R' in ('S', NULL); -- empty
select 'R' from tab where 'R' not in ('S'); -- 'R'
select 'R' from tab where 'R' not in ('S', NULL); -- empty{code}
However, a similar query with Calcite sqlline produces a different output:
{code:java}
select 'R' from (values (1)) AS tab(v) where 'R' in ('S'); -- empty
select 'R' from (values (1)) AS tab(v) where 'R' in ('S', NULL); -- empty
select 'R' from (values (1)) AS tab(v) where 'R' not in ('S'); -- 'R'
select 'R' from (values (1)) AS tab(v) where 'R' not in ('S', NULL); -- 
'R'{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-4848) Adding a HAVING condition to a query with a dynamic parameter makes the result empty

2021-10-12 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-4848:
--

 Summary: Adding a HAVING condition to a query with a dynamic 
parameter makes the result empty
 Key: CALCITE-4848
 URL: https://issues.apache.org/jira/browse/CALCITE-4848
 Project: Calcite
  Issue Type: Bug
Affects Versions: 1.27.0
Reporter: Thomas Rebele
 Attachments: CALCITE-4848.patch

The query
{code:java}
SELECT sal, COUNT(1) AS count_val
FROM emp t WHERE sal = ?
GROUP BY sal HAVING sal < 1000 {code}
gets translated to the following logical plan:
{code:java}
LogicalFilter(condition=[<($0, 1000)])
  LogicalAggregate(group=[{0}], COUNT_VAL=[COUNT()])
LogicalProject(SAL=[$5], $f1=[1])
  LogicalFilter(condition=[=($5, ?0)])
LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
However, applying CoreRules.FILTER_REDUCE_EXPRESSIONS to the plan it just 
becomes
{code:java}
LogicalValues(tuples=[[]]) {code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-4944) Add a property to activate the RuleMatchVisualizer

2021-12-16 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-4944:
--

 Summary: Add a property to activate the RuleMatchVisualizer
 Key: CALCITE-4944
 URL: https://issues.apache.org/jira/browse/CALCITE-4944
 Project: Calcite
  Issue Type: New Feature
Reporter: Thomas Rebele


As discussed in 
[CALCITE-4737|https://issues.apache.org/jira/browse/CALCITE-4737?focusedCommentId=17459849#comment-17459849],
 a property to activate the RuleMatchVisualizer should be implemented. To be 
investigated whether the common parts of the generated html files should me 
moved to different files (css, js), and reused for all generated files.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (CALCITE-4947) Checkstyle fails on classes generated by Intellij when using option "build and run [tests] using Intellij IDEA"

2021-12-17 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-4947:
--

 Summary: Checkstyle fails on classes generated by Intellij when 
using option "build and run [tests] using Intellij IDEA"
 Key: CALCITE-4947
 URL: https://issues.apache.org/jira/browse/CALCITE-4947
 Project: Calcite
  Issue Type: Bug
Reporter: Thomas Rebele


Running a test repeatedly in Intellij can be sped up when setting the options 
"Build and run using Intellij IDEA" and "Run tests using Intellij IDEA". 
However, that makes :core:checkstyleTest fail with errors, such as the 
following:

{code}[ant:checkstyle] [ERROR] 
~/calcite-intellij/core/build/generated/sources/version/generated/org/apache/calcite/rel/rules/ImmutableFilterDateRangeRuleConfig.java:26:
 Line is longer than 100 characters (found 107). [LineLength]

[ant:checkstyle] [ERROR] 
~/calcite-intellij/core/build/javacc/javaCCTest/generated_tests/org/apache/calcite/tools/ImmutableMyProjectFilterRuleConfig.java:341:12:
 Redundant 'final' modifier. [RedundantModifier]
{code}

It would be nice to exclude them from being checked/verified in any way.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (CALCITE-5061) Improve recursive application of the field trimming

2022-03-23 Thread Thomas Rebele (Jira)
Thomas Rebele created CALCITE-5061:
--

 Summary: Improve recursive application of the field trimming
 Key: CALCITE-5061
 URL: https://issues.apache.org/jira/browse/CALCITE-5061
 Project: Calcite
  Issue Type: Improvement
Reporter: Thomas Rebele
Assignee: Thomas Rebele
 Fix For: 1.31.0


The RelFieldTrimmer has some shortcomings:
 * If the plan contains certain set ops (e.g., UNION(all=false)), even if the 
operator needs all fields for the correct result (CALCITE-3399), it may still 
make sense to apply the trimming to the children. See CALCITE-5051 for an 
example.
 * Same applies for a Sort with dynamic parameters in the fetch/offset, and 
RepeatUnions
 * The makeZeroLiteral logic in trimChildRestore(...) does not work for ARRAY / 
Java types.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)