Re: [discuss] Add the default mode to the path in the Json functions.

2020-04-19 Thread Chunwei Lei
Thank you for proposing this, Forward.

 I am wondering whether it is really useful for users who want to use these
functions
since they probably know what the syntax is. But personally I would like to
have the
default mode since it is more convenient.


Best,
Chunwei


On Sun, Apr 19, 2020 at 9:51 AM Forward Xu  wrote:

> hi everyone, I recently found some discussable optimizations when I
> contributed the flip-90 [1] Json functions:
> The current json functions in calcite: JSON_EXISTS, JSON_VALUE, JSON_QUERY,
> JSON_OBJECT, JSON_OBJECTAGG, JSON_ARRAY, JSON_ARRAYAGG and IS JSON
> predication functions. These functions are implemented based on the SQL
> 2016-2017 standard [2]. According to this standard, the path of the json
> function must be used in one of strict or lax mode. such as:
> json_exists ('{"foo": "bar"}', 'lax $ .foo') or
> json_exists ('{"foo": "bar"}', 'strict $ .foo')
> Can we give a default mode to simplify the use of lax and strict. For
> example, we default to lax mode. In this way, the use of our json function
> can be simplified to:
> json_exists ('{"foo": "bar"}', '$ .foo')
> Implementation idea improvement JsonFunctions jsonApiCommonSyntax path
> judgment to increase the default lax mode logic.
> Of course, these changes are not described in SQL2016-2017.
> I want to hear your opinion here.
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=141724550
> [2]
>
> https://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip
>


[jira] [Created] (CALCITE-3940) Hint item can not parse correctly if the name is right after token "/*+"

2020-04-19 Thread Danny Chen (Jira)
Danny Chen created CALCITE-3940:
---

 Summary: Hint item can not parse correctly if the name is right 
after token "/*+"
 Key: CALCITE-3940
 URL: https://issues.apache.org/jira/browse/CALCITE-3940
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.22.0
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 1.23.0


After parsing /*+OPTIONS('k1'='v1', 'k2'='v2')*/, the parsed hint name is 
"PTIONS", while it is expected to be "OPTIONS".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Xiening Dai
Hi Igor,

Your comment - "because actual cost may be calculated correctly using physical 
operators only. So won't be able to implement Branch and Bound Space Pruning.“ 
is actually not true. In Cascade’s lower bound / upper bound pruning algorithm, 
you can get cost lower bound of input RelNode using cardinality * 
unit_copy_cost. The unit_copy_cost is a constant, which stands for the minimal 
cost for processing a tuple from the input. So this will be the minimal cost 
the input RelNode can achieve, and if this is indeed larger than the current 
best cost, this input node can be pruned. 

In this proposal, we are not trying to split logical and physical planning 
entirely. But for any given equivalent set, we would need to finish exploration 
before implementation. But for the entire memo tree, each set could be in 
different planning stage, and an equivalent set can be pruned even before it’s 
implemented. A text book example of aforementioned “lower bound / upper bound 
pruning” would be the join ordering case.

Regarding #3, I think we can still achieve that (partially) through this 
proposal. Remember every time when we start the optimization, we pass down an 
upper bound limit. Initially this upper bound for root is infinite - meaning 
that no feasible plan is available. Every time when we find a physical plan we 
update this upper bound, then start the search again. We could stop the search 
when the cost is less than a pre-defined threshold - which gives you a “good 
enough” plan with early termination. Still this wouldn't avoid the logical 
exploration. For that, you would probably archive through rule configurations, 
and avoid some expansive transformation to keep the cost down. 


> On Apr 19, 2020, at 7:30 PM, Haisheng Yuan  wrote:
> 
> Igor,
> 
> a) Given current Calcite's stats derivation strategy, mixing logical and 
> physical planning won't make it better. I hope you went through my email to 
> the end, currently operators inside a memo group don't share stats info, each 
> operator's stats may differ with the other one, and the RelSubset's best node 
> and stats may vary during optimization. So logical and physical rule pruning 
> are not safe at the moment, otherwise it almost implies changing downstream 
> project's cost model silently. 
> 
> On the other hand, ensuring child group exploration task finish first will 
> make rule mutual exclusivity check possible, like the result of 
> ReduceExpressionRule won't need trigger the same rule again, The join result 
> of JoinCommuteRule won't need trigger JoinCommuteRule and 
> ReduceExpressionRule again. 
> 
> More importantly, most if not all the the long planning queries in Calcite 
> are not caused by too many alternatives, but mutual triggering, or cyclic 
> triggering, which can be avoided by customizing the rules instead of using 
> the default one. Unless you use dynamic programming (Calcite use heuristic) 
> on join reordering (left-deep, right-deep, bushy), space pruning won't help 
> make long / infinite running query faster.
> 
> b) No evidence shows current version of Calcite will return the most 
> promising plan in first planning iteration. Instead of praying for getting 
> good enough plan in the first iteration, why not focus on fixing rules that 
> causes the issue?
> 
> c) That is not the goal.
> 
> On 2020/04/19 15:14:57, Seliverstov Igor  wrote: 
>> Haisheng,
>> 
>> From my point of view splitting logical and physical planning parts isn’t a 
>> good idea. 
>> 
>> I think so because actual cost may be calculated correctly using physical 
>> operators only. So that we:
>> a) won't be able to implement Branch and Bound Space Pruning (as far as I 
>> understand, at exploring time there are no physical operators, no correct 
>> costs, but assumptions only, I don’t think we should rely on them)
>> b) won’t be able to get good enough plan (without Branch and Bound Space 
>> Pruning it’s non-trivial task to get right search direction and most 
>> promising plans in first planning iterations)
>> c) won’t be able to tune the good enough plan during several similar queries 
>> are executed
>> 
>> Regards,
>> Igor
>> 
>> 
>>> 19 апр. 2020 г., в 17:37, Haisheng Yuan  написал(а):
>>> 
>>> Hi Igor,
>>> 
>>> You can't have your cake and eat it.
>>> But one feasible work item definitely we can do is that once timeout, stop 
>>> exploring, use the first available physical operator in each group and 
>>> optimize it.
>>> 
>>> Because most, if not all, of the long / infinite running optimizations are 
>>> caused by project related transpose, join reordering (join commute + 
>>> associative), constant reduction for large expression (see CALCITE-3460), 
>>> all of which are logical transformations rules and many of which have 
>>> corresponding JIRAs. So another alternative is, let's fix these bugs to 
>>> improve Calcite to make timeout option less usable.
>>> 
>>> Another thing worth noting is that sql server optimizer timeout mechanism 

Re: The Travis CI build failure

2020-04-19 Thread Fan Liya
According to my observations, many failures are caused by SocketTimeOut
exceptions.
So maybe we need to add timeout & retry logic for such test cases.

Best,
Liya Fan

On Mon, Apr 20, 2020 at 11:13 AM Francis Chuang 
wrote:

> Does the Github Actions tests always pass when this happens?
>
> I wonder if this is due to a network issue or a corrupt gradle cache on
> the travis nodes.
>
> See: https://github.com/facebook/react-native/issues/21130
>
> On 20/04/2020 12:58 pm, XING JIN wrote:
> > I have force pushed several times, but it keeps failing.
> >
> > Chunwei Lei  于2020年4月20日周一 上午10:49写道:
> >
> >> I am a little confused too. Sometimes the traivs ci build of master
> fails
> >> when I merge
> >> the pr which passes all tests.
> >>
> >>
> >>
> >> Best,
> >> Chunwei
> >>
> >>
> >> On Mon, Apr 20, 2020 at 10:33 AM XING JIN 
> wrote:
> >>
> >>> In PR https://github.com/apache/calcite/pull/1926 and
> >>> https://github.com/apache/calcite/pull/1925, The Travis CI build keeps
> >>> failing with below messages. The exception has nothing to do with the
> >>> change in PR. There seems to be something wrong with the building
> >>> environment.
> >>>
> >>> ...
> >>>
> >>>
> >>
> /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:42:
> >>> error: cannot access java.lang
> >>> class HttpServer {
> >>> ^
> >>>zip END header not found
> >>>
> >>>
> >>
> /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:43:
> >>> error: cannot find symbol
> >>>private static String localIpAddress;
> >>>   ^
> >>>symbol:   class String
> >>>location: class HttpServer
> >>>
> >>>
> >>
> /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:45:
> >>> error: cannot find symbol
> >>>private final File resourceBase;
> >>>  ^
> >>>symbol:   class File
> >>>location: class HttpServer
> >>> ...
> >>>
> >>> Jin
> >>>
> >>
> >
>


Re: The Travis CI build failure

2020-04-19 Thread Francis Chuang

Does the Github Actions tests always pass when this happens?

I wonder if this is due to a network issue or a corrupt gradle cache on 
the travis nodes.


See: https://github.com/facebook/react-native/issues/21130

On 20/04/2020 12:58 pm, XING JIN wrote:

I have force pushed several times, but it keeps failing.

Chunwei Lei  于2020年4月20日周一 上午10:49写道:


I am a little confused too. Sometimes the traivs ci build of master fails
when I merge
the pr which passes all tests.



Best,
Chunwei


On Mon, Apr 20, 2020 at 10:33 AM XING JIN  wrote:


In PR https://github.com/apache/calcite/pull/1926 and
https://github.com/apache/calcite/pull/1925, The Travis CI build keeps
failing with below messages. The exception has nothing to do with the
change in PR. There seems to be something wrong with the building
environment.

...



/home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:42:

error: cannot access java.lang
class HttpServer {
^
   zip END header not found



/home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:43:

error: cannot find symbol
   private static String localIpAddress;
  ^
   symbol:   class String
   location: class HttpServer



/home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:45:

error: cannot find symbol
   private final File resourceBase;
 ^
   symbol:   class File
   location: class HttpServer
...

Jin







Re: The Travis CI build failure

2020-04-19 Thread XING JIN
I have force pushed several times, but it keeps failing.

Chunwei Lei  于2020年4月20日周一 上午10:49写道:

> I am a little confused too. Sometimes the traivs ci build of master fails
> when I merge
> the pr which passes all tests.
>
>
>
> Best,
> Chunwei
>
>
> On Mon, Apr 20, 2020 at 10:33 AM XING JIN  wrote:
>
> > In PR https://github.com/apache/calcite/pull/1926 and
> > https://github.com/apache/calcite/pull/1925, The Travis CI build keeps
> > failing with below messages. The exception has nothing to do with the
> > change in PR. There seems to be something wrong with the building
> > environment.
> >
> > ...
> >
> >
> /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:42:
> > error: cannot access java.lang
> > class HttpServer {
> > ^
> >   zip END header not found
> >
> >
> /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:43:
> > error: cannot find symbol
> >   private static String localIpAddress;
> >  ^
> >   symbol:   class String
> >   location: class HttpServer
> >
> >
> /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:45:
> > error: cannot find symbol
> >   private final File resourceBase;
> > ^
> >   symbol:   class File
> >   location: class HttpServer
> > ...
> >
> > Jin
> >
>


Re: The Travis CI build failure

2020-04-19 Thread Chunwei Lei
I am a little confused too. Sometimes the traivs ci build of master fails
when I merge
the pr which passes all tests.



Best,
Chunwei


On Mon, Apr 20, 2020 at 10:33 AM XING JIN  wrote:

> In PR https://github.com/apache/calcite/pull/1926 and
> https://github.com/apache/calcite/pull/1925, The Travis CI build keeps
> failing with below messages. The exception has nothing to do with the
> change in PR. There seems to be something wrong with the building
> environment.
>
> ...
>
> /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:42:
> error: cannot access java.lang
> class HttpServer {
> ^
>   zip END header not found
>
> /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:43:
> error: cannot find symbol
>   private static String localIpAddress;
>  ^
>   symbol:   class String
>   location: class HttpServer
>
> /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:45:
> error: cannot find symbol
>   private final File resourceBase;
> ^
>   symbol:   class File
>   location: class HttpServer
> ...
>
> Jin
>


The Travis CI build failure

2020-04-19 Thread XING JIN
In PR https://github.com/apache/calcite/pull/1926 and
https://github.com/apache/calcite/pull/1925, The Travis CI build keeps
failing with below messages. The exception has nothing to do with the
change in PR. There seems to be something wrong with the building
environment.

...
/home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:42:
error: cannot access java.lang
class HttpServer {
^
  zip END header not found
/home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:43:
error: cannot find symbol
  private static String localIpAddress;
 ^
  symbol:   class String
  location: class HttpServer
/home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:45:
error: cannot find symbol
  private final File resourceBase;
^
  symbol:   class File
  location: class HttpServer
...

Jin


Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Haisheng Yuan
Igor,

a) Given current Calcite's stats derivation strategy, mixing logical and 
physical planning won't make it better. I hope you went through my email to the 
end, currently operators inside a memo group don't share stats info, each 
operator's stats may differ with the other one, and the RelSubset's best node 
and stats may vary during optimization. So logical and physical rule pruning 
are not safe at the moment, otherwise it almost implies changing downstream 
project's cost model silently. 

On the other hand, ensuring child group exploration task finish first will make 
rule mutual exclusivity check possible, like the result of ReduceExpressionRule 
won't need trigger the same rule again, The join result of JoinCommuteRule 
won't need trigger JoinCommuteRule and ReduceExpressionRule again. 

More importantly, most if not all the the long planning queries in Calcite are 
not caused by too many alternatives, but mutual triggering, or cyclic 
triggering, which can be avoided by customizing the rules instead of using the 
default one. Unless you use dynamic programming (Calcite use heuristic) on join 
reordering (left-deep, right-deep, bushy), space pruning won't help make long / 
infinite running query faster.

b) No evidence shows current version of Calcite will return the most promising 
plan in first planning iteration. Instead of praying for getting good enough 
plan in the first iteration, why not focus on fixing rules that causes the 
issue?

c) That is not the goal.

On 2020/04/19 15:14:57, Seliverstov Igor  wrote: 
> Haisheng,
> 
> From my point of view splitting logical and physical planning parts isn’t a 
> good idea. 
> 
> I think so because actual cost may be calculated correctly using physical 
> operators only. So that we:
>  a) won't be able to implement Branch and Bound Space Pruning (as far as I 
> understand, at exploring time there are no physical operators, no correct 
> costs, but assumptions only, I don’t think we should rely on them)
>  b) won’t be able to get good enough plan (without Branch and Bound Space 
> Pruning it’s non-trivial task to get right search direction and most 
> promising plans in first planning iterations)
>  c) won’t be able to tune the good enough plan during several similar queries 
> are executed
> 
> Regards,
> Igor
> 
> 
> > 19 апр. 2020 г., в 17:37, Haisheng Yuan  написал(а):
> > 
> > Hi Igor,
> > 
> > You can't have your cake and eat it.
> > But one feasible work item definitely we can do is that once timeout, stop 
> > exploring, use the first available physical operator in each group and 
> > optimize it.
> > 
> > Because most, if not all, of the long / infinite running optimizations are 
> > caused by project related transpose, join reordering (join commute + 
> > associative), constant reduction for large expression (see CALCITE-3460), 
> > all of which are logical transformations rules and many of which have 
> > corresponding JIRAs. So another alternative is, let's fix these bugs to 
> > improve Calcite to make timeout option less usable.
> > 
> > Another thing worth noting is that sql server optimizer timeout mechanism 
> > is not based on clock time, but the number of optimization tasks it has 
> > done [1].
> > 
> > [1] 
> > https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188
> > 
> > Regards,
> > Haisheng 
> > 
> > On 2020/04/19 11:31:27, Seliverstov Igor  wrote: 
> >> Haisheng,
> >> 
> >> Ok, then such notification isn't needed
> >> 
> >> But in this case we don't have any control over how long planning takes
> >> 
> >> For some systems it's necessary to get good enough plan right now instead
> >> of best one after while
> >> 
> >> For example we've been considering a case when a query is optimised several
> >> times in short iterations in case it's impossible to get the best plan in
> >> reasonable period of time (for example there is an SLA for response time)
> >> 
> >> This mean we need all needed physical implementations after each logical
> >> transformation is applied.
> >> 
> >> Regards,
> >> Igor
> >> 
> >> 
> >> вс, 19 апр. 2020 г., 13:55 Haisheng Yuan :
> >> 
> >>> Hi Igor,
> >>> 
> >>> There will be no rule queue anymore. Y will be fully explored (logical
> >>> rules are matched and applied) before it can be implemented and optimized.
> >>> 
> >>> Thanks,
> >>> Haisheng
> >>> 
> >>> On 2020/04/19 10:11:45, Seliverstov Igor  wrote:
>  Hi Haisheng,
>  
>  Great explanation, thanks.
>  
>  One thing I'd like to cover in advance is trait propagation process (I
> >>> need
>  it for Apache Ignite SQL engine implementation).
>  
>  For example:
>  
>  There are two nodes: Rel X and its child node Rel Y
>  
>  Both nodes are in Optimized state, and there is a Logical rule for Rel Y
> >>> in
>  a rules queue matched,
>  
>  After logical rule is applied, there is a logical rel Rel Y' and
> >>> 

[jira] [Created] (CALCITE-3939) more auto pruning rules after SubstitutionRule is introduced

2020-04-19 Thread Botong Huang (Jira)
Botong Huang created CALCITE-3939:
-

 Summary: more auto pruning rules after SubstitutionRule is 
introduced
 Key: CALCITE-3939
 URL: https://issues.apache.org/jira/browse/CALCITE-3939
 Project: Calcite
  Issue Type: Improvement
Reporter: Botong Huang


UnionEliminatorRule and ProjectRemoveRule are both pruning rules for a RelNode. 
They can also become SubstitutionRule with autoprune enabled



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Towards Calcite-Avatica 1.17.0

2020-04-19 Thread Francis Chuang

Hey everyone,

I am planning to make rc0 available for voting towards the end of April 
/ start of May. How does this look in terms of timing?


Francis

On 12/04/2020 2:51 am, Stamatis Zampetakis wrote:

Thanks again for taking the lead on this Francis!

Personally, I am quite busy these days but will do my best to check 1-2 PRs.

On Fri, Apr 10, 2020 at 2:57 PM Josh Elser  wrote:


Always a good idea.

I'll add this to my list and see if I can help get any committed. It's
been a while since I've looked at the list.

On 4/8/20 7:59 PM, Francis Chuang wrote:

The last avatica release was in December last year.
  From activity on our mailing lists and the Calcite repository, it feels
like there's currently a lull and things aren't as active.

Would this be a good opportunity to work on Avatica and push a release

out?


There are currently 10 open PRs for Avatica [1] and I think if they are
reviewed and merged, it wold be possible to close a huge chunk of them.

I am happy to be RM for this release, but will need help from the
community to review and merge those PRs.

Francis

[1] https://github.com/apache/calcite-avatica/pulls






Stored Proc to Relational Expression

2020-04-19 Thread Ravi Kapoor
Hi Team,

I have my use where I need to convert my dialect specific stored procedure
constructs like while loop, If then else to Rel expression

Basically this can contain control flow statements like below

DECLARE heads BOOL;
DECLARE heads_count INT64 DEFAULT 0;
LOOP
  SET heads = RAND() < 0.5;
  IF heads THEN
SELECT 'Heads!';
SET heads_count = heads_count + 1;
  ELSE
SELECT 'Tails!';
BREAK;
  END IF;
END LOOP;
SELECT CONCAT(CAST(heads_count AS STRING), ' heads in a row');


I can create a Java AST model from the linq4j provided by calcite however
this is only going to generate Java Result and I believe its only used by
the calcite for relational expressions of enumerable calling convention
which is used by adapters which does not support core relational operations
right?

Is there a way I can convert the stored proc constructs into some canonical
form like Rel Tree and back to Stored proc of target dialect.
-- 

Thanks,
Ravi


Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Seliverstov Igor
Haisheng,

From my point of view splitting logical and physical planning parts isn’t a 
good idea. 

I think so because actual cost may be calculated correctly using physical 
operators only. So that we:
 a) won't be able to implement Branch and Bound Space Pruning (as far as I 
understand, at exploring time there are no physical operators, no correct 
costs, but assumptions only, I don’t think we should rely on them)
 b) won’t be able to get good enough plan (without Branch and Bound Space 
Pruning it’s non-trivial task to get right search direction and most promising 
plans in first planning iterations)
 c) won’t be able to tune the good enough plan during several similar queries 
are executed

Regards,
Igor


> 19 апр. 2020 г., в 17:37, Haisheng Yuan  написал(а):
> 
> Hi Igor,
> 
> You can't have your cake and eat it.
> But one feasible work item definitely we can do is that once timeout, stop 
> exploring, use the first available physical operator in each group and 
> optimize it.
> 
> Because most, if not all, of the long / infinite running optimizations are 
> caused by project related transpose, join reordering (join commute + 
> associative), constant reduction for large expression (see CALCITE-3460), all 
> of which are logical transformations rules and many of which have 
> corresponding JIRAs. So another alternative is, let's fix these bugs to 
> improve Calcite to make timeout option less usable.
> 
> Another thing worth noting is that sql server optimizer timeout mechanism is 
> not based on clock time, but the number of optimization tasks it has done [1].
> 
> [1] 
> https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188
> 
> Regards,
> Haisheng 
> 
> On 2020/04/19 11:31:27, Seliverstov Igor  wrote: 
>> Haisheng,
>> 
>> Ok, then such notification isn't needed
>> 
>> But in this case we don't have any control over how long planning takes
>> 
>> For some systems it's necessary to get good enough plan right now instead
>> of best one after while
>> 
>> For example we've been considering a case when a query is optimised several
>> times in short iterations in case it's impossible to get the best plan in
>> reasonable period of time (for example there is an SLA for response time)
>> 
>> This mean we need all needed physical implementations after each logical
>> transformation is applied.
>> 
>> Regards,
>> Igor
>> 
>> 
>> вс, 19 апр. 2020 г., 13:55 Haisheng Yuan :
>> 
>>> Hi Igor,
>>> 
>>> There will be no rule queue anymore. Y will be fully explored (logical
>>> rules are matched and applied) before it can be implemented and optimized.
>>> 
>>> Thanks,
>>> Haisheng
>>> 
>>> On 2020/04/19 10:11:45, Seliverstov Igor  wrote:
 Hi Haisheng,
 
 Great explanation, thanks.
 
 One thing I'd like to cover in advance is trait propagation process (I
>>> need
 it for Apache Ignite SQL engine implementation).
 
 For example:
 
 There are two nodes: Rel X and its child node Rel Y
 
 Both nodes are in Optimized state, and there is a Logical rule for Rel Y
>>> in
 a rules queue matched,
 
 After logical rule is applied, there is a logical rel Rel Y' and
>>> containing
 it set moves into Exploring state (since there is a logical node which
>>> has
 to be implemented)
 
 After whole process Exploring -> ... -> Optimized we need to force parent
 Rel X set to switch its state from Optimized to Optimizing to peek the
 newly implemented child and (possible) produce a new Rel X' physical rel
>>> on
 the basis of previously requested from the Rel X set traits.
 
 If a new Rel X' is produced, a Rel X' parent should move its state
 Optimized -> Optimizing and repeat described above operations.
 
 Does it look like true?
 
 Regards,
 Igor
 
 
 вс, 19 апр. 2020 г., 6:52 Haisheng Yuan :
 
> Hi,
> 
> In the past few months, we have discussed a lot about Cascades style
> top-down optimization, including on-demand trait derivation/request,
>>> rule
> apply, branch and bound space pruning. Now we think it is time to move
> towards these targets.
> 
> We will separate it into several small issues, and each one can be
> integrated as a standalone, independent feature, and most importantly,
> meanwhile keep backward compatibility.
> 
> 1. Top-down trait request
> In other words, pass traits requirements from parent nodes to child
>>> nodes.
> The trait requests happens after all the logical transformation rules
>>> and
> physical implementation rules are done, in a top-down manner, driven
>>> from
> root set. e.g.:
> SELECT a, sum(c) FROM
>(SELECT * FROM R JOIN S USING (a, b)) t
> GROUP BY a;
> 
> Suppose we have the following plan tree in the MEMO, and let's only
> consider distribution for simplicity, each group represents a 

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Haisheng Yuan
Hi Igor,

You can't have your cake and eat it.
But one feasible work item definitely we can do is that once timeout, stop 
exploring, use the first available physical operator in each group and optimize 
it.

Because most, if not all, of the long / infinite running optimizations are 
caused by project related transpose, join reordering (join commute + 
associative), constant reduction for large expression (see CALCITE-3460), all 
of which are logical transformations rules and many of which have corresponding 
JIRAs. So another alternative is, let's fix these bugs to improve Calcite to 
make timeout option less usable.

Another thing worth noting is that sql server optimizer timeout mechanism is 
not based on clock time, but the number of optimization tasks it has done [1].

[1] 
https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188

Regards,
Haisheng 

On 2020/04/19 11:31:27, Seliverstov Igor  wrote: 
> Haisheng,
> 
> Ok, then such notification isn't needed
> 
> But in this case we don't have any control over how long planning takes
> 
> For some systems it's necessary to get good enough plan right now instead
> of best one after while
> 
> For example we've been considering a case when a query is optimised several
> times in short iterations in case it's impossible to get the best plan in
> reasonable period of time (for example there is an SLA for response time)
> 
> This mean we need all needed physical implementations after each logical
> transformation is applied.
> 
> Regards,
> Igor
> 
> 
> вс, 19 апр. 2020 г., 13:55 Haisheng Yuan :
> 
> > Hi Igor,
> >
> > There will be no rule queue anymore. Y will be fully explored (logical
> > rules are matched and applied) before it can be implemented and optimized.
> >
> > Thanks,
> > Haisheng
> >
> > On 2020/04/19 10:11:45, Seliverstov Igor  wrote:
> > > Hi Haisheng,
> > >
> > > Great explanation, thanks.
> > >
> > > One thing I'd like to cover in advance is trait propagation process (I
> > need
> > > it for Apache Ignite SQL engine implementation).
> > >
> > > For example:
> > >
> > > There are two nodes: Rel X and its child node Rel Y
> > >
> > > Both nodes are in Optimized state, and there is a Logical rule for Rel Y
> > in
> > > a rules queue matched,
> > >
> > > After logical rule is applied, there is a logical rel Rel Y' and
> > containing
> > > it set moves into Exploring state (since there is a logical node which
> > has
> > > to be implemented)
> > >
> > > After whole process Exploring -> ... -> Optimized we need to force parent
> > > Rel X set to switch its state from Optimized to Optimizing to peek the
> > > newly implemented child and (possible) produce a new Rel X' physical rel
> > on
> > > the basis of previously requested from the Rel X set traits.
> > >
> > > If a new Rel X' is produced, a Rel X' parent should move its state
> > > Optimized -> Optimizing and repeat described above operations.
> > >
> > > Does it look like true?
> > >
> > > Regards,
> > > Igor
> > >
> > >
> > > вс, 19 апр. 2020 г., 6:52 Haisheng Yuan :
> > >
> > > > Hi,
> > > >
> > > > In the past few months, we have discussed a lot about Cascades style
> > > > top-down optimization, including on-demand trait derivation/request,
> > rule
> > > > apply, branch and bound space pruning. Now we think it is time to move
> > > > towards these targets.
> > > >
> > > > We will separate it into several small issues, and each one can be
> > > > integrated as a standalone, independent feature, and most importantly,
> > > > meanwhile keep backward compatibility.
> > > >
> > > > 1. Top-down trait request
> > > > In other words, pass traits requirements from parent nodes to child
> > nodes.
> > > > The trait requests happens after all the logical transformation rules
> > and
> > > > physical implementation rules are done, in a top-down manner, driven
> > from
> > > > root set. e.g.:
> > > > SELECT a, sum(c) FROM
> > > > (SELECT * FROM R JOIN S USING (a, b)) t
> > > > GROUP BY a;
> > > >
> > > > Suppose we have the following plan tree in the MEMO, and let's only
> > > > consider distribution for simplicity, each group represents a RelSet
> > in the
> > > > MEMO.
> > > >
> > > >Group 1: Agg on [a]
> > > >Group 2:   +-- MergeJoin on [a, b]
> > > >Group 3:   |--- TableScan R
> > > >Group 4:   +--- TableScan S
> > > >| Group No | Operator| Derived Traits | Required Traits |
> > > >| --- | - | --- | --- |
> > > >| Group 1  | Aggregate| Hash[a] | N/A|
> > > >| Group 2  | MergeJoin| Hash[a,b]  | Hash[a]  |
> > > >| Group 3  | TableScan R | None| Hash[a,b]   |
> > > >| Group 4  | TableScan S | None| Hash[a,b]   |
> > > >
> > > > We will add new interface PhysicalNode (or extending RelNode) with
> > > > 

[jira] [Created] (CALCITE-3938) implement visit LogicalCalc in RelShuttleImpl

2020-04-19 Thread xzh_dz (Jira)
xzh_dz created CALCITE-3938:
---

 Summary: implement visit LogicalCalc in RelShuttleImpl
 Key: CALCITE-3938
 URL: https://issues.apache.org/jira/browse/CALCITE-3938
 Project: Calcite
  Issue Type: Wish
Reporter: xzh_dz


visit LogicalCalc in RelShuttleImpl



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Seliverstov Igor
Haisheng,

Ok, then such notification isn't needed

But in this case we don't have any control over how long planning takes

For some systems it's necessary to get good enough plan right now instead
of best one after while

For example we've been considering a case when a query is optimised several
times in short iterations in case it's impossible to get the best plan in
reasonable period of time (for example there is an SLA for response time)

This mean we need all needed physical implementations after each logical
transformation is applied.

Regards,
Igor


вс, 19 апр. 2020 г., 13:55 Haisheng Yuan :

> Hi Igor,
>
> There will be no rule queue anymore. Y will be fully explored (logical
> rules are matched and applied) before it can be implemented and optimized.
>
> Thanks,
> Haisheng
>
> On 2020/04/19 10:11:45, Seliverstov Igor  wrote:
> > Hi Haisheng,
> >
> > Great explanation, thanks.
> >
> > One thing I'd like to cover in advance is trait propagation process (I
> need
> > it for Apache Ignite SQL engine implementation).
> >
> > For example:
> >
> > There are two nodes: Rel X and its child node Rel Y
> >
> > Both nodes are in Optimized state, and there is a Logical rule for Rel Y
> in
> > a rules queue matched,
> >
> > After logical rule is applied, there is a logical rel Rel Y' and
> containing
> > it set moves into Exploring state (since there is a logical node which
> has
> > to be implemented)
> >
> > After whole process Exploring -> ... -> Optimized we need to force parent
> > Rel X set to switch its state from Optimized to Optimizing to peek the
> > newly implemented child and (possible) produce a new Rel X' physical rel
> on
> > the basis of previously requested from the Rel X set traits.
> >
> > If a new Rel X' is produced, a Rel X' parent should move its state
> > Optimized -> Optimizing and repeat described above operations.
> >
> > Does it look like true?
> >
> > Regards,
> > Igor
> >
> >
> > вс, 19 апр. 2020 г., 6:52 Haisheng Yuan :
> >
> > > Hi,
> > >
> > > In the past few months, we have discussed a lot about Cascades style
> > > top-down optimization, including on-demand trait derivation/request,
> rule
> > > apply, branch and bound space pruning. Now we think it is time to move
> > > towards these targets.
> > >
> > > We will separate it into several small issues, and each one can be
> > > integrated as a standalone, independent feature, and most importantly,
> > > meanwhile keep backward compatibility.
> > >
> > > 1. Top-down trait request
> > > In other words, pass traits requirements from parent nodes to child
> nodes.
> > > The trait requests happens after all the logical transformation rules
> and
> > > physical implementation rules are done, in a top-down manner, driven
> from
> > > root set. e.g.:
> > > SELECT a, sum(c) FROM
> > > (SELECT * FROM R JOIN S USING (a, b)) t
> > > GROUP BY a;
> > >
> > > Suppose we have the following plan tree in the MEMO, and let's only
> > > consider distribution for simplicity, each group represents a RelSet
> in the
> > > MEMO.
> > >
> > >Group 1: Agg on [a]
> > >Group 2:   +-- MergeJoin on [a, b]
> > >Group 3:   |--- TableScan R
> > >Group 4:   +--- TableScan S
> > >| Group No | Operator| Derived Traits | Required Traits |
> > >| --- | - | --- | --- |
> > >| Group 1  | Aggregate| Hash[a] | N/A|
> > >| Group 2  | MergeJoin| Hash[a,b]  | Hash[a]  |
> > >| Group 3  | TableScan R | None| Hash[a,b]   |
> > >| Group 4  | TableScan S | None| Hash[a,b]   |
> > >
> > > We will add new interface PhysicalNode (or extending RelNode) with
> > > methods:
> > >
> > > - Pair> requireTraits(RelTraitSet
> required);
> > >   pair.left is the current operator's new traitset, pair.right is the
> list
> > > of children's required traitset.
> > >
> > > - RelNode passThrough(RelTraitSet required);
> > >   Default implementation will call above method requireTraits() and
> > > RelNode.copy() to create new RelNode. Available to be overriden for
> > > physical operators to customize the logic.
> > >
> > > The planner will call above method on MergeJoin operator to pass the
> > > required traits (Hash[a]) to Mergejoin's child operators.
> > >
> > > We will get a new MergeJoin:
> > >  MergeJoin hash[a]
> > >   | TableScan R hash[a] (RelSubset)
> > >   + TableScan S hash[a] (RelSubset)
> > >
> > > Now the MEMO group looks like:
> > >| Group No | Operator| Derived Traits   | Required Traits
> |
> > >| -- |  - |  |
> > > - |
> > >| Group1   | Aggregate   | Hash[a] | N/A
> > > |
> > >| Group2   | MergeJoin   | Hash[a,b], Hash[a]| Hash[a]
> > > |
> > >| Group3   | TableScan R | None| Hash[a,b],
> Hash[a]
> 

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Haisheng Yuan
Hi Igor,

There will be no rule queue anymore. Y will be fully explored (logical rules 
are matched and applied) before it can be implemented and optimized. 

Thanks,
Haisheng

On 2020/04/19 10:11:45, Seliverstov Igor  wrote: 
> Hi Haisheng,
> 
> Great explanation, thanks.
> 
> One thing I'd like to cover in advance is trait propagation process (I need
> it for Apache Ignite SQL engine implementation).
> 
> For example:
> 
> There are two nodes: Rel X and its child node Rel Y
> 
> Both nodes are in Optimized state, and there is a Logical rule for Rel Y in
> a rules queue matched,
> 
> After logical rule is applied, there is a logical rel Rel Y' and containing
> it set moves into Exploring state (since there is a logical node which has
> to be implemented)
> 
> After whole process Exploring -> ... -> Optimized we need to force parent
> Rel X set to switch its state from Optimized to Optimizing to peek the
> newly implemented child and (possible) produce a new Rel X' physical rel on
> the basis of previously requested from the Rel X set traits.
> 
> If a new Rel X' is produced, a Rel X' parent should move its state
> Optimized -> Optimizing and repeat described above operations.
> 
> Does it look like true?
> 
> Regards,
> Igor
> 
> 
> вс, 19 апр. 2020 г., 6:52 Haisheng Yuan :
> 
> > Hi,
> >
> > In the past few months, we have discussed a lot about Cascades style
> > top-down optimization, including on-demand trait derivation/request, rule
> > apply, branch and bound space pruning. Now we think it is time to move
> > towards these targets.
> >
> > We will separate it into several small issues, and each one can be
> > integrated as a standalone, independent feature, and most importantly,
> > meanwhile keep backward compatibility.
> >
> > 1. Top-down trait request
> > In other words, pass traits requirements from parent nodes to child nodes.
> > The trait requests happens after all the logical transformation rules and
> > physical implementation rules are done, in a top-down manner, driven from
> > root set. e.g.:
> > SELECT a, sum(c) FROM
> > (SELECT * FROM R JOIN S USING (a, b)) t
> > GROUP BY a;
> >
> > Suppose we have the following plan tree in the MEMO, and let's only
> > consider distribution for simplicity, each group represents a RelSet in the
> > MEMO.
> >
> >Group 1: Agg on [a]
> >Group 2:   +-- MergeJoin on [a, b]
> >Group 3:   |--- TableScan R
> >Group 4:   +--- TableScan S
> >| Group No | Operator| Derived Traits | Required Traits |
> >| --- | - | --- | --- |
> >| Group 1  | Aggregate| Hash[a] | N/A|
> >| Group 2  | MergeJoin| Hash[a,b]  | Hash[a]  |
> >| Group 3  | TableScan R | None| Hash[a,b]   |
> >| Group 4  | TableScan S | None| Hash[a,b]   |
> >
> > We will add new interface PhysicalNode (or extending RelNode) with
> > methods:
> >
> > - Pair> requireTraits(RelTraitSet required);
> >   pair.left is the current operator's new traitset, pair.right is the list
> > of children's required traitset.
> >
> > - RelNode passThrough(RelTraitSet required);
> >   Default implementation will call above method requireTraits() and
> > RelNode.copy() to create new RelNode. Available to be overriden for
> > physical operators to customize the logic.
> >
> > The planner will call above method on MergeJoin operator to pass the
> > required traits (Hash[a]) to Mergejoin's child operators.
> >
> > We will get a new MergeJoin:
> >  MergeJoin hash[a]
> >   | TableScan R hash[a] (RelSubset)
> >   + TableScan S hash[a] (RelSubset)
> >
> > Now the MEMO group looks like:
> >| Group No | Operator| Derived Traits   | Required Traits  |
> >| -- |  - |  |
> > - |
> >| Group1   | Aggregate   | Hash[a] | N/A
> > |
> >| Group2   | MergeJoin   | Hash[a,b], Hash[a]| Hash[a]
> > |
> >| Group3   | TableScan R | None| Hash[a,b], Hash[a]
> > |
> >| Group4   | TableScan S | None| Hash[a,b], Hash[a]
> > |
> >
> > Calcite user may choose to ignore / not implement the interface to keep
> > the original behavior. Each physical operator, according to its own logic,
> > decides whether passThrough() should pass traits down or not by returning:
> > - a non-null RelNode, which means it can pass down
> > - null object, which means can't pass down
> >
> > 2. Provide option to disable AbstractConverter
> > Once the plan can request traits in top-down way in the framework, many
> > system don't need AbstractConverter anymore, since it is just a
> > intermediate operator to generate physical sort / exchange. For those, we
> > can provide option to disable AbstractConverter, generate physical enforcer
> > directly by adding a method to interface 

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Seliverstov Igor
Hi Haisheng,

Great explanation, thanks.

One thing I'd like to cover in advance is trait propagation process (I need
it for Apache Ignite SQL engine implementation).

For example:

There are two nodes: Rel X and its child node Rel Y

Both nodes are in Optimized state, and there is a Logical rule for Rel Y in
a rules queue matched,

After logical rule is applied, there is a logical rel Rel Y' and containing
it set moves into Exploring state (since there is a logical node which has
to be implemented)

After whole process Exploring -> ... -> Optimized we need to force parent
Rel X set to switch its state from Optimized to Optimizing to peek the
newly implemented child and (possible) produce a new Rel X' physical rel on
the basis of previously requested from the Rel X set traits.

If a new Rel X' is produced, a Rel X' parent should move its state
Optimized -> Optimizing and repeat described above operations.

Does it look like true?

Regards,
Igor


вс, 19 апр. 2020 г., 6:52 Haisheng Yuan :

> Hi,
>
> In the past few months, we have discussed a lot about Cascades style
> top-down optimization, including on-demand trait derivation/request, rule
> apply, branch and bound space pruning. Now we think it is time to move
> towards these targets.
>
> We will separate it into several small issues, and each one can be
> integrated as a standalone, independent feature, and most importantly,
> meanwhile keep backward compatibility.
>
> 1. Top-down trait request
> In other words, pass traits requirements from parent nodes to child nodes.
> The trait requests happens after all the logical transformation rules and
> physical implementation rules are done, in a top-down manner, driven from
> root set. e.g.:
> SELECT a, sum(c) FROM
> (SELECT * FROM R JOIN S USING (a, b)) t
> GROUP BY a;
>
> Suppose we have the following plan tree in the MEMO, and let's only
> consider distribution for simplicity, each group represents a RelSet in the
> MEMO.
>
>Group 1: Agg on [a]
>Group 2:   +-- MergeJoin on [a, b]
>Group 3:   |--- TableScan R
>Group 4:   +--- TableScan S
>| Group No | Operator| Derived Traits | Required Traits |
>| --- | - | --- | --- |
>| Group 1  | Aggregate| Hash[a] | N/A|
>| Group 2  | MergeJoin| Hash[a,b]  | Hash[a]  |
>| Group 3  | TableScan R | None| Hash[a,b]   |
>| Group 4  | TableScan S | None| Hash[a,b]   |
>
> We will add new interface PhysicalNode (or extending RelNode) with
> methods:
>
> - Pair> requireTraits(RelTraitSet required);
>   pair.left is the current operator's new traitset, pair.right is the list
> of children's required traitset.
>
> - RelNode passThrough(RelTraitSet required);
>   Default implementation will call above method requireTraits() and
> RelNode.copy() to create new RelNode. Available to be overriden for
> physical operators to customize the logic.
>
> The planner will call above method on MergeJoin operator to pass the
> required traits (Hash[a]) to Mergejoin's child operators.
>
> We will get a new MergeJoin:
>  MergeJoin hash[a]
>   | TableScan R hash[a] (RelSubset)
>   + TableScan S hash[a] (RelSubset)
>
> Now the MEMO group looks like:
>| Group No | Operator| Derived Traits   | Required Traits  |
>| -- |  - |  |
> - |
>| Group1   | Aggregate   | Hash[a] | N/A
> |
>| Group2   | MergeJoin   | Hash[a,b], Hash[a]| Hash[a]
> |
>| Group3   | TableScan R | None| Hash[a,b], Hash[a]
> |
>| Group4   | TableScan S | None| Hash[a,b], Hash[a]
> |
>
> Calcite user may choose to ignore / not implement the interface to keep
> the original behavior. Each physical operator, according to its own logic,
> decides whether passThrough() should pass traits down or not by returning:
> - a non-null RelNode, which means it can pass down
> - null object, which means can't pass down
>
> 2. Provide option to disable AbstractConverter
> Once the plan can request traits in top-down way in the framework, many
> system don't need AbstractConverter anymore, since it is just a
> intermediate operator to generate physical sort / exchange. For those, we
> can provide option to disable AbstractConverter, generate physical enforcer
> directly by adding a method to interface Convention:
> - RelNode enforce(RelNode node, RelTraitSet traits);
>
> The default implementation may just calling changeTraitsUsingConverters(),
> but people may choose to override it if the system has special needs, like
> several traits must implement together, or the position of collation in
> RelTraitSet is before distribution.
>
> 3. Top-down driven, on-demand rule apply
> For every RelNode in a RelSet, rule is matched and applied sequentially,
>