Re: [discuss] Add the default mode to the path in the Json functions.
Thank you for proposing this, Forward. I am wondering whether it is really useful for users who want to use these functions since they probably know what the syntax is. But personally I would like to have the default mode since it is more convenient. Best, Chunwei On Sun, Apr 19, 2020 at 9:51 AM Forward Xu wrote: > hi everyone, I recently found some discussable optimizations when I > contributed the flip-90 [1] Json functions: > The current json functions in calcite: JSON_EXISTS, JSON_VALUE, JSON_QUERY, > JSON_OBJECT, JSON_OBJECTAGG, JSON_ARRAY, JSON_ARRAYAGG and IS JSON > predication functions. These functions are implemented based on the SQL > 2016-2017 standard [2]. According to this standard, the path of the json > function must be used in one of strict or lax mode. such as: > json_exists ('{"foo": "bar"}', 'lax $ .foo') or > json_exists ('{"foo": "bar"}', 'strict $ .foo') > Can we give a default mode to simplify the use of lax and strict. For > example, we default to lax mode. In this way, the use of our json function > can be simplified to: > json_exists ('{"foo": "bar"}', '$ .foo') > Implementation idea improvement JsonFunctions jsonApiCommonSyntax path > judgment to increase the default lax mode logic. > Of course, these changes are not described in SQL2016-2017. > I want to hear your opinion here. > > [1] > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=141724550 > [2] > > https://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip >
[jira] [Created] (CALCITE-3940) Hint item can not parse correctly if the name is right after token "/*+"
Danny Chen created CALCITE-3940: --- Summary: Hint item can not parse correctly if the name is right after token "/*+" Key: CALCITE-3940 URL: https://issues.apache.org/jira/browse/CALCITE-3940 Project: Calcite Issue Type: Bug Components: core Affects Versions: 1.22.0 Reporter: Danny Chen Assignee: Danny Chen Fix For: 1.23.0 After parsing /*+OPTIONS('k1'='v1', 'k2'='v2')*/, the parsed hint name is "PTIONS", while it is expected to be "OPTIONS". -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Towards Cascades Optimizer
Hi Igor, Your comment - "because actual cost may be calculated correctly using physical operators only. So won't be able to implement Branch and Bound Space Pruning.“ is actually not true. In Cascade’s lower bound / upper bound pruning algorithm, you can get cost lower bound of input RelNode using cardinality * unit_copy_cost. The unit_copy_cost is a constant, which stands for the minimal cost for processing a tuple from the input. So this will be the minimal cost the input RelNode can achieve, and if this is indeed larger than the current best cost, this input node can be pruned. In this proposal, we are not trying to split logical and physical planning entirely. But for any given equivalent set, we would need to finish exploration before implementation. But for the entire memo tree, each set could be in different planning stage, and an equivalent set can be pruned even before it’s implemented. A text book example of aforementioned “lower bound / upper bound pruning” would be the join ordering case. Regarding #3, I think we can still achieve that (partially) through this proposal. Remember every time when we start the optimization, we pass down an upper bound limit. Initially this upper bound for root is infinite - meaning that no feasible plan is available. Every time when we find a physical plan we update this upper bound, then start the search again. We could stop the search when the cost is less than a pre-defined threshold - which gives you a “good enough” plan with early termination. Still this wouldn't avoid the logical exploration. For that, you would probably archive through rule configurations, and avoid some expansive transformation to keep the cost down. > On Apr 19, 2020, at 7:30 PM, Haisheng Yuan wrote: > > Igor, > > a) Given current Calcite's stats derivation strategy, mixing logical and > physical planning won't make it better. I hope you went through my email to > the end, currently operators inside a memo group don't share stats info, each > operator's stats may differ with the other one, and the RelSubset's best node > and stats may vary during optimization. So logical and physical rule pruning > are not safe at the moment, otherwise it almost implies changing downstream > project's cost model silently. > > On the other hand, ensuring child group exploration task finish first will > make rule mutual exclusivity check possible, like the result of > ReduceExpressionRule won't need trigger the same rule again, The join result > of JoinCommuteRule won't need trigger JoinCommuteRule and > ReduceExpressionRule again. > > More importantly, most if not all the the long planning queries in Calcite > are not caused by too many alternatives, but mutual triggering, or cyclic > triggering, which can be avoided by customizing the rules instead of using > the default one. Unless you use dynamic programming (Calcite use heuristic) > on join reordering (left-deep, right-deep, bushy), space pruning won't help > make long / infinite running query faster. > > b) No evidence shows current version of Calcite will return the most > promising plan in first planning iteration. Instead of praying for getting > good enough plan in the first iteration, why not focus on fixing rules that > causes the issue? > > c) That is not the goal. > > On 2020/04/19 15:14:57, Seliverstov Igor wrote: >> Haisheng, >> >> From my point of view splitting logical and physical planning parts isn’t a >> good idea. >> >> I think so because actual cost may be calculated correctly using physical >> operators only. So that we: >> a) won't be able to implement Branch and Bound Space Pruning (as far as I >> understand, at exploring time there are no physical operators, no correct >> costs, but assumptions only, I don’t think we should rely on them) >> b) won’t be able to get good enough plan (without Branch and Bound Space >> Pruning it’s non-trivial task to get right search direction and most >> promising plans in first planning iterations) >> c) won’t be able to tune the good enough plan during several similar queries >> are executed >> >> Regards, >> Igor >> >> >>> 19 апр. 2020 г., в 17:37, Haisheng Yuan написал(а): >>> >>> Hi Igor, >>> >>> You can't have your cake and eat it. >>> But one feasible work item definitely we can do is that once timeout, stop >>> exploring, use the first available physical operator in each group and >>> optimize it. >>> >>> Because most, if not all, of the long / infinite running optimizations are >>> caused by project related transpose, join reordering (join commute + >>> associative), constant reduction for large expression (see CALCITE-3460), >>> all of which are logical transformations rules and many of which have >>> corresponding JIRAs. So another alternative is, let's fix these bugs to >>> improve Calcite to make timeout option less usable. >>> >>> Another thing worth noting is that sql server optimizer timeout mechanism
Re: The Travis CI build failure
According to my observations, many failures are caused by SocketTimeOut exceptions. So maybe we need to add timeout & retry logic for such test cases. Best, Liya Fan On Mon, Apr 20, 2020 at 11:13 AM Francis Chuang wrote: > Does the Github Actions tests always pass when this happens? > > I wonder if this is due to a network issue or a corrupt gradle cache on > the travis nodes. > > See: https://github.com/facebook/react-native/issues/21130 > > On 20/04/2020 12:58 pm, XING JIN wrote: > > I have force pushed several times, but it keeps failing. > > > > Chunwei Lei 于2020年4月20日周一 上午10:49写道: > > > >> I am a little confused too. Sometimes the traivs ci build of master > fails > >> when I merge > >> the pr which passes all tests. > >> > >> > >> > >> Best, > >> Chunwei > >> > >> > >> On Mon, Apr 20, 2020 at 10:33 AM XING JIN > wrote: > >> > >>> In PR https://github.com/apache/calcite/pull/1926 and > >>> https://github.com/apache/calcite/pull/1925, The Travis CI build keeps > >>> failing with below messages. The exception has nothing to do with the > >>> change in PR. There seems to be something wrong with the building > >>> environment. > >>> > >>> ... > >>> > >>> > >> > /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:42: > >>> error: cannot access java.lang > >>> class HttpServer { > >>> ^ > >>>zip END header not found > >>> > >>> > >> > /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:43: > >>> error: cannot find symbol > >>>private static String localIpAddress; > >>> ^ > >>>symbol: class String > >>>location: class HttpServer > >>> > >>> > >> > /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:45: > >>> error: cannot find symbol > >>>private final File resourceBase; > >>> ^ > >>>symbol: class File > >>>location: class HttpServer > >>> ... > >>> > >>> Jin > >>> > >> > > >
Re: The Travis CI build failure
Does the Github Actions tests always pass when this happens? I wonder if this is due to a network issue or a corrupt gradle cache on the travis nodes. See: https://github.com/facebook/react-native/issues/21130 On 20/04/2020 12:58 pm, XING JIN wrote: I have force pushed several times, but it keeps failing. Chunwei Lei 于2020年4月20日周一 上午10:49写道: I am a little confused too. Sometimes the traivs ci build of master fails when I merge the pr which passes all tests. Best, Chunwei On Mon, Apr 20, 2020 at 10:33 AM XING JIN wrote: In PR https://github.com/apache/calcite/pull/1926 and https://github.com/apache/calcite/pull/1925, The Travis CI build keeps failing with below messages. The exception has nothing to do with the change in PR. There seems to be something wrong with the building environment. ... /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:42: error: cannot access java.lang class HttpServer { ^ zip END header not found /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:43: error: cannot find symbol private static String localIpAddress; ^ symbol: class String location: class HttpServer /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:45: error: cannot find symbol private final File resourceBase; ^ symbol: class File location: class HttpServer ... Jin
Re: The Travis CI build failure
I have force pushed several times, but it keeps failing. Chunwei Lei 于2020年4月20日周一 上午10:49写道: > I am a little confused too. Sometimes the traivs ci build of master fails > when I merge > the pr which passes all tests. > > > > Best, > Chunwei > > > On Mon, Apr 20, 2020 at 10:33 AM XING JIN wrote: > > > In PR https://github.com/apache/calcite/pull/1926 and > > https://github.com/apache/calcite/pull/1925, The Travis CI build keeps > > failing with below messages. The exception has nothing to do with the > > change in PR. There seems to be something wrong with the building > > environment. > > > > ... > > > > > /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:42: > > error: cannot access java.lang > > class HttpServer { > > ^ > > zip END header not found > > > > > /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:43: > > error: cannot find symbol > > private static String localIpAddress; > > ^ > > symbol: class String > > location: class HttpServer > > > > > /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:45: > > error: cannot find symbol > > private final File resourceBase; > > ^ > > symbol: class File > > location: class HttpServer > > ... > > > > Jin > > >
Re: The Travis CI build failure
I am a little confused too. Sometimes the traivs ci build of master fails when I merge the pr which passes all tests. Best, Chunwei On Mon, Apr 20, 2020 at 10:33 AM XING JIN wrote: > In PR https://github.com/apache/calcite/pull/1926 and > https://github.com/apache/calcite/pull/1925, The Travis CI build keeps > failing with below messages. The exception has nothing to do with the > change in PR. There seems to be something wrong with the building > environment. > > ... > > /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:42: > error: cannot access java.lang > class HttpServer { > ^ > zip END header not found > > /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:43: > error: cannot find symbol > private static String localIpAddress; > ^ > symbol: class String > location: class HttpServer > > /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:45: > error: cannot find symbol > private final File resourceBase; > ^ > symbol: class File > location: class HttpServer > ... > > Jin >
The Travis CI build failure
In PR https://github.com/apache/calcite/pull/1926 and https://github.com/apache/calcite/pull/1925, The Travis CI build keeps failing with below messages. The exception has nothing to do with the change in PR. There seems to be something wrong with the building environment. ... /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:42: error: cannot access java.lang class HttpServer { ^ zip END header not found /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:43: error: cannot find symbol private static String localIpAddress; ^ symbol: class String location: class HttpServer /home/travis/build/apache/calcite/spark/src/main/java/org/apache/calcite/adapter/spark/HttpServer.java:45: error: cannot find symbol private final File resourceBase; ^ symbol: class File location: class HttpServer ... Jin
Re: [DISCUSS] Towards Cascades Optimizer
Igor, a) Given current Calcite's stats derivation strategy, mixing logical and physical planning won't make it better. I hope you went through my email to the end, currently operators inside a memo group don't share stats info, each operator's stats may differ with the other one, and the RelSubset's best node and stats may vary during optimization. So logical and physical rule pruning are not safe at the moment, otherwise it almost implies changing downstream project's cost model silently. On the other hand, ensuring child group exploration task finish first will make rule mutual exclusivity check possible, like the result of ReduceExpressionRule won't need trigger the same rule again, The join result of JoinCommuteRule won't need trigger JoinCommuteRule and ReduceExpressionRule again. More importantly, most if not all the the long planning queries in Calcite are not caused by too many alternatives, but mutual triggering, or cyclic triggering, which can be avoided by customizing the rules instead of using the default one. Unless you use dynamic programming (Calcite use heuristic) on join reordering (left-deep, right-deep, bushy), space pruning won't help make long / infinite running query faster. b) No evidence shows current version of Calcite will return the most promising plan in first planning iteration. Instead of praying for getting good enough plan in the first iteration, why not focus on fixing rules that causes the issue? c) That is not the goal. On 2020/04/19 15:14:57, Seliverstov Igor wrote: > Haisheng, > > From my point of view splitting logical and physical planning parts isn’t a > good idea. > > I think so because actual cost may be calculated correctly using physical > operators only. So that we: > a) won't be able to implement Branch and Bound Space Pruning (as far as I > understand, at exploring time there are no physical operators, no correct > costs, but assumptions only, I don’t think we should rely on them) > b) won’t be able to get good enough plan (without Branch and Bound Space > Pruning it’s non-trivial task to get right search direction and most > promising plans in first planning iterations) > c) won’t be able to tune the good enough plan during several similar queries > are executed > > Regards, > Igor > > > > 19 апр. 2020 г., в 17:37, Haisheng Yuan написал(а): > > > > Hi Igor, > > > > You can't have your cake and eat it. > > But one feasible work item definitely we can do is that once timeout, stop > > exploring, use the first available physical operator in each group and > > optimize it. > > > > Because most, if not all, of the long / infinite running optimizations are > > caused by project related transpose, join reordering (join commute + > > associative), constant reduction for large expression (see CALCITE-3460), > > all of which are logical transformations rules and many of which have > > corresponding JIRAs. So another alternative is, let's fix these bugs to > > improve Calcite to make timeout option less usable. > > > > Another thing worth noting is that sql server optimizer timeout mechanism > > is not based on clock time, but the number of optimization tasks it has > > done [1]. > > > > [1] > > https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188 > > > > Regards, > > Haisheng > > > > On 2020/04/19 11:31:27, Seliverstov Igor wrote: > >> Haisheng, > >> > >> Ok, then such notification isn't needed > >> > >> But in this case we don't have any control over how long planning takes > >> > >> For some systems it's necessary to get good enough plan right now instead > >> of best one after while > >> > >> For example we've been considering a case when a query is optimised several > >> times in short iterations in case it's impossible to get the best plan in > >> reasonable period of time (for example there is an SLA for response time) > >> > >> This mean we need all needed physical implementations after each logical > >> transformation is applied. > >> > >> Regards, > >> Igor > >> > >> > >> вс, 19 апр. 2020 г., 13:55 Haisheng Yuan : > >> > >>> Hi Igor, > >>> > >>> There will be no rule queue anymore. Y will be fully explored (logical > >>> rules are matched and applied) before it can be implemented and optimized. > >>> > >>> Thanks, > >>> Haisheng > >>> > >>> On 2020/04/19 10:11:45, Seliverstov Igor wrote: > Hi Haisheng, > > Great explanation, thanks. > > One thing I'd like to cover in advance is trait propagation process (I > >>> need > it for Apache Ignite SQL engine implementation). > > For example: > > There are two nodes: Rel X and its child node Rel Y > > Both nodes are in Optimized state, and there is a Logical rule for Rel Y > >>> in > a rules queue matched, > > After logical rule is applied, there is a logical rel Rel Y' and > >>>
[jira] [Created] (CALCITE-3939) more auto pruning rules after SubstitutionRule is introduced
Botong Huang created CALCITE-3939: - Summary: more auto pruning rules after SubstitutionRule is introduced Key: CALCITE-3939 URL: https://issues.apache.org/jira/browse/CALCITE-3939 Project: Calcite Issue Type: Improvement Reporter: Botong Huang UnionEliminatorRule and ProjectRemoveRule are both pruning rules for a RelNode. They can also become SubstitutionRule with autoprune enabled -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Towards Calcite-Avatica 1.17.0
Hey everyone, I am planning to make rc0 available for voting towards the end of April / start of May. How does this look in terms of timing? Francis On 12/04/2020 2:51 am, Stamatis Zampetakis wrote: Thanks again for taking the lead on this Francis! Personally, I am quite busy these days but will do my best to check 1-2 PRs. On Fri, Apr 10, 2020 at 2:57 PM Josh Elser wrote: Always a good idea. I'll add this to my list and see if I can help get any committed. It's been a while since I've looked at the list. On 4/8/20 7:59 PM, Francis Chuang wrote: The last avatica release was in December last year. From activity on our mailing lists and the Calcite repository, it feels like there's currently a lull and things aren't as active. Would this be a good opportunity to work on Avatica and push a release out? There are currently 10 open PRs for Avatica [1] and I think if they are reviewed and merged, it wold be possible to close a huge chunk of them. I am happy to be RM for this release, but will need help from the community to review and merge those PRs. Francis [1] https://github.com/apache/calcite-avatica/pulls
Stored Proc to Relational Expression
Hi Team, I have my use where I need to convert my dialect specific stored procedure constructs like while loop, If then else to Rel expression Basically this can contain control flow statements like below DECLARE heads BOOL; DECLARE heads_count INT64 DEFAULT 0; LOOP SET heads = RAND() < 0.5; IF heads THEN SELECT 'Heads!'; SET heads_count = heads_count + 1; ELSE SELECT 'Tails!'; BREAK; END IF; END LOOP; SELECT CONCAT(CAST(heads_count AS STRING), ' heads in a row'); I can create a Java AST model from the linq4j provided by calcite however this is only going to generate Java Result and I believe its only used by the calcite for relational expressions of enumerable calling convention which is used by adapters which does not support core relational operations right? Is there a way I can convert the stored proc constructs into some canonical form like Rel Tree and back to Stored proc of target dialect. -- Thanks, Ravi
Re: [DISCUSS] Towards Cascades Optimizer
Haisheng, From my point of view splitting logical and physical planning parts isn’t a good idea. I think so because actual cost may be calculated correctly using physical operators only. So that we: a) won't be able to implement Branch and Bound Space Pruning (as far as I understand, at exploring time there are no physical operators, no correct costs, but assumptions only, I don’t think we should rely on them) b) won’t be able to get good enough plan (without Branch and Bound Space Pruning it’s non-trivial task to get right search direction and most promising plans in first planning iterations) c) won’t be able to tune the good enough plan during several similar queries are executed Regards, Igor > 19 апр. 2020 г., в 17:37, Haisheng Yuan написал(а): > > Hi Igor, > > You can't have your cake and eat it. > But one feasible work item definitely we can do is that once timeout, stop > exploring, use the first available physical operator in each group and > optimize it. > > Because most, if not all, of the long / infinite running optimizations are > caused by project related transpose, join reordering (join commute + > associative), constant reduction for large expression (see CALCITE-3460), all > of which are logical transformations rules and many of which have > corresponding JIRAs. So another alternative is, let's fix these bugs to > improve Calcite to make timeout option less usable. > > Another thing worth noting is that sql server optimizer timeout mechanism is > not based on clock time, but the number of optimization tasks it has done [1]. > > [1] > https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188 > > Regards, > Haisheng > > On 2020/04/19 11:31:27, Seliverstov Igor wrote: >> Haisheng, >> >> Ok, then such notification isn't needed >> >> But in this case we don't have any control over how long planning takes >> >> For some systems it's necessary to get good enough plan right now instead >> of best one after while >> >> For example we've been considering a case when a query is optimised several >> times in short iterations in case it's impossible to get the best plan in >> reasonable period of time (for example there is an SLA for response time) >> >> This mean we need all needed physical implementations after each logical >> transformation is applied. >> >> Regards, >> Igor >> >> >> вс, 19 апр. 2020 г., 13:55 Haisheng Yuan : >> >>> Hi Igor, >>> >>> There will be no rule queue anymore. Y will be fully explored (logical >>> rules are matched and applied) before it can be implemented and optimized. >>> >>> Thanks, >>> Haisheng >>> >>> On 2020/04/19 10:11:45, Seliverstov Igor wrote: Hi Haisheng, Great explanation, thanks. One thing I'd like to cover in advance is trait propagation process (I >>> need it for Apache Ignite SQL engine implementation). For example: There are two nodes: Rel X and its child node Rel Y Both nodes are in Optimized state, and there is a Logical rule for Rel Y >>> in a rules queue matched, After logical rule is applied, there is a logical rel Rel Y' and >>> containing it set moves into Exploring state (since there is a logical node which >>> has to be implemented) After whole process Exploring -> ... -> Optimized we need to force parent Rel X set to switch its state from Optimized to Optimizing to peek the newly implemented child and (possible) produce a new Rel X' physical rel >>> on the basis of previously requested from the Rel X set traits. If a new Rel X' is produced, a Rel X' parent should move its state Optimized -> Optimizing and repeat described above operations. Does it look like true? Regards, Igor вс, 19 апр. 2020 г., 6:52 Haisheng Yuan : > Hi, > > In the past few months, we have discussed a lot about Cascades style > top-down optimization, including on-demand trait derivation/request, >>> rule > apply, branch and bound space pruning. Now we think it is time to move > towards these targets. > > We will separate it into several small issues, and each one can be > integrated as a standalone, independent feature, and most importantly, > meanwhile keep backward compatibility. > > 1. Top-down trait request > In other words, pass traits requirements from parent nodes to child >>> nodes. > The trait requests happens after all the logical transformation rules >>> and > physical implementation rules are done, in a top-down manner, driven >>> from > root set. e.g.: > SELECT a, sum(c) FROM >(SELECT * FROM R JOIN S USING (a, b)) t > GROUP BY a; > > Suppose we have the following plan tree in the MEMO, and let's only > consider distribution for simplicity, each group represents a
Re: [DISCUSS] Towards Cascades Optimizer
Hi Igor, You can't have your cake and eat it. But one feasible work item definitely we can do is that once timeout, stop exploring, use the first available physical operator in each group and optimize it. Because most, if not all, of the long / infinite running optimizations are caused by project related transpose, join reordering (join commute + associative), constant reduction for large expression (see CALCITE-3460), all of which are logical transformations rules and many of which have corresponding JIRAs. So another alternative is, let's fix these bugs to improve Calcite to make timeout option less usable. Another thing worth noting is that sql server optimizer timeout mechanism is not based on clock time, but the number of optimization tasks it has done [1]. [1] https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188 Regards, Haisheng On 2020/04/19 11:31:27, Seliverstov Igor wrote: > Haisheng, > > Ok, then such notification isn't needed > > But in this case we don't have any control over how long planning takes > > For some systems it's necessary to get good enough plan right now instead > of best one after while > > For example we've been considering a case when a query is optimised several > times in short iterations in case it's impossible to get the best plan in > reasonable period of time (for example there is an SLA for response time) > > This mean we need all needed physical implementations after each logical > transformation is applied. > > Regards, > Igor > > > вс, 19 апр. 2020 г., 13:55 Haisheng Yuan : > > > Hi Igor, > > > > There will be no rule queue anymore. Y will be fully explored (logical > > rules are matched and applied) before it can be implemented and optimized. > > > > Thanks, > > Haisheng > > > > On 2020/04/19 10:11:45, Seliverstov Igor wrote: > > > Hi Haisheng, > > > > > > Great explanation, thanks. > > > > > > One thing I'd like to cover in advance is trait propagation process (I > > need > > > it for Apache Ignite SQL engine implementation). > > > > > > For example: > > > > > > There are two nodes: Rel X and its child node Rel Y > > > > > > Both nodes are in Optimized state, and there is a Logical rule for Rel Y > > in > > > a rules queue matched, > > > > > > After logical rule is applied, there is a logical rel Rel Y' and > > containing > > > it set moves into Exploring state (since there is a logical node which > > has > > > to be implemented) > > > > > > After whole process Exploring -> ... -> Optimized we need to force parent > > > Rel X set to switch its state from Optimized to Optimizing to peek the > > > newly implemented child and (possible) produce a new Rel X' physical rel > > on > > > the basis of previously requested from the Rel X set traits. > > > > > > If a new Rel X' is produced, a Rel X' parent should move its state > > > Optimized -> Optimizing and repeat described above operations. > > > > > > Does it look like true? > > > > > > Regards, > > > Igor > > > > > > > > > вс, 19 апр. 2020 г., 6:52 Haisheng Yuan : > > > > > > > Hi, > > > > > > > > In the past few months, we have discussed a lot about Cascades style > > > > top-down optimization, including on-demand trait derivation/request, > > rule > > > > apply, branch and bound space pruning. Now we think it is time to move > > > > towards these targets. > > > > > > > > We will separate it into several small issues, and each one can be > > > > integrated as a standalone, independent feature, and most importantly, > > > > meanwhile keep backward compatibility. > > > > > > > > 1. Top-down trait request > > > > In other words, pass traits requirements from parent nodes to child > > nodes. > > > > The trait requests happens after all the logical transformation rules > > and > > > > physical implementation rules are done, in a top-down manner, driven > > from > > > > root set. e.g.: > > > > SELECT a, sum(c) FROM > > > > (SELECT * FROM R JOIN S USING (a, b)) t > > > > GROUP BY a; > > > > > > > > Suppose we have the following plan tree in the MEMO, and let's only > > > > consider distribution for simplicity, each group represents a RelSet > > in the > > > > MEMO. > > > > > > > >Group 1: Agg on [a] > > > >Group 2: +-- MergeJoin on [a, b] > > > >Group 3: |--- TableScan R > > > >Group 4: +--- TableScan S > > > >| Group No | Operator| Derived Traits | Required Traits | > > > >| --- | - | --- | --- | > > > >| Group 1 | Aggregate| Hash[a] | N/A| > > > >| Group 2 | MergeJoin| Hash[a,b] | Hash[a] | > > > >| Group 3 | TableScan R | None| Hash[a,b] | > > > >| Group 4 | TableScan S | None| Hash[a,b] | > > > > > > > > We will add new interface PhysicalNode (or extending RelNode) with > > > >
[jira] [Created] (CALCITE-3938) implement visit LogicalCalc in RelShuttleImpl
xzh_dz created CALCITE-3938: --- Summary: implement visit LogicalCalc in RelShuttleImpl Key: CALCITE-3938 URL: https://issues.apache.org/jira/browse/CALCITE-3938 Project: Calcite Issue Type: Wish Reporter: xzh_dz visit LogicalCalc in RelShuttleImpl -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Towards Cascades Optimizer
Haisheng, Ok, then such notification isn't needed But in this case we don't have any control over how long planning takes For some systems it's necessary to get good enough plan right now instead of best one after while For example we've been considering a case when a query is optimised several times in short iterations in case it's impossible to get the best plan in reasonable period of time (for example there is an SLA for response time) This mean we need all needed physical implementations after each logical transformation is applied. Regards, Igor вс, 19 апр. 2020 г., 13:55 Haisheng Yuan : > Hi Igor, > > There will be no rule queue anymore. Y will be fully explored (logical > rules are matched and applied) before it can be implemented and optimized. > > Thanks, > Haisheng > > On 2020/04/19 10:11:45, Seliverstov Igor wrote: > > Hi Haisheng, > > > > Great explanation, thanks. > > > > One thing I'd like to cover in advance is trait propagation process (I > need > > it for Apache Ignite SQL engine implementation). > > > > For example: > > > > There are two nodes: Rel X and its child node Rel Y > > > > Both nodes are in Optimized state, and there is a Logical rule for Rel Y > in > > a rules queue matched, > > > > After logical rule is applied, there is a logical rel Rel Y' and > containing > > it set moves into Exploring state (since there is a logical node which > has > > to be implemented) > > > > After whole process Exploring -> ... -> Optimized we need to force parent > > Rel X set to switch its state from Optimized to Optimizing to peek the > > newly implemented child and (possible) produce a new Rel X' physical rel > on > > the basis of previously requested from the Rel X set traits. > > > > If a new Rel X' is produced, a Rel X' parent should move its state > > Optimized -> Optimizing and repeat described above operations. > > > > Does it look like true? > > > > Regards, > > Igor > > > > > > вс, 19 апр. 2020 г., 6:52 Haisheng Yuan : > > > > > Hi, > > > > > > In the past few months, we have discussed a lot about Cascades style > > > top-down optimization, including on-demand trait derivation/request, > rule > > > apply, branch and bound space pruning. Now we think it is time to move > > > towards these targets. > > > > > > We will separate it into several small issues, and each one can be > > > integrated as a standalone, independent feature, and most importantly, > > > meanwhile keep backward compatibility. > > > > > > 1. Top-down trait request > > > In other words, pass traits requirements from parent nodes to child > nodes. > > > The trait requests happens after all the logical transformation rules > and > > > physical implementation rules are done, in a top-down manner, driven > from > > > root set. e.g.: > > > SELECT a, sum(c) FROM > > > (SELECT * FROM R JOIN S USING (a, b)) t > > > GROUP BY a; > > > > > > Suppose we have the following plan tree in the MEMO, and let's only > > > consider distribution for simplicity, each group represents a RelSet > in the > > > MEMO. > > > > > >Group 1: Agg on [a] > > >Group 2: +-- MergeJoin on [a, b] > > >Group 3: |--- TableScan R > > >Group 4: +--- TableScan S > > >| Group No | Operator| Derived Traits | Required Traits | > > >| --- | - | --- | --- | > > >| Group 1 | Aggregate| Hash[a] | N/A| > > >| Group 2 | MergeJoin| Hash[a,b] | Hash[a] | > > >| Group 3 | TableScan R | None| Hash[a,b] | > > >| Group 4 | TableScan S | None| Hash[a,b] | > > > > > > We will add new interface PhysicalNode (or extending RelNode) with > > > methods: > > > > > > - Pair> requireTraits(RelTraitSet > required); > > > pair.left is the current operator's new traitset, pair.right is the > list > > > of children's required traitset. > > > > > > - RelNode passThrough(RelTraitSet required); > > > Default implementation will call above method requireTraits() and > > > RelNode.copy() to create new RelNode. Available to be overriden for > > > physical operators to customize the logic. > > > > > > The planner will call above method on MergeJoin operator to pass the > > > required traits (Hash[a]) to Mergejoin's child operators. > > > > > > We will get a new MergeJoin: > > > MergeJoin hash[a] > > > | TableScan R hash[a] (RelSubset) > > > + TableScan S hash[a] (RelSubset) > > > > > > Now the MEMO group looks like: > > >| Group No | Operator| Derived Traits | Required Traits > | > > >| -- | - | | > > > - | > > >| Group1 | Aggregate | Hash[a] | N/A > > > | > > >| Group2 | MergeJoin | Hash[a,b], Hash[a]| Hash[a] > > > | > > >| Group3 | TableScan R | None| Hash[a,b], > Hash[a] >
Re: [DISCUSS] Towards Cascades Optimizer
Hi Igor, There will be no rule queue anymore. Y will be fully explored (logical rules are matched and applied) before it can be implemented and optimized. Thanks, Haisheng On 2020/04/19 10:11:45, Seliverstov Igor wrote: > Hi Haisheng, > > Great explanation, thanks. > > One thing I'd like to cover in advance is trait propagation process (I need > it for Apache Ignite SQL engine implementation). > > For example: > > There are two nodes: Rel X and its child node Rel Y > > Both nodes are in Optimized state, and there is a Logical rule for Rel Y in > a rules queue matched, > > After logical rule is applied, there is a logical rel Rel Y' and containing > it set moves into Exploring state (since there is a logical node which has > to be implemented) > > After whole process Exploring -> ... -> Optimized we need to force parent > Rel X set to switch its state from Optimized to Optimizing to peek the > newly implemented child and (possible) produce a new Rel X' physical rel on > the basis of previously requested from the Rel X set traits. > > If a new Rel X' is produced, a Rel X' parent should move its state > Optimized -> Optimizing and repeat described above operations. > > Does it look like true? > > Regards, > Igor > > > вс, 19 апр. 2020 г., 6:52 Haisheng Yuan : > > > Hi, > > > > In the past few months, we have discussed a lot about Cascades style > > top-down optimization, including on-demand trait derivation/request, rule > > apply, branch and bound space pruning. Now we think it is time to move > > towards these targets. > > > > We will separate it into several small issues, and each one can be > > integrated as a standalone, independent feature, and most importantly, > > meanwhile keep backward compatibility. > > > > 1. Top-down trait request > > In other words, pass traits requirements from parent nodes to child nodes. > > The trait requests happens after all the logical transformation rules and > > physical implementation rules are done, in a top-down manner, driven from > > root set. e.g.: > > SELECT a, sum(c) FROM > > (SELECT * FROM R JOIN S USING (a, b)) t > > GROUP BY a; > > > > Suppose we have the following plan tree in the MEMO, and let's only > > consider distribution for simplicity, each group represents a RelSet in the > > MEMO. > > > >Group 1: Agg on [a] > >Group 2: +-- MergeJoin on [a, b] > >Group 3: |--- TableScan R > >Group 4: +--- TableScan S > >| Group No | Operator| Derived Traits | Required Traits | > >| --- | - | --- | --- | > >| Group 1 | Aggregate| Hash[a] | N/A| > >| Group 2 | MergeJoin| Hash[a,b] | Hash[a] | > >| Group 3 | TableScan R | None| Hash[a,b] | > >| Group 4 | TableScan S | None| Hash[a,b] | > > > > We will add new interface PhysicalNode (or extending RelNode) with > > methods: > > > > - Pair> requireTraits(RelTraitSet required); > > pair.left is the current operator's new traitset, pair.right is the list > > of children's required traitset. > > > > - RelNode passThrough(RelTraitSet required); > > Default implementation will call above method requireTraits() and > > RelNode.copy() to create new RelNode. Available to be overriden for > > physical operators to customize the logic. > > > > The planner will call above method on MergeJoin operator to pass the > > required traits (Hash[a]) to Mergejoin's child operators. > > > > We will get a new MergeJoin: > > MergeJoin hash[a] > > | TableScan R hash[a] (RelSubset) > > + TableScan S hash[a] (RelSubset) > > > > Now the MEMO group looks like: > >| Group No | Operator| Derived Traits | Required Traits | > >| -- | - | | > > - | > >| Group1 | Aggregate | Hash[a] | N/A > > | > >| Group2 | MergeJoin | Hash[a,b], Hash[a]| Hash[a] > > | > >| Group3 | TableScan R | None| Hash[a,b], Hash[a] > > | > >| Group4 | TableScan S | None| Hash[a,b], Hash[a] > > | > > > > Calcite user may choose to ignore / not implement the interface to keep > > the original behavior. Each physical operator, according to its own logic, > > decides whether passThrough() should pass traits down or not by returning: > > - a non-null RelNode, which means it can pass down > > - null object, which means can't pass down > > > > 2. Provide option to disable AbstractConverter > > Once the plan can request traits in top-down way in the framework, many > > system don't need AbstractConverter anymore, since it is just a > > intermediate operator to generate physical sort / exchange. For those, we > > can provide option to disable AbstractConverter, generate physical enforcer > > directly by adding a method to interface
Re: [DISCUSS] Towards Cascades Optimizer
Hi Haisheng, Great explanation, thanks. One thing I'd like to cover in advance is trait propagation process (I need it for Apache Ignite SQL engine implementation). For example: There are two nodes: Rel X and its child node Rel Y Both nodes are in Optimized state, and there is a Logical rule for Rel Y in a rules queue matched, After logical rule is applied, there is a logical rel Rel Y' and containing it set moves into Exploring state (since there is a logical node which has to be implemented) After whole process Exploring -> ... -> Optimized we need to force parent Rel X set to switch its state from Optimized to Optimizing to peek the newly implemented child and (possible) produce a new Rel X' physical rel on the basis of previously requested from the Rel X set traits. If a new Rel X' is produced, a Rel X' parent should move its state Optimized -> Optimizing and repeat described above operations. Does it look like true? Regards, Igor вс, 19 апр. 2020 г., 6:52 Haisheng Yuan : > Hi, > > In the past few months, we have discussed a lot about Cascades style > top-down optimization, including on-demand trait derivation/request, rule > apply, branch and bound space pruning. Now we think it is time to move > towards these targets. > > We will separate it into several small issues, and each one can be > integrated as a standalone, independent feature, and most importantly, > meanwhile keep backward compatibility. > > 1. Top-down trait request > In other words, pass traits requirements from parent nodes to child nodes. > The trait requests happens after all the logical transformation rules and > physical implementation rules are done, in a top-down manner, driven from > root set. e.g.: > SELECT a, sum(c) FROM > (SELECT * FROM R JOIN S USING (a, b)) t > GROUP BY a; > > Suppose we have the following plan tree in the MEMO, and let's only > consider distribution for simplicity, each group represents a RelSet in the > MEMO. > >Group 1: Agg on [a] >Group 2: +-- MergeJoin on [a, b] >Group 3: |--- TableScan R >Group 4: +--- TableScan S >| Group No | Operator| Derived Traits | Required Traits | >| --- | - | --- | --- | >| Group 1 | Aggregate| Hash[a] | N/A| >| Group 2 | MergeJoin| Hash[a,b] | Hash[a] | >| Group 3 | TableScan R | None| Hash[a,b] | >| Group 4 | TableScan S | None| Hash[a,b] | > > We will add new interface PhysicalNode (or extending RelNode) with > methods: > > - Pair> requireTraits(RelTraitSet required); > pair.left is the current operator's new traitset, pair.right is the list > of children's required traitset. > > - RelNode passThrough(RelTraitSet required); > Default implementation will call above method requireTraits() and > RelNode.copy() to create new RelNode. Available to be overriden for > physical operators to customize the logic. > > The planner will call above method on MergeJoin operator to pass the > required traits (Hash[a]) to Mergejoin's child operators. > > We will get a new MergeJoin: > MergeJoin hash[a] > | TableScan R hash[a] (RelSubset) > + TableScan S hash[a] (RelSubset) > > Now the MEMO group looks like: >| Group No | Operator| Derived Traits | Required Traits | >| -- | - | | > - | >| Group1 | Aggregate | Hash[a] | N/A > | >| Group2 | MergeJoin | Hash[a,b], Hash[a]| Hash[a] > | >| Group3 | TableScan R | None| Hash[a,b], Hash[a] > | >| Group4 | TableScan S | None| Hash[a,b], Hash[a] > | > > Calcite user may choose to ignore / not implement the interface to keep > the original behavior. Each physical operator, according to its own logic, > decides whether passThrough() should pass traits down or not by returning: > - a non-null RelNode, which means it can pass down > - null object, which means can't pass down > > 2. Provide option to disable AbstractConverter > Once the plan can request traits in top-down way in the framework, many > system don't need AbstractConverter anymore, since it is just a > intermediate operator to generate physical sort / exchange. For those, we > can provide option to disable AbstractConverter, generate physical enforcer > directly by adding a method to interface Convention: > - RelNode enforce(RelNode node, RelTraitSet traits); > > The default implementation may just calling changeTraitsUsingConverters(), > but people may choose to override it if the system has special needs, like > several traits must implement together, or the position of collation in > RelTraitSet is before distribution. > > 3. Top-down driven, on-demand rule apply > For every RelNode in a RelSet, rule is matched and applied sequentially, >