Hi Vladimir Thanks for your reply: 1. I run TpchTest#testQuery07 in my MPB, the avg time is 43s, and after re-run the PR ut, the ut is passed, so this ut timeout may be an occasional situation.
2. For the pull/2222, it due increase the time of TpchTest#testQuery07, about 23%, sorry for that, we test the feature in our product env(Calcite is our SQL facade in ByteDance, it process hundreds of thousands SQL per day), and we have monitor about optimization time, no obvious changes were found. Without pull/2222, some cases will fail due to "There are not enough rules to produce a node with desired properties: There is 1 empty subset", the root cause is the outdated rule matches have been fired. Actually, I found actually the time was taken by RelSubset#getRelList, a fresh case I found yesterday, see https://github.com/apache/calcite/pull/2283, it OOM in VolcanoRuleCall.matchRecurse when calling getRelList(but the root clause must be the deadlock between rules), so I think the pull/2222's fix is not a problem, the true problem is in "getRelList", my proposal is we can store the relList in RelSubSet like RelSet#rels. 3. For the case TpchTest#testQuery07, after optimization, the largest rel ID is increased to *900k*, and has *20k* rels remain in volcano planner, so it's pretty large. In fact, after profile, I found most of the time is taken by ProjectMergeRule, in our company's error cases(OOM, Stack Overflow) are also related to this rule, maybe we can see if we can optimize this. Finally, to summarize: 1. TpchTest#testQuery07 timeout is an occasional situation, normally it ends about 40~45s(in my MBP15 2019). 2. pull/2222 itself is a positive fix, it due increase the time but the true problem is RelSubset#getRelList, we can improve pull/2222 by improving RelSubset#getRelList, this not only benefits this, it makes getRelList a cheap function as its name, we can call this with no burden 3. Calcite sometime will generate huge alternate rels, it actually not reasonable most of the time, we've found many OOM/Stack Overflow/timeout, but the SQL is highly private, If you are interested, I can share the cases' root clause using blog or email, but I may have no time to construct these cases. Let's workout together to decrease them. Regards! Aron Tao Vladimir Sitnikov <sitnikov.vladi...@gmail.com> 于2020年12月2日周三 下午10:01写道: > Hi, > > I've made a quick check, and the results are as follows: > 1) The test was enabled recently (2020-10-20) in > > https://github.com/apache/calcite/commit/b5a761e559ca1c1c914e388df4c6a0958dc17fc8 > 2) One of the recent changes that do significantly impact the performance > of the test is https://github.com/apache/calcite/pull/2222 > 3) It is sad we commit performance-related fixes without taking performance > into consideration :-( > > Do you think you could look into 2222 again and check if the slowdown was > expected? > Do you think you could improve 2222? > Do you think we should revert 2222? > > Vladimir >