Re: Query optimization time with Calcite
Hi Abhijit, there are actually a lot of places where time is spent. I think the most time consuming places are jvm lazy class loading and code generation for metadata framework. Anyway, on a long distance the planning time will be very small. Thanks. -- Roman On 24.08.2023 00:51, Abhijit Subramanya wrote: Thanks a lot for your help. Would you happen to have any insights into where exactly the time is spent during the warm up? On Tue, Aug 22, 2023 at 11:58 PM Roman Kondakov wrote: Hi Abhijit, Usually the very first query is optimized a bit long in Calcite. But after some warm-up the next queries will be planned faster and faster. If you use Calcite as a part of a server application, then the problem with a long query planning will be eliminated after a few query runs. Thanks. -- Roman On 23.08.2023 01:33, Abhijit Subramanya wrote: Hi, I am currently working with Apache Calcite and using the VolcanoPlanner to optimize queries. I've noticed that the planner.setRoot() function can take approximately 100ms to complete. After some profiling, it appears that the JaninoRelMetadataProvider.compile function might be the bottleneck. Is this level of optimization time expected for a simple query like: SELECT * FROM table1 LEFT JOIN table2 ON table1.a = table2.a WHERE d >= '2023-08-01' LIMIT 1000; I'm running this on my MacBook, which has macOS Ventura, 16GB of RAM, and 10 CPU cores. I am using the latest version of calcite from the github repo. Are there specific settings or configurations I should be aware of for optimizing query compilation times in Calcite? Thanks
Functions that cache state, e.g. compiled regular expressions
Can I get one or two reviews of https://github.com/apache/calcite/pull/3394 / https://issues.apache.org/jira/browse/CALCITE-5914 ? I'm trying to introduce an architecture so that Java functions that need to retain state for performance reasons (e.g. a cached compiled regular expression) are straightforward to write and use. My proposal is to implement such functions using a non-static method. The code generator instantiates the "function object" so that it can call the method, uses the same object throughout the query, and the function can store state in that object. We were already doing it with the RAND function, so it made sense to use the mechanism more widely. Julian
Re: [jira] [Created] (CALCITE-5950) Default column constraint is erroneously processed.
I just took a look at https://github.com/apache/calcite/pull/3393/checks. I see an autostyle violation, and it looks valid. That error is there so that you can fix it and not waste reviewers' time. On Wed, Aug 23, 2023 at 2:15 PM Julian Hyde wrote: > > There aren't many false positives or flaky tests in CI, > checker-framework, error-prone. If it says that (say) you are passing > a nullable object to a method that requires a not-nullable argument, > you probably are. > > I find it useful to run checker-framework on my own computer, rather > than waiting for CI. But you must use JDK 11: > > ./gradlew --no-parallel --no-daemon -PenableCheckerframework > :linq4j:classes :core:classes > > Julian > > On Wed, Aug 23, 2023 at 8:42 AM stanilovsky evgeny > wrote: > > > > PR is ready for review, plz explain did i need to fix all of: CI / > > CheckerFramework issues ? > > Seems some of them are false positive. > > thanks ! > > > > > Evgeny Stanilovsky created CALCITE-5950:
Re: [jira] [Created] (CALCITE-5950) Default column constraint is erroneously processed.
There aren't many false positives or flaky tests in CI, checker-framework, error-prone. If it says that (say) you are passing a nullable object to a method that requires a not-nullable argument, you probably are. I find it useful to run checker-framework on my own computer, rather than waiting for CI. But you must use JDK 11: ./gradlew --no-parallel --no-daemon -PenableCheckerframework :linq4j:classes :core:classes Julian On Wed, Aug 23, 2023 at 8:42 AM stanilovsky evgeny wrote: > > PR is ready for review, plz explain did i need to fix all of: CI / > CheckerFramework issues ? > Seems some of them are false positive. > thanks ! > > > Evgeny Stanilovsky created CALCITE-5950:
Re: Query optimization time with Calcite
Thanks a lot for your help. Would you happen to have any insights into where exactly the time is spent during the warm up? On Tue, Aug 22, 2023 at 11:58 PM Roman Kondakov wrote: > Hi Abhijit, > > Usually the very first query is optimized a bit long in Calcite. But > after some warm-up the next queries will be planned faster and faster. > > If you use Calcite as a part of a server application, then the problem > with a long query planning will be eliminated after a few query runs. > > Thanks. > > -- > > Roman > > On 23.08.2023 01:33, Abhijit Subramanya wrote: > > Hi, > > > >I am currently working with Apache Calcite and using the > > VolcanoPlanner to optimize queries. I've noticed that the > planner.setRoot() > > function can take approximately 100ms to complete. After some profiling, > it > > appears that the JaninoRelMetadataProvider.compile function might be the > > bottleneck. > > > > > > Is this level of optimization time expected for a simple query like: > > > > > > SELECT * FROM table1 > > > > LEFT JOIN table2 ON table1.a = table2.a > > > > WHERE d >= '2023-08-01' > > > > LIMIT 1000; > > > > > > I'm running this on my MacBook, which has macOS Ventura, 16GB of RAM, and > > 10 CPU cores. I am using the latest version of calcite from the github > repo. > > > > > > Are there specific settings or configurations I should be aware of for > > optimizing query compilation times in Calcite? > > > > > > Thanks > > >
[jira] [Created] (CALCITE-5953) AggregateCaseToFilterRule may make inaccurate SUM transformation
Zoltan Haindrich created CALCITE-5953: - Summary: AggregateCaseToFilterRule may make inaccurate SUM transformation Key: CALCITE-5953 URL: https://issues.apache.org/jira/browse/CALCITE-5953 Project: Calcite Issue Type: Bug Reporter: Zoltan Haindrich Assignee: Zoltan Haindrich consider: `sum(case when x = 1 then 2 else 0 end) as b` notice that this expression may only be null if there are no rows in the table `AggregateCaseToFilterRule` rewrites the above expression to `sum(1) filter (where x=2)` which broadens when it could be `null` to when there are no matches to the filter * `A` is `0` correctly in this case; but I think it will be still `0` in case there are 0 input rows * The result for `B` supposed to be `0` but since there are no matches by the filter it becomes `null` * `C` is not touched ``` # Convert CASE to FILTER without matches select sum(case when x = 1 then 1 else 0 end) as a, sum(case when x = 1 then 2 else 0 end) as b, sum(case when x = 1 then 3 else -1 end) as c from (values 0, null, 0, 2) as t(x); +---+---++ | A | B | C | +---+---++ | 0 | 0 | -4 | +---+---++ (1 row) !ok EnumerableCalc(expr#0..2=[{inputs}], expr#3=[CAST($t0):INTEGER], A=[$t3], B=[$t1], C=[$t2]) EnumerableAggregate(group=[{}], A=[COUNT() FILTER $1], B=[SUM($2) FILTER $3], C=[SUM($0)]) EnumerableCalc(expr#0=[{inputs}], expr#1=[1], expr#2=[=($t0, $t1)], expr#3=[3], expr#4=[-1], expr#5=[CASE($t2, $t3, $t4)], expr#6=[IS TRUE($t2)], expr#7=[2], $f2=[$t5], $f3=[$t6], $f4=[$t7], $f5=[$t6]) EnumerableValues(tuples=[[{ 0 }, { null }, { 0 }, { 2 }]]) !plan ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [jira] [Created] (CALCITE-5950) Default column constraint is erroneously processed.
PR is ready for review, plz explain did i need to fix all of: CI / CheckerFramework issues ? Seems some of them are false positive. thanks ! Evgeny Stanilovsky created CALCITE-5950:
[jira] [Created] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
Leonid Chistov created CALCITE-5952: --- Summary: Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule Key: CALCITE-5952 URL: https://issues.apache.org/jira/browse/CALCITE-5952 Project: Calcite Issue Type: Bug Affects Versions: 1.35.0 Reporter: Leonid Chistov Assignee: Leonid Chistov The following test will fail if added to RelOptRulesTest.java {code:java} @Test void testCanNotPushSemiJoinToRightJoinBranch() { final Function relFn = b -> b .scan("EMP") .scan("DEPT") .join(JoinRelType.LEFT, b.equals( b.field(2, 0, "DEPTNO"), b.field(2, 1, "DEPTNO")) ) .scan("BONUS") .semiJoin(b.equals( b.field(2, 0, "DNAME"), b.field(2, 1, "JOB"))) .build(); relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); } {code} Produced plan will look like: {code:java} LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalJoin(condition=[=($1, $4)], joinType=[semi]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} Which is different from the original plan: {code:java} LogicalJoin(condition=[=($9, $12)], joinType=[semi]) LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} This is not correct - in general case it is not correct to push semi-join to right side of left-join. The reason is the following: Consider rows from `EMP` that have no matching rows in DEPT. These rows will have `nulls` for `DEPT` columns in the result of left-join and they will be rejected by the top semi-join. But if we push semi-join to RHS of left-join, we are going to see rows from `EMP` with `nulls` on the `DEPT` side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: Query optimization time with Calcite
Hi Abhijit, Usually the very first query is optimized a bit long in Calcite. But after some warm-up the next queries will be planned faster and faster. If you use Calcite as a part of a server application, then the problem with a long query planning will be eliminated after a few query runs. Thanks. -- Roman On 23.08.2023 01:33, Abhijit Subramanya wrote: Hi, I am currently working with Apache Calcite and using the VolcanoPlanner to optimize queries. I've noticed that the planner.setRoot() function can take approximately 100ms to complete. After some profiling, it appears that the JaninoRelMetadataProvider.compile function might be the bottleneck. Is this level of optimization time expected for a simple query like: SELECT * FROM table1 LEFT JOIN table2 ON table1.a = table2.a WHERE d >= '2023-08-01' LIMIT 1000; I'm running this on my MacBook, which has macOS Ventura, 16GB of RAM, and 10 CPU cores. I am using the latest version of calcite from the github repo. Are there specific settings or configurations I should be aware of for optimizing query compilation times in Calcite? Thanks