Re: Query optimization time with Calcite

2023-08-23 Thread Roman Kondakov

Hi Abhijit,

there are actually a lot of places where time is spent. I think the most 
time consuming places are jvm lazy class loading and code generation for 
metadata framework.


Anyway, on a long distance the planning time will be very small.

Thanks.

--

Roman

On 24.08.2023 00:51, Abhijit Subramanya wrote:

Thanks a lot for your help.
Would you happen to have any insights into where exactly the time is spent
during the warm up?

On Tue, Aug 22, 2023 at 11:58 PM Roman Kondakov 
wrote:


Hi Abhijit,

Usually the very first query is optimized a bit long in Calcite. But
after some warm-up the next queries will be planned faster and faster.

If you use Calcite as a part of a server application, then the problem
with a long query planning will be eliminated after a few query runs.

Thanks.

--

Roman

On 23.08.2023 01:33, Abhijit Subramanya wrote:

Hi,

I am currently working with Apache Calcite and using the
VolcanoPlanner to optimize queries. I've noticed that the

planner.setRoot()

function can take approximately 100ms to complete. After some profiling,

it

appears that the JaninoRelMetadataProvider.compile function might be the
bottleneck.


  Is this level of optimization time expected for a simple query like:


SELECT * FROM table1

LEFT JOIN table2 ON table1.a = table2.a

WHERE d >= '2023-08-01'

LIMIT 1000;


I'm running this on my MacBook, which has macOS Ventura, 16GB of RAM, and
10 CPU cores. I am using the latest version of calcite from the github

repo.


Are there specific settings or configurations I should be aware of for
optimizing query compilation times in Calcite?


Thanks



Functions that cache state, e.g. compiled regular expressions

2023-08-23 Thread Julian Hyde
Can I get one or two reviews of
https://github.com/apache/calcite/pull/3394 /
https://issues.apache.org/jira/browse/CALCITE-5914 ?

I'm trying to introduce an architecture so that Java functions that
need to retain state for performance reasons (e.g. a cached compiled
regular expression) are straightforward to write and use.

My proposal is to implement such functions using a non-static method.
The code generator instantiates the "function object" so that it can
call the method, uses the same object throughout the query, and the
function can store state in that object. We were already doing it with
the RAND function, so it made sense to use the mechanism more widely.

Julian


Re: [jira] [Created] (CALCITE-5950) Default column constraint is erroneously processed.

2023-08-23 Thread Julian Hyde
I just took a look at
https://github.com/apache/calcite/pull/3393/checks. I see an autostyle
violation, and it looks valid. That error is there so that you can fix
it and not waste reviewers' time.

On Wed, Aug 23, 2023 at 2:15 PM Julian Hyde  wrote:
>
> There aren't many false positives or flaky tests in CI,
> checker-framework, error-prone. If it says that (say) you are passing
> a nullable object to a method that requires a not-nullable argument,
> you probably are.
>
> I find it useful to run checker-framework on my own computer, rather
> than waiting for CI. But you must use JDK 11:
>
>  ./gradlew --no-parallel --no-daemon -PenableCheckerframework
> :linq4j:classes :core:classes
>
> Julian
>
> On Wed, Aug 23, 2023 at 8:42 AM stanilovsky evgeny
>  wrote:
> >
> > PR is ready for review, plz explain did i need to fix all of: CI /
> > CheckerFramework issues ?
> > Seems some of them are false positive.
> > thanks !
> >
> > > Evgeny Stanilovsky created CALCITE-5950:


Re: [jira] [Created] (CALCITE-5950) Default column constraint is erroneously processed.

2023-08-23 Thread Julian Hyde
There aren't many false positives or flaky tests in CI,
checker-framework, error-prone. If it says that (say) you are passing
a nullable object to a method that requires a not-nullable argument,
you probably are.

I find it useful to run checker-framework on my own computer, rather
than waiting for CI. But you must use JDK 11:

 ./gradlew --no-parallel --no-daemon -PenableCheckerframework
:linq4j:classes :core:classes

Julian

On Wed, Aug 23, 2023 at 8:42 AM stanilovsky evgeny
 wrote:
>
> PR is ready for review, plz explain did i need to fix all of: CI /
> CheckerFramework issues ?
> Seems some of them are false positive.
> thanks !
>
> > Evgeny Stanilovsky created CALCITE-5950:


Re: Query optimization time with Calcite

2023-08-23 Thread Abhijit Subramanya
Thanks a lot for your help.
Would you happen to have any insights into where exactly the time is spent
during the warm up?

On Tue, Aug 22, 2023 at 11:58 PM Roman Kondakov 
wrote:

> Hi Abhijit,
>
> Usually the very first query is optimized a bit long in Calcite. But
> after some warm-up the next queries will be planned faster and faster.
>
> If you use Calcite as a part of a server application, then the problem
> with a long query planning will be eliminated after a few query runs.
>
> Thanks.
>
> --
>
> Roman
>
> On 23.08.2023 01:33, Abhijit Subramanya wrote:
> > Hi,
> >
> >I am currently working with Apache Calcite and using the
> > VolcanoPlanner to optimize queries. I've noticed that the
> planner.setRoot()
> > function can take approximately 100ms to complete. After some profiling,
> it
> > appears that the JaninoRelMetadataProvider.compile function might be the
> > bottleneck.
> >
> >
> >  Is this level of optimization time expected for a simple query like:
> >
> >
> > SELECT * FROM table1
> >
> > LEFT JOIN table2 ON table1.a = table2.a
> >
> > WHERE d >= '2023-08-01'
> >
> > LIMIT 1000;
> >
> >
> > I'm running this on my MacBook, which has macOS Ventura, 16GB of RAM, and
> > 10 CPU cores. I am using the latest version of calcite from the github
> repo.
> >
> >
> > Are there specific settings or configurations I should be aware of for
> > optimizing query compilation times in Calcite?
> >
> >
> > Thanks
> >
>


[jira] [Created] (CALCITE-5953) AggregateCaseToFilterRule may make inaccurate SUM transformation

2023-08-23 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created CALCITE-5953:
-

 Summary: AggregateCaseToFilterRule may make inaccurate SUM 
transformation
 Key: CALCITE-5953
 URL: https://issues.apache.org/jira/browse/CALCITE-5953
 Project: Calcite
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


consider: `sum(case when x = 1 then 2 else 0 end) as b`
notice that this expression may only be null if there are no rows in the table

`AggregateCaseToFilterRule` rewrites the above expression to `sum(1) filter 
(where x=2)` which broadens when it could be `null` to when there are no 
matches to the filter

* `A` is `0` correctly in this case; but I think it will be still `0` in case 
there are 0 input rows
* The result for `B` supposed to be `0` but since there are no matches by the 
filter it becomes `null`
* `C` is not touched

```
# Convert CASE to FILTER without matches
select  sum(case when x = 1 then 1 else 0 end) as a,
sum(case when x = 1 then 2 else 0 end) as b,
sum(case when x = 1 then 3 else -1 end) as c
from (values 0, null, 0, 2) as t(x);
+---+---++
| A | B | C  |
+---+---++
| 0 | 0 | -4 |
+---+---++
(1 row)

!ok
EnumerableCalc(expr#0..2=[{inputs}], expr#3=[CAST($t0):INTEGER], A=[$t3], 
B=[$t1], C=[$t2])
  EnumerableAggregate(group=[{}], A=[COUNT() FILTER $1], B=[SUM($2) FILTER $3], 
C=[SUM($0)])
EnumerableCalc(expr#0=[{inputs}], expr#1=[1], expr#2=[=($t0, $t1)], 
expr#3=[3], expr#4=[-1], expr#5=[CASE($t2, $t3, $t4)], expr#6=[IS TRUE($t2)], 
expr#7=[2], $f2=[$t5], $f3=[$t6], $f4=[$t7], $f5=[$t6])
  EnumerableValues(tuples=[[{ 0 }, { null }, { 0 }, { 2 }]])
!plan
```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [jira] [Created] (CALCITE-5950) Default column constraint is erroneously processed.

2023-08-23 Thread stanilovsky evgeny
PR is ready for review, plz explain did i need to fix all of: CI /  
CheckerFramework issues ?

Seems some of them are false positive.
thanks !


Evgeny Stanilovsky created CALCITE-5950:


[jira] [Created] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule

2023-08-23 Thread Leonid Chistov (Jira)
Leonid Chistov created CALCITE-5952:
---

 Summary: Semi-Join incorrectly reordered with Left-Join by 
SemiJoinJoinTransposeRule
 Key: CALCITE-5952
 URL: https://issues.apache.org/jira/browse/CALCITE-5952
 Project: Calcite
  Issue Type: Bug
Affects Versions: 1.35.0
Reporter: Leonid Chistov
Assignee: Leonid Chistov


The following test will fail if added to RelOptRulesTest.java
{code:java}
@Test void testCanNotPushSemiJoinToRightJoinBranch() {
  final Function relFn = b -> b
  .scan("EMP")
  .scan("DEPT")
  .join(JoinRelType.LEFT,
  b.equals(
  b.field(2, 0, "DEPTNO"),
  b.field(2, 1, "DEPTNO"))
  )
  .scan("BONUS")
  .semiJoin(b.equals(
  b.field(2, 0, "DNAME"),
  b.field(2, 1, "JOB")))
  .build();
  relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
} {code}
Produced plan will look like:
{code:java}
LogicalJoin(condition=[=($7, $8)], joinType=[left])
  LogicalTableScan(table=[[scott, EMP]])
  LogicalJoin(condition=[=($1, $4)], joinType=[semi])
    LogicalTableScan(table=[[scott, DEPT]])
    LogicalTableScan(table=[[scott, BONUS]]) {code}
Which is different from the original plan:
{code:java}
LogicalJoin(condition=[=($9, $12)], joinType=[semi])
  LogicalJoin(condition=[=($7, $8)], joinType=[left])
    LogicalTableScan(table=[[scott, EMP]])
    LogicalTableScan(table=[[scott, DEPT]])
  LogicalTableScan(table=[[scott, BONUS]]) {code}
This is not correct - in general case it is not correct to push semi-join to 
right side of left-join.

The reason is the following:

Consider rows from `EMP` that have no matching rows in DEPT. These rows will 
have `nulls` for `DEPT` columns in the result of left-join and they will be 
rejected by the top semi-join.

But if we push semi-join to RHS of left-join, we are going to see rows from 
`EMP` with `nulls` on the `DEPT` side in the final result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Query optimization time with Calcite

2023-08-23 Thread Roman Kondakov

Hi Abhijit,

Usually the very first query is optimized a bit long in Calcite. But 
after some warm-up the next queries will be planned faster and faster.


If you use Calcite as a part of a server application, then the problem 
with a long query planning will be eliminated after a few query runs.


Thanks.

--

Roman

On 23.08.2023 01:33, Abhijit Subramanya wrote:

Hi,

   I am currently working with Apache Calcite and using the
VolcanoPlanner to optimize queries. I've noticed that the planner.setRoot()
function can take approximately 100ms to complete. After some profiling, it
appears that the JaninoRelMetadataProvider.compile function might be the
bottleneck.


 Is this level of optimization time expected for a simple query like:


SELECT * FROM table1

LEFT JOIN table2 ON table1.a = table2.a

WHERE d >= '2023-08-01'

LIMIT 1000;


I'm running this on my MacBook, which has macOS Ventura, 16GB of RAM, and
10 CPU cores. I am using the latest version of calcite from the github repo.


Are there specific settings or configurations I should be aware of for
optimizing query compilation times in Calcite?


Thanks