Re: Review request.

2024-02-20 Thread Hanumath Maduri
Thanks Julian for looking into this. I have updated the JIRA with
corresponding details.

On Mon, Feb 19, 2024 at 8:27 PM Julian Hyde  wrote:

> Please respond to my comments in Jira. It's difficult to review a PR
> until we know what problem it is fixing.
>
> On Mon, Feb 19, 2024 at 5:53 PM Hanumath Maduri
>  wrote:
> >
> > Hello Developers,
> >
> > I've had a pull request (https://github.com/apache/calcite/pull/3640)
> open
> > for review over the past few weeks.
> > I want to extend my gratitude to Mihai and Ruben for their thorough
> review
> > and valuable feedback. I've addressed all the comments provided. This fix
> > is crucial for our SQL engine, and I'm eager to include it in the
> upcoming
> > release.
> >
> > If the pull request appears satisfactory, your approvals would greatly
> help
> > in its inclusion in the next release.
> >
> > Thank you in advance.
> >
> > Regards,
> > Hanu
>


Re: Review request.

2024-02-19 Thread Julian Hyde
Please respond to my comments in Jira. It's difficult to review a PR
until we know what problem it is fixing.

On Mon, Feb 19, 2024 at 5:53 PM Hanumath Maduri
 wrote:
>
> Hello Developers,
>
> I've had a pull request (https://github.com/apache/calcite/pull/3640) open
> for review over the past few weeks.
> I want to extend my gratitude to Mihai and Ruben for their thorough review
> and valuable feedback. I've addressed all the comments provided. This fix
> is crucial for our SQL engine, and I'm eager to include it in the upcoming
> release.
>
> If the pull request appears satisfactory, your approvals would greatly help
> in its inclusion in the next release.
>
> Thank you in advance.
>
> Regards,
> Hanu


Re: [REVIEW REQUEST] CALCITE-5160 and CALCITE-5403

2023-03-06 Thread Dan Zou
I would like to participate in the review of these two PRs

Best,
Dan Zou   





> 2023年3月6日 23:03,Dmitry Sysolyatin  写道:
> 
> Hello,
> I would like to request review for the following tickets:
> 
> CALCITE-5160: ANY/SOME, ALL operators should support collection expressions
> [1]
> CALCITE-5403: PostgreSQL dialect should support SET, RESET, BEGIN, SHOW,
> ROLLBACK, COMMIT [2]
> 
> [1] https://issues.apache.org/jira/browse/CALCITE-5160
> [2] https://issues.apache.org/jira/browse/CALCITE-5403
> 
> Thanks!



Re: [REVIEW REQUEST] CALCITE-5253 NATURAL JOIN erroneous validation.

2022-09-30 Thread Alessandro Solimando
Hi Evgeny,
thanks for submitting the patch, I have taken a look and left
some comments, but I think that the PR is overall in good shape.

Best regards,
Alessandro

On Fri, 30 Sept 2022 at 09:22, stanilovsky evgeny <
estanilovs...@gridgain.com> wrote:

> Hi,
>
> I would like to request a review for CALCITE-5253, pr [1], it fixes
> partially erroneous natural join validaton logic.
> Thanks !
>
> [1] https://github.com/apache/calcite/pull/2889
> [2] https://issues.apache.org/jira/browse/CALCITE-5253
>


Re: Review request: Babel parser doesn't parse IF(condition, then, else) statements

2022-08-05 Thread Benchao Li
Jiajun, thanks for your contribution, I'll review this.

Jiajun Xie  于2022年8月5日周五 10:55写道:

>   `IF(condition, then, else)` is a common expression, babel parser failed
> because CALCITE-3946 put `IF` into the keywords list.
>
>   I added it to reservedFunctionNames, here is PR:
> https://github.com/apache/calcite/pull/2835
>


-- 

Best,
Benchao Li


Re: Review Request: CALCITE-5064

2022-03-28 Thread Stamatis Zampetakis
I will have another look at CALCITE-4992 in the following days.

Best,
Stamatis

On Mon, Mar 28, 2022 at 7:45 AM James Turton  wrote:

> Excuse me freeloading on this thread but another PR that originated from
> our end (Drill) and is once again ready for review is CALCITE-4992
> (https://github.com/apache/calcite/pull/2698).
>
> Thanks
> James
>
> On 2022/03/27 16:20, Charles Givre wrote:
> > Hello Calcite Devs,
> > I'd like to request a review for CALCITE-5064
> > (https://github.com/apache/calcite/pull/2752) which is a minor bug fix
> > that prevents the correct dialect from being selected for BigQuery.
> > Thanks!
> > -- C
> >
>
>


Re: Review Request: CALCITE-5064

2022-03-27 Thread James Turton
Excuse me freeloading on this thread but another PR that originated from 
our end (Drill) and is once again ready for review is CALCITE-4992 
(https://github.com/apache/calcite/pull/2698).


Thanks
James

On 2022/03/27 16:20, Charles Givre wrote:

Hello Calcite Devs,
I'd like to request a review for CALCITE-5064 
(https://github.com/apache/calcite/pull/2752) which is a minor bug fix 
that prevents the correct dialect from being selected for BigQuery.

Thanks!
-- C





Re: [REVIEW REQUEST]: Review Request for CALCITE-4987

2022-01-26 Thread Yanjing Wang
Hi community,

This is a continuous review request for CALCITE-4987, I think it's helpful
for generating valid ORDER BY clause on various engine, and federated
queries will benefit from it, hence I'm honestly hoping this change will be
merged soon.

Yanjing Wang  于2022年1月18日周二 19:56写道:

> Hi community,
> I'd like to request a review on CALCITE-4987(
> https://github.com/apache/calcite/pull/2693), which also
> fixes CALCITE-2552.
>  After this fix, the RelToSqlConverter will generate ORDER BY clause by
> dialect's conformance, and prefer ORDER BY ordinal rather alias when both
> is supported.
> This fix will avoid two issues:
> 1. Generate ORDER BY fields syntax that Some dialects don't support like
> Hive.
> 2. Resolved to be ORDER BY nested aggregate expression when ORDER BY
> aggregate expression argument collides with SELECT ITEM alias.
> such as *SELECT sum(shelf_width) as shelf_width FROM product ORDER BY
> sum(shelf_width)  *will be resolved to *SELECT sum(shelf_width) as
> shelf_width FROM product ORDER BY sum(sum(shelf_width))*
> .
>


Re: Review request: CALCITE-4818 (AggregateExpandDistinctAggregatesRule must infer correct data type for top aggregate calls)

2021-10-27 Thread Taras Ledkov

Hi Calcite team,

Gently reminder about review / merge the patch for  CALCITE-4818 [1]

[1]. https://issues.apache.org/jira/browse/CALCITE-4818

On 15.10.2021 15:16, Taras Ledkov wrote:

Hi,

Gently reminder about review.

On 04.10.2021 20:30, Taras Ledkov wrote:

Hi,

Please review the patch for the issue CALCITE-4818 [1], see PR#2560 [2].

Looks like the rule 'AggregateExpandDistinctAggregatesRule' contains 
another bug with inferring result type of the top aggregate calls.


e.g.

If the type system expand sum type like postgress:
SUM(TINYINT | SMALLINT | INTEGER) ->  BIGINT
SUM(BIGINT) -> DECIMAL

The query:
SELECT SUM(i), SUM(DISTINCT i) FROM TBL;
where i - INTEGER field.

produces the plan:
LogicalProject(EXPR$0=[CAST($0):BIGINT], EXPR$1=[$1])
 LogicalAggregate(group=[{}], EXPR$0=[SUM($1)], EXPR$1=[SUM($0)]) 
-->  RowType[DECIMAL, BIGINT]
    LogicalAggregate(group=[{0}], 
EXPR$0=[SUM($0)]) --> 
RowType[INTEGER, BIGINT]

  LogicalProject(COMM=[$6])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])

But the rule ignores the change row type in the bottom aggregate.

I propose the simple fix: pass 'null' type to the 
'AggregateCall.create' call to infer aggregate type from the input.


[1]. https://issues.apache.org/jira/browse/CALCITE-4818
[2]. https://github.com/apache/calcite/pull/2560


--
Taras Ledkov
Mail-To: tled...@gridgain.com



Re: Review request: CALCITE-4818 (AggregateExpandDistinctAggregatesRule must infer correct data type for top aggregate calls)

2021-10-15 Thread Taras Ledkov

Hi,

Gently reminder about review.

On 04.10.2021 20:30, Taras Ledkov wrote:

Hi,

Please review the patch for the issue CALCITE-4818 [1], see PR#2560 [2].

Looks like the rule 'AggregateExpandDistinctAggregatesRule' contains 
another bug with inferring result type of the top aggregate calls.


e.g.

If the type system expand sum type like postgress:
SUM(TINYINT | SMALLINT | INTEGER) ->  BIGINT
SUM(BIGINT) -> DECIMAL

The query:
SELECT SUM(i), SUM(DISTINCT i) FROM TBL;
where i - INTEGER field.

produces the plan:
LogicalProject(EXPR$0=[CAST($0):BIGINT], EXPR$1=[$1])
 LogicalAggregate(group=[{}], EXPR$0=[SUM($1)], EXPR$1=[SUM($0)]) -->  
RowType[DECIMAL, BIGINT]
    LogicalAggregate(group=[{0}], 
EXPR$0=[SUM($0)]) --> RowType[INTEGER, 
BIGINT]

  LogicalProject(COMM=[$6])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])

But the rule ignores the change row type in the bottom aggregate.

I propose the simple fix: pass 'null' type to the 
'AggregateCall.create' call to infer aggregate type from the input.


[1]. https://issues.apache.org/jira/browse/CALCITE-4818
[2]. https://github.com/apache/calcite/pull/2560


--
Taras Ledkov
Mail-To: tled...@gridgain.com



Re: Review request: CALCITE-4652 (fix AggregateExpandDistinctAggregatesRule when SUM type is expanded)

2021-08-10 Thread Taras Ledkov

Hi Calcite Devs.

I just remind about review/merge the patch for the issue CALCITE-4652 
[1], see PR#2439 [2].
I've fixed the patch according with Julian comments. Also PR contains 
two 'LGTM' comments.

Is the patch ready for merge?

[1]. https://issues.apache.org/jira/browse/CALCITE-4652
[2]. https://github.com/apache/calcite/pull/2439

On 12.07.2021 15:05, xiong duan wrote:

Hi. Ledkov. I'll do some code reviews in the next two days.

Taras Ledkov  于2021年7月12日周一 下午7:58写道:


Hi,

Please review the patch for the issue CALCITE-4652 [1], see PR#2439 [2].

I tried to draw attention to the issue in the topic:
"[HELP] Return type of the SUM aggregate function and
AggregateExpandDistinctAggregatesRule​",
but did not receive any answer, so I do not give a link to the discussion.

Stamatis advised me to send a reminder to the devlist.

[1]. https://issues.apache.org/jira/browse/CALCITE-4652
[2]. https://github.com/apache/calcite/pull/2439

--
Taras Ledkov
Mail-To: tled...@gridgain.com



--
Taras Ledkov
Mail-To: tled...@gridgain.com



Re: Review request: CALCITE-4652 (fix AggregateExpandDistinctAggregatesRule when SUM type is expanded)

2021-07-12 Thread xiong duan
Hi. Ledkov. I'll do some code reviews in the next two days.

Taras Ledkov  于2021年7月12日周一 下午7:58写道:

> Hi,
>
> Please review the patch for the issue CALCITE-4652 [1], see PR#2439 [2].
>
> I tried to draw attention to the issue in the topic:
> "[HELP] Return type of the SUM aggregate function and
> AggregateExpandDistinctAggregatesRule​",
> but did not receive any answer, so I do not give a link to the discussion.
>
> Stamatis advised me to send a reminder to the devlist.
>
> [1]. https://issues.apache.org/jira/browse/CALCITE-4652
> [2]. https://github.com/apache/calcite/pull/2439
>
> --
> Taras Ledkov
> Mail-To: tled...@gridgain.com
>
>


Re: [Review request] (CALCITE-2554) Enrich enumerable join operators with order preserving information

2019-01-03 Thread Michael Mior
Stamatis,

I left a comment on the PR. Overall it looks great to me although I wonder
if we want to resolve CALCITE-2582 first to avoid disabling the subquery
tests. Thanks!


--
Michael Mior
mm...@apache.org

Le jeu. 3 janv. 2019 à 11:09, Stamatis Zampetakis  a
écrit :

> Hello,
>
> Can somebody have a look in the PR.
>
> Enriching join operators with RelCollation information can have a big
> impact in performance of some queries since it allows dropping redundant
> sort operators which are in general costly to evaluate.
>
> Thanks in advance,
> Stamatis
>
> JIRA: https://issues.apache.org/jira/browse/CALCITE-2554
> PR: https://github.com/apache/calcite/pull/866
>


Re: Review Request (CALCITE-2689). ElasticSearch Adapter. Grouping on date / number fields.

2018-12-14 Thread Andrei Sereda
> Is the query still going to be inefficient if the user adds the IS NOT
NULL predicate?
It will be efficient. For explicit null check aggregates can be pushed to
ES since sentinel will never be used.

> Otherwise, I don't find it very problematic to expect the users to right
their queries correclty.
I think users will quickly forget to add IS NOT NULL and then complain
about slow queries. What to do when there are several grouping columns
(GROUP BY a, b, c) ?

I'll start working on Composite aggregations

to remedy this problem (requires ES 6.1+)


On Fri, Dec 14, 2018 at 5:23 AM Stamatis Zampetakis 
wrote:

> I have no strong opinion since I am not using the adapter.
>
> However, I have a small question regarding option 2.
>
> Is the query still going to be inefficient if the user adds the IS NOT NULL
> predicate?
> If yes, then I probably option 1 would be the best option for the moment.
> Otherwise, I don't find it
> very problematic to expect the users to right their queries correclty. The
> same happens with other database
> (e.g., Oracle) where sometimes you need to introduce NULL checks in the
> queries in order to use the
> existing indexes and have efficient queries.
>
> Best,
> Stamatis
>
>
> Στις Πέμ, 13 Δεκ 2018 στις 4:40 μ.μ., ο/η Andrei Sereda 
> έγραψε:
>
> > Thanks, Stamatis.
> >
> > I agree that sentinels are not the best approach but without pushing
> > aggregations to ES they're not very usable to elastic users.
> >
> > So there are two bad choices:
> > 1) (Low?) Probability of collisions triggering invalid results
> (surprises)
> > but efficient query.
> > 2)  Inefficient query (fetching whole index in memory)
> >
> > I hope we can drop support for ES 2.x soon and enforce ES >= 6.1. If so,
> > one can use  Composite aggregations
> > <
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
> > >
> > which don't have these shortcomings.
> > What we can do now is have separate aggregation query for ES2 and ES6.
> >
> > What do you think?
> >
> > Regards,
> > Andrei.
> >
> >
> >
> > On Thu, Dec 13, 2018 at 6:16 AM Stamatis Zampetakis 
> > wrote:
> >
> > > Hi Andrei,
> > >
> > > I haven't gone entirely over the new PR, but I think there are cases
> > where
> > > the result of the queries are going to be wrong (when values collide
> with
> > > sentinels).
> > > Another approach would be to introduce a rule in order to push the
> > > aggregation in ElasticSearch *only* if the field in questions is *not*
> > > nullable. This can make
> > > queries a bit more verbose but it guarantees that there will be no
> > suprises
> > > to the end-user.
> > >
> > > select max(amount), date from orders group by amount, date ->
> Aggregation
> > > not pushed on ES
> > > select max(amount), date from orders where amount is not null and date
> is
> > > not null group by amount, date -> Aggregation pushed on ES
> > >
> > > Sorry for the late reply!
> > >
> > > Best,
> > > Stamatis
> > >
> > >
> > > Στις Κυρ, 2 Δεκ 2018 στις 2:01 π.μ., ο/η Julian Hyde  >
> > > έγραψε:
> > >
> > > > Interesting.
> > > >
> > > > One of the benefits that a SQL layer such as Calcite can bring is
> that
> > it
> > > > hides the details necessary to make operations like this work.
> > > >
> > > > Julian
> > > >
> > > >
> > > > > On Dec 1, 2018, at 5:22 AM, Kevin Risden 
> wrote:
> > > > >
> > > > > I haven't had a chance to review but saw that Elastic has the same
> > > issue
> > > > > with aggregations.
> > > > >
> > > > > https://github.com/elastic/elasticsearch/issues/35745
> > > > >
> > > > > Kevin Risden
> > > > >
> > > > > On Wed, Nov 28, 2018, 20:46 Andrei Sereda  > > > >
> > > > >> Greetings ,
> > > > >> We have discovered an issue with ES aggregations when grouping on
> > > > >> non-textual fields (date, long). Currently the following sql fails
> > > > because
> > > > >> for missing value
> > > > >> <
> > > > >>
> > > >
> > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_missing_value_13
> > > > >>>
> > > > >> we inject __MISSING__ sentinel which is not date / number
> parseable
> > > (it
> > > > >> can’t be null either) :
> > > > >>
> > > > >> select max(amount), date from orders group by date -- special ES
> > type
> > > > >>
> > > > >> The solution is to make sentinel type specific :
> > > > >>
> > > > >>   1. Integer.MIN_VALUE for integers
> > > > >>   2. -12-31 for dates etc.
> > > > >>
> > > > >> For low cardinality types like boolean, byte, short I’m not sure
> > what
> > > > to do
> > > > >> since there is high probability of collision between missing field
> > and
> > > > >> actual value (eg. what value to choose for missing boolean?) :
> > > > >>
> > > > >> select max(amount), isActive from orders group by isActive --
> > boolean
> > > > type
> > > > >>

Re: Review Request (CALCITE-2689). ElasticSearch Adapter. Grouping on date / number fields.

2018-12-14 Thread Stamatis Zampetakis
I have no strong opinion since I am not using the adapter.

However, I have a small question regarding option 2.

Is the query still going to be inefficient if the user adds the IS NOT NULL
predicate?
If yes, then I probably option 1 would be the best option for the moment.
Otherwise, I don't find it
very problematic to expect the users to right their queries correclty. The
same happens with other database
(e.g., Oracle) where sometimes you need to introduce NULL checks in the
queries in order to use the
existing indexes and have efficient queries.

Best,
Stamatis


Στις Πέμ, 13 Δεκ 2018 στις 4:40 μ.μ., ο/η Andrei Sereda 
έγραψε:

> Thanks, Stamatis.
>
> I agree that sentinels are not the best approach but without pushing
> aggregations to ES they're not very usable to elastic users.
>
> So there are two bad choices:
> 1) (Low?) Probability of collisions triggering invalid results (surprises)
> but efficient query.
> 2)  Inefficient query (fetching whole index in memory)
>
> I hope we can drop support for ES 2.x soon and enforce ES >= 6.1. If so,
> one can use  Composite aggregations
> <
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
> >
> which don't have these shortcomings.
> What we can do now is have separate aggregation query for ES2 and ES6.
>
> What do you think?
>
> Regards,
> Andrei.
>
>
>
> On Thu, Dec 13, 2018 at 6:16 AM Stamatis Zampetakis 
> wrote:
>
> > Hi Andrei,
> >
> > I haven't gone entirely over the new PR, but I think there are cases
> where
> > the result of the queries are going to be wrong (when values collide with
> > sentinels).
> > Another approach would be to introduce a rule in order to push the
> > aggregation in ElasticSearch *only* if the field in questions is *not*
> > nullable. This can make
> > queries a bit more verbose but it guarantees that there will be no
> suprises
> > to the end-user.
> >
> > select max(amount), date from orders group by amount, date -> Aggregation
> > not pushed on ES
> > select max(amount), date from orders where amount is not null and date is
> > not null group by amount, date -> Aggregation pushed on ES
> >
> > Sorry for the late reply!
> >
> > Best,
> > Stamatis
> >
> >
> > Στις Κυρ, 2 Δεκ 2018 στις 2:01 π.μ., ο/η Julian Hyde 
> > έγραψε:
> >
> > > Interesting.
> > >
> > > One of the benefits that a SQL layer such as Calcite can bring is that
> it
> > > hides the details necessary to make operations like this work.
> > >
> > > Julian
> > >
> > >
> > > > On Dec 1, 2018, at 5:22 AM, Kevin Risden  wrote:
> > > >
> > > > I haven't had a chance to review but saw that Elastic has the same
> > issue
> > > > with aggregations.
> > > >
> > > > https://github.com/elastic/elasticsearch/issues/35745
> > > >
> > > > Kevin Risden
> > > >
> > > > On Wed, Nov 28, 2018, 20:46 Andrei Sereda  > > >
> > > >> Greetings ,
> > > >> We have discovered an issue with ES aggregations when grouping on
> > > >> non-textual fields (date, long). Currently the following sql fails
> > > because
> > > >> for missing value
> > > >> <
> > > >>
> > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_missing_value_13
> > > >>>
> > > >> we inject __MISSING__ sentinel which is not date / number parseable
> > (it
> > > >> can’t be null either) :
> > > >>
> > > >> select max(amount), date from orders group by date -- special ES
> type
> > > >>
> > > >> The solution is to make sentinel type specific :
> > > >>
> > > >>   1. Integer.MIN_VALUE for integers
> > > >>   2. -12-31 for dates etc.
> > > >>
> > > >> For low cardinality types like boolean, byte, short I’m not sure
> what
> > > to do
> > > >> since there is high probability of collision between missing field
> and
> > > >> actual value (eg. what value to choose for missing boolean?) :
> > > >>
> > > >> select max(amount), isActive from orders group by isActive --
> boolean
> > > type
> > > >>
> > > >> Let me know if you solved this problem differently before. Composite
> > > >> aggregations
> > > >> <
> > > >>
> > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
> > > >>>
> > > >> (available since 6.1) should help in future.
> > > >>
> > > >> PR: https://github.com/apache/calcite/pull/946
> > > >> JIRA: https://issues.apache.org/jira/browse/CALCITE-2689
> > > >>
> > > >> Many Thanks in Advance,
> > > >> Andrei.
> > > >>
> > >
> > >
> >
>


Re: Review Request (CALCITE-2689). ElasticSearch Adapter. Grouping on date / number fields.

2018-12-13 Thread Andrei Sereda
Thanks, Stamatis.

I agree that sentinels are not the best approach but without pushing
aggregations to ES they're not very usable to elastic users.

So there are two bad choices:
1) (Low?) Probability of collisions triggering invalid results (surprises)
but efficient query.
2)  Inefficient query (fetching whole index in memory)

I hope we can drop support for ES 2.x soon and enforce ES >= 6.1. If so,
one can use  Composite aggregations

which don't have these shortcomings.
What we can do now is have separate aggregation query for ES2 and ES6.

What do you think?

Regards,
Andrei.



On Thu, Dec 13, 2018 at 6:16 AM Stamatis Zampetakis 
wrote:

> Hi Andrei,
>
> I haven't gone entirely over the new PR, but I think there are cases where
> the result of the queries are going to be wrong (when values collide with
> sentinels).
> Another approach would be to introduce a rule in order to push the
> aggregation in ElasticSearch *only* if the field in questions is *not*
> nullable. This can make
> queries a bit more verbose but it guarantees that there will be no suprises
> to the end-user.
>
> select max(amount), date from orders group by amount, date -> Aggregation
> not pushed on ES
> select max(amount), date from orders where amount is not null and date is
> not null group by amount, date -> Aggregation pushed on ES
>
> Sorry for the late reply!
>
> Best,
> Stamatis
>
>
> Στις Κυρ, 2 Δεκ 2018 στις 2:01 π.μ., ο/η Julian Hyde 
> έγραψε:
>
> > Interesting.
> >
> > One of the benefits that a SQL layer such as Calcite can bring is that it
> > hides the details necessary to make operations like this work.
> >
> > Julian
> >
> >
> > > On Dec 1, 2018, at 5:22 AM, Kevin Risden  wrote:
> > >
> > > I haven't had a chance to review but saw that Elastic has the same
> issue
> > > with aggregations.
> > >
> > > https://github.com/elastic/elasticsearch/issues/35745
> > >
> > > Kevin Risden
> > >
> > > On Wed, Nov 28, 2018, 20:46 Andrei Sereda  > >
> > >> Greetings ,
> > >> We have discovered an issue with ES aggregations when grouping on
> > >> non-textual fields (date, long). Currently the following sql fails
> > because
> > >> for missing value
> > >> <
> > >>
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_missing_value_13
> > >>>
> > >> we inject __MISSING__ sentinel which is not date / number parseable
> (it
> > >> can’t be null either) :
> > >>
> > >> select max(amount), date from orders group by date -- special ES type
> > >>
> > >> The solution is to make sentinel type specific :
> > >>
> > >>   1. Integer.MIN_VALUE for integers
> > >>   2. -12-31 for dates etc.
> > >>
> > >> For low cardinality types like boolean, byte, short I’m not sure what
> > to do
> > >> since there is high probability of collision between missing field and
> > >> actual value (eg. what value to choose for missing boolean?) :
> > >>
> > >> select max(amount), isActive from orders group by isActive -- boolean
> > type
> > >>
> > >> Let me know if you solved this problem differently before. Composite
> > >> aggregations
> > >> <
> > >>
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
> > >>>
> > >> (available since 6.1) should help in future.
> > >>
> > >> PR: https://github.com/apache/calcite/pull/946
> > >> JIRA: https://issues.apache.org/jira/browse/CALCITE-2689
> > >>
> > >> Many Thanks in Advance,
> > >> Andrei.
> > >>
> >
> >
>


Re: Review Request (CALCITE-2689). ElasticSearch Adapter. Grouping on date / number fields.

2018-12-13 Thread Stamatis Zampetakis
Hi Andrei,

I haven't gone entirely over the new PR, but I think there are cases where
the result of the queries are going to be wrong (when values collide with
sentinels).
Another approach would be to introduce a rule in order to push the
aggregation in ElasticSearch *only* if the field in questions is *not*
nullable. This can make
queries a bit more verbose but it guarantees that there will be no suprises
to the end-user.

select max(amount), date from orders group by amount, date -> Aggregation
not pushed on ES
select max(amount), date from orders where amount is not null and date is
not null group by amount, date -> Aggregation pushed on ES

Sorry for the late reply!

Best,
Stamatis


Στις Κυρ, 2 Δεκ 2018 στις 2:01 π.μ., ο/η Julian Hyde 
έγραψε:

> Interesting.
>
> One of the benefits that a SQL layer such as Calcite can bring is that it
> hides the details necessary to make operations like this work.
>
> Julian
>
>
> > On Dec 1, 2018, at 5:22 AM, Kevin Risden  wrote:
> >
> > I haven't had a chance to review but saw that Elastic has the same issue
> > with aggregations.
> >
> > https://github.com/elastic/elasticsearch/issues/35745
> >
> > Kevin Risden
> >
> > On Wed, Nov 28, 2018, 20:46 Andrei Sereda  >
> >> Greetings ,
> >> We have discovered an issue with ES aggregations when grouping on
> >> non-textual fields (date, long). Currently the following sql fails
> because
> >> for missing value
> >> <
> >>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_missing_value_13
> >>>
> >> we inject __MISSING__ sentinel which is not date / number parseable (it
> >> can’t be null either) :
> >>
> >> select max(amount), date from orders group by date -- special ES type
> >>
> >> The solution is to make sentinel type specific :
> >>
> >>   1. Integer.MIN_VALUE for integers
> >>   2. -12-31 for dates etc.
> >>
> >> For low cardinality types like boolean, byte, short I’m not sure what
> to do
> >> since there is high probability of collision between missing field and
> >> actual value (eg. what value to choose for missing boolean?) :
> >>
> >> select max(amount), isActive from orders group by isActive -- boolean
> type
> >>
> >> Let me know if you solved this problem differently before. Composite
> >> aggregations
> >> <
> >>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
> >>>
> >> (available since 6.1) should help in future.
> >>
> >> PR: https://github.com/apache/calcite/pull/946
> >> JIRA: https://issues.apache.org/jira/browse/CALCITE-2689
> >>
> >> Many Thanks in Advance,
> >> Andrei.
> >>
>
>


Re: Review Request (CALCITE-2689). ElasticSearch Adapter. Grouping on date / number fields.

2018-12-01 Thread Julian Hyde
Interesting.

One of the benefits that a SQL layer such as Calcite can bring is that it hides 
the details necessary to make operations like this work.

Julian


> On Dec 1, 2018, at 5:22 AM, Kevin Risden  wrote:
> 
> I haven't had a chance to review but saw that Elastic has the same issue
> with aggregations.
> 
> https://github.com/elastic/elasticsearch/issues/35745
> 
> Kevin Risden
> 
> On Wed, Nov 28, 2018, 20:46 Andrei Sereda  
>> Greetings ,
>> We have discovered an issue with ES aggregations when grouping on
>> non-textual fields (date, long). Currently the following sql fails because
>> for missing value
>> <
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_missing_value_13
>>> 
>> we inject __MISSING__ sentinel which is not date / number parseable (it
>> can’t be null either) :
>> 
>> select max(amount), date from orders group by date -- special ES type
>> 
>> The solution is to make sentinel type specific :
>> 
>>   1. Integer.MIN_VALUE for integers
>>   2. -12-31 for dates etc.
>> 
>> For low cardinality types like boolean, byte, short I’m not sure what to do
>> since there is high probability of collision between missing field and
>> actual value (eg. what value to choose for missing boolean?) :
>> 
>> select max(amount), isActive from orders group by isActive -- boolean type
>> 
>> Let me know if you solved this problem differently before. Composite
>> aggregations
>> <
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
>>> 
>> (available since 6.1) should help in future.
>> 
>> PR: https://github.com/apache/calcite/pull/946
>> JIRA: https://issues.apache.org/jira/browse/CALCITE-2689
>> 
>> Many Thanks in Advance,
>> Andrei.
>> 



Re: Review Request (CALCITE-2689). ElasticSearch Adapter. Grouping on date / number fields.

2018-12-01 Thread Kevin Risden
I haven't had a chance to review but saw that Elastic has the same issue
with aggregations.

https://github.com/elastic/elasticsearch/issues/35745

Kevin Risden

On Wed, Nov 28, 2018, 20:46 Andrei Sereda  Greetings ,
> We have discovered an issue with ES aggregations when grouping on
> non-textual fields (date, long). Currently the following sql fails because
> for missing value
> <
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_missing_value_13
> >
> we inject __MISSING__ sentinel which is not date / number parseable (it
> can’t be null either) :
>
> select max(amount), date from orders group by date -- special ES type
>
> The solution is to make sentinel type specific :
>
>1. Integer.MIN_VALUE for integers
>2. -12-31 for dates etc.
>
> For low cardinality types like boolean, byte, short I’m not sure what to do
> since there is high probability of collision between missing field and
> actual value (eg. what value to choose for missing boolean?) :
>
> select max(amount), isActive from orders group by isActive -- boolean type
>
> Let me know if you solved this problem differently before. Composite
> aggregations
> <
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
> >
> (available since 6.1) should help in future.
>
> PR: https://github.com/apache/calcite/pull/946
> JIRA: https://issues.apache.org/jira/browse/CALCITE-2689
>
> Many Thanks in Advance,
> Andrei.
>


Re: Review request [CALCITE-2404]

2018-07-30 Thread Julian Hyde
I have reviewed. See my comments in the jira case.

> On Jul 26, 2018, at 11:22 PM, Stamatis Zampetakis  wrote:
> 
> Hello,
> 
> Can somebody have a look at
> [CALCITE-2404] Accessing structured-types is not implemented by the runtime
> https://issues.apache.org/jira/browse/CALCITE-2404
> 
> The pull request is here:
> https://github.com/apache/calcite/pull/762
> 
> Thanks,
> Stamatis