[jira] [Created] (CALCITE-4644) Add percentile_cont and percentile_disc aggregate functions

2021-06-08 Thread Rafay A (Jira)
Rafay A created CALCITE-4644:


 Summary: Add percentile_cont and percentile_disc aggregate 
functions
 Key: CALCITE-4644
 URL: https://issues.apache.org/jira/browse/CALCITE-4644
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Rafay A


Add percentile functions in the operator table.
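For context, the standard inverse-distribution syntax for these aggregates looks like the following (a sketch; the `emp` table and `sal` column are illustrative, not part of the issue):

```sql
-- Median salary per department, computed two ways:
-- PERCENTILE_CONT interpolates between adjacent values;
-- PERCENTILE_DISC returns an actual value from the input set.
SELECT deptno,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sal) AS median_cont,
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY sal) AS median_disc
FROM emp
GROUP BY deptno;
```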



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [HELP] Calcite interpreting SQL to Druid DSL Question

2021-06-08 Thread Xu Chi
Hi Ben,

Thanks for your prompt response.
I checked the data schema in Druid and found the definition below, which
specifies the timestamp column.
If I query with “__time”, I get the situation mentioned in my previous email.
And if I query with the column name “at”, it reports that there is no such
column, which I think means it has been ingested into the “__time” column.
Did I miss any steps when setting up the model or defining Calcite’s
DruidSchema that could cause this issue?
Please advise. Thanks.

{
...
"timestampSpec": {
"column": "at",
"format": "posix"
}
...
}
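For comparison, a Calcite Druid-adapter model can pin the timestamp column on an explicitly defined table. A minimal sketch (the URLs and data source name are placeholders; the operand keys should be checked against the adapter documentation):

```json
{
  "version": "1.0",
  "defaultSchema": "druid",
  "schemas": [
    {
      "name": "druid",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.druid.DruidSchemaFactory",
      "operand": {
        "url": "http://localhost:8082",
        "coordinatorUrl": "http://localhost:8081"
      },
      "tables": [
        {
          "name": "xxx",
          "factory": "org.apache.calcite.adapter.druid.DruidTableFactory",
          "operand": {
            "dataSource": "xxx",
            "interval": "1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z",
            "timestampColumn": "at"
          }
        }
      ]
    }
  ]
}
```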

Regards,
Chi
> 
> On Jun 9, 2021, at 12:01 PM, Ben Krug  wrote:
> 
> I'm coming from the druid side, not a calcite person, but I wanted to ask.
> What tutorial do you mean?  Did you specify a model, and did it specify
> timestampColumn?  (If so, was it __time?)
> 
> I notice that in the druid console, if you use your SQL, it translate
> correctly, as in your desired DSL, so I imagine it's something in the
> configuration, maybe the model configuration?
> 



Re: [HELP] Calcite interpreting SQL to Druid DSL Question

2021-06-08 Thread Ben Krug
I'm coming from the druid side, not a calcite person, but I wanted to ask.
What tutorial do you mean?  Did you specify a model, and did it specify
timestampColumn?  (If so, was it __time?)

I notice that in the druid console, if you use your SQL, it translates
correctly, as in your desired DSL, so I imagine it's something in the
configuration, maybe the model configuration?

On Tue, Jun 8, 2021 at 8:37 PM Xu Chi  wrote:



[HELP] Calcite interpreting SQL to Druid DSL Question

2021-06-08 Thread Xu Chi
Hi team,

I’m trying to integrate Druid with Calcite using the calcite-druid adapter.
After following the tutorial, we could connect Calcite to Druid and fetch
metadata back.
But when executing the SQL below, the DSL generated by Calcite translates the
__time WHERE clause into a combination of a full-time-range interval plus
filters to express the requested time range. This kind of translation results
in bad query performance.

Is there any way to translate the __time WHERE clause directly into the
interval field, as shown below in the “Desired DSL for the interval field”
section?

Thank you.

SQL:
SELECT * FROM xxx WHERE __time > '2021-06-06T00:00:15+08:00' AND __time < 
'2021-06-07T00:00:15+08:00' LIMIT 10

DSL Generated by Calcite:
{
"queryType": "scan",
"dataSource": "xxx",
"intervals": [
"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
],
"filter": {
"type": "and",
"fields": [
{
"type": "expression",
"expression": "(\"__time\" > 
timestamp_parse('2021-06-06T00:00:15\\u002B08:00','','Asia/Shanghai'))"
},
{
"type": "expression",
"expression": "(\"__time\" < 
timestamp_parse('2021-06-07T00:00:15\\u002B08:00','','Asia/Shanghai'))"
}
]
},
"columns": [
"__time",
...
],
"resultFormat": "compactedList",
"limit": 10
}

Desired DSL for the interval field:
"intervals": [ "2021-06-07T00:00:15+08:00/2021-06-08T00:00:15+08:00" ]


Regards,
Chi

Re: Problem with Spool (de-)serialization

2021-06-08 Thread Julian Hyde
Including class names in serialized JSON makes me nervous. It is a common 
attack surface. Could we, say, include the name of the enum (without package 
name) and map it to a list of supported Enums?
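A minimal sketch of that whitelist approach (the SpoolType enum here is a local stand-in for Calcite's Spool.Type, and fromJson is a hypothetical helper, not an existing API):

```java
import java.util.Locale;

public class SafeEnumLookup {
    // Stand-in for org.apache.calcite.rel.core.Spool.Type (names only;
    // the real enum lives in Calcite).
    enum SpoolType { EAGER, LAZY }

    // Resolve a serialized enum by its simple name against an explicit
    // whitelist, instead of calling Class.forName on a "class" attribute
    // taken from the JSON. Unknown enum names or values fail fast.
    static Enum<?> fromJson(String enumName, String valueName) {
        switch (enumName) {
            case "SpoolType":
                return SpoolType.valueOf(valueName.toUpperCase(Locale.ROOT));
            default:
                throw new IllegalArgumentException(
                    "Unsupported enum: " + enumName);
        }
    }

    public static void main(String[] args) {
        // {"readType": {"name": "LAZY"}} would resolve like this:
        System.out.println(fromJson("SpoolType", "lazy"));
    }
}
```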

> On Jun 8, 2021, at 9:20 AM, Konstantin Orlov  wrote:



Types.NULL not handled in AbstractCursor

2021-06-08 Thread Константин Новиков

Hi,
 
Is there any explanation for the missing handling of Types.NULL in
AbstractCursor.createAccessor()? I have a query that includes NULL in the
projection (e.g., SELECT 1 AS x, NULL, ...) and this causes an exception.
Thanks.
 

Re: JIRA account as a contributor

2021-06-08 Thread Dmitry Sysolyatin
Thanks !

On Tue, Jun 8, 2021 at 6:28 PM Haisheng Yuan  wrote:

> Hi Dmitry,
>
> I have added you as contributor.
>
> Thanks,
> Haisheng Yuan
>
> On 2021/06/08 13:55:58, Dmitry Sysolyatin 
> wrote:
> > Hi !
> > I would like to join to calcite team. I already fixed a small bug
> > https://issues.apache.org/jira/browse/CALCITE-4630. But I can not do
> > anything with JIRA ticket. My JIRA username is dmsysolyatin
> >
>


Problem with Spool (de-)serialization

2021-06-08 Thread Konstantin Orlov
Hi, folks!

We have a problem with Spool node deserialization. Currently it explains both
the readType and writeType fields as Enum values, so those fields are
serialized as maps:

  {
    "id": "2",
    "relOp": "MyTableSpool",
    "readType": {
      "class": "org.apache.calcite.rel.core.Spool$Type",
      "name": "LAZY"
    },
    "writeType": {
      "class": "org.apache.calcite.rel.core.Spool$Type",
      "name": "EAGER"
    }
  }

When deserializing, we use RelInput#getEnum, which expects the provided tag to
be a string value representing the enum value's name.

Should we follow the approach used for serializing JoinRelType within the Join
node (serialize the enum value as its name in lower case)? Here is an example:

  {
    "id": "4",
    "relOp": "MyJoin",
    "condition": {
      "op": {
        "name": ">",
        "kind": "SqlKind#GREATER_THAN",
        "syntax": "SqlSyntax#BINARY"
      },
      "operands": [
        {
          "input": 1,
          "name": "$1"
        },
        {
          "input": 4,
          "name": "$4"
        }
      ]
    },
    "joinType": "inner",
    "variablesSet": [0],
    "correlationVariables": [0],
    "inputs": ["0", "3"]
  }

Or would it be better to make RelInput able to deserialize an enum represented
as a map as well?

Personally I prefer the first option because the type of the enum is known for
sure, so it's better not to waste time (de-)serializing it.


-- 
Regards,
Konstantin Orlov






Re: JIRA account as a contributor

2021-06-08 Thread Haisheng Yuan
Hi Dmitry,

I have added you as contributor.

Thanks,
Haisheng Yuan

On 2021/06/08 13:55:58, Dmitry Sysolyatin  wrote: 
> Hi !
> I would like to join to calcite team. I already fixed a small bug
> https://issues.apache.org/jira/browse/CALCITE-4630. But I can not do
> anything with JIRA ticket. My JIRA username is dmsysolyatin
> 


Re: Serializing Relational Algebra to JSON

2021-06-08 Thread Alex Baden
Hi Lana,

As Stamatis said, we use a custom RelWriter and the explain method on
the rel nodes to dump the plan to JSON. It is the same few lines as
here, but we've plucked them out of RelOptUtil into our own serializer
class: 
https://github.com/apache/calcite/blob/7f5e9b8b7e6b4afd8e4f21524aa3c4ce8b7ddb61/core/src/main/java/org/apache/calcite/plan/RelOptUtil.java#L2132

All our code using Calcite is open source, and the serialization code
is available in our open source repo if you want to have a look:
https://github.com/omnisci/omniscidb/blob/master/java/calcite/src/main/java/org/apache/calcite/rel/externalize/MapDRelJson.java

If there is a more canonical way of doing this, I would be interested
in hearing about it.

We also recently wanted to dump SQL nodes (DML commands) to JSON, but
didn't see a similar interface at the SQL level so we cobbled together
our own override of toString.
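For reference, the RelOptUtil entry point discussed in this thread can be invoked roughly like this (a sketch, not compilable on its own; `rootRel` stands for a plan you have already built):

```java
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.sql.SqlExplainFormat;
import org.apache.calcite.sql.SqlExplainLevel;

// Given a RelNode rootRel produced by the planner, serialize it to JSON.
// The header argument is prepended to the output; empty here.
String json = RelOptUtil.dumpPlan("", rootRel,
    SqlExplainFormat.JSON, SqlExplainLevel.DIGEST_ATTRIBUTES);
```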


Alex

On Sun, Jun 6, 2021 at 3:52 PM Stamatis Zampetakis  wrote:
>
> Hi Lana,
>
> I think what you are looking for is RelOptUtil#dumpPlan [1]
> using SqlExplainFormat.JSON as one of the parameters.
> You can check the callers of this method for examples on how to use it.
>
> Best,
> Stamatis
>
> [1]
> https://github.com/apache/calcite/blob/7f5e9b8b7e6b4afd8e4f21524aa3c4ce8b7ddb61/core/src/main/java/org/apache/calcite/plan/RelOptUtil.java#L2116
>
> On Sun, Jun 6, 2021 at 10:42 PM Lana Ramjit  wrote:
>
> > Hi all,
> >
> > I was reading this old blog post from OmniSci
> > (https://www.omnisci.com/blog/fast-and-flexible-query-analysis-at-mapd-with-apache-calcite-2)
> > and they mention that to use Calcite-generated plans in their C++ library,
> > they serialized the relational algebra output by Calcite into JSON. I'd
> > like to do something like this for a project but I can't seem to find any
> > good documentation of this. I found this page in the docs about the model,
> > but what I am looking for hopefully is a function that I can pass the root
> > of a Calcite plan and get a JSONified serialization. Does this exist?
> >
> > Thanks,
> > Lana
> >


JIRA account as a contributor

2021-06-08 Thread Dmitry Sysolyatin
Hi!
I would like to join the Calcite team. I have already fixed a small bug:
https://issues.apache.org/jira/browse/CALCITE-4630. But I cannot do anything
with the JIRA ticket. My JIRA username is dmsysolyatin.


[jira] [Created] (CALCITE-4643) AssertionError for outerjoin query

2021-06-08 Thread TaiNing Wang (Jira)
TaiNing Wang created CALCITE-4643:
-

 Summary: AssertionError for outerjoin query
 Key: CALCITE-4643
 URL: https://issues.apache.org/jira/browse/CALCITE-4643
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.26.0
 Environment: OS: Ubuntu 16.04
Reporter: TaiNing Wang


Hello,

 

I ran into an error when trying to run the following query in calcite-1.26.0:

{code:java}
SELECT * FROM aka_name INNER JOIN name ON aka_name.person_id = name.id
INNER JOIN cast_info ON aka_name.person_id = cast_info.person_id AND name.id = 
cast_info.person_id
INNER JOIN title ON cast_info.movie_id = title.id
LEFT JOIN (SELECT * FROM role_type WHERE role = 'writer') AS t1 ON 
cast_info.role_id = t1.id
LEFT JOIN movie_companies ON cast_info.movie_id = movie_companies.movie_id AND 
title.id = movie_companies.movie_id
WHERE NOT EXISTS (SELECT 1 FROM company_name WHERE country_code = '[us]' AND 
movie_companies.company_id = company_name.id)
{code}
 

Error:

 
{code:java}
    java.lang.AssertionError
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.getCorrelationUse(SqlToRelConverter.java:2754)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.createJoin(SqlToRelConverter.java:2636)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.register(SqlToRelConverter.java:4526)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.register(SqlToRelConverter.java:4451)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.substituteSubQuery(SqlToRelConverter.java:1253)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.replaceSubQueries(SqlToRelConverter.java:1063)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertWhere(SqlToRelConverter.java:1029)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:666)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:644)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3438)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:570)
        at org.apache.calcite.prepare.PlannerImpl.rel(PlannerImpl.java:247)
        at MyTest.query0(MyTest.java:44)
{code}
 

 

The parser config used:

 
{code:java}
  private static final SqlParser.Config PARSER_CONFIG
      = SqlParser.configBuilder().setLex(DEFAULT_LEX).build();
{code}
 

 

A simpler query that gives the same error:

 
{code:java}
SELECT * FROM cast_info LEFT JOIN movie_companies ON cast_info.movie_id = 
movie_companies.movie_id
WHERE NOT EXISTS (SELECT 1 FROM company_name WHERE country_code = '[us]' AND 
movie_companies.company_id = company_name.id)
{code}
 

 

Any idea what the problem is? Thanks a lot!

 

 





[jira] [Created] (CALCITE-4642) PlannerImpl ignores TypeSystems provided by FrameworkConfig

2021-06-08 Thread Nick Riasanovsky (Jira)
Nick Riasanovsky created CALCITE-4642:
-

 Summary: PlannerImpl ignores TypeSystems provided by 
FrameworkConfig
 Key: CALCITE-4642
 URL: https://issues.apache.org/jira/browse/CALCITE-4642
 Project: Calcite
  Issue Type: Bug
Affects Versions: 1.27.0
Reporter: Nick Riasanovsky


Hi, I'm relatively new to Calcite, so my apologies if anything here isn't
helpful.

FrameworkConfig provides the option to set your own type system via
`Frameworks.newConfigBuilder().typeSystem(myTypeSystem)`. However, when this
config is later used in a planner via `Frameworks.getPlanner(config)`, the new
TypeSystem is always ignored by the implementation.

Looking into the source code, it appears to me that there are two main issues.
First, PlannerImpl doesn't extract the typeSystem from the config
[https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/prepare/PlannerImpl.java#L108],
and then when creating the typeFactory it always uses the default
[https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/prepare/PlannerImpl.java#L179].

I think this should be a relatively simple change and I'm happy to provide a
PR, but since I'm new to Calcite I will need some input on how to properly
provide tests for this. I tested this issue with an example class that copies
`RelDataTypeSystem.DEFAULT` except for setting
`shouldConvertRaggedUnionTypesToVarying`, but there is probably an easier way
to test this.
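The configuration path being described is roughly this (a sketch, not compilable on its own; `MyTypeSystem` is a hypothetical stand-in for the class that copies the default):

```java
import org.apache.calcite.rel.type.RelDataTypeSystemImpl;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;

// A custom type system differing from the default in a single setting.
class MyTypeSystem extends RelDataTypeSystemImpl {
  @Override public boolean shouldConvertRaggedUnionTypesToVarying() {
    return true;
  }
}

// The typeSystem set here is the value PlannerImpl currently ignores:
FrameworkConfig config = Frameworks.newConfigBuilder()
    .typeSystem(new MyTypeSystem())
    .build();
Planner planner = Frameworks.getPlanner(config);
```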


