date:20230307

[jira] [Assigned] (CALCITE-4771) change the value of the CAST function to be nullable

2023-03-07 Thread Zou (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zou reassigned CALCITE-4771:


Assignee: Zou

> change the value of the CAST function to be nullable 
> -
>
> Key: CALCITE-4771
> URL: https://issues.apache.org/jira/browse/CALCITE-4771
> Project: Calcite
>  Issue Type: Improvement
>Reporter: xuyang
>Assignee: Zou
>Priority: Major
> Attachments: image-2021-09-16-11-43-55-743.png
>
>
> In the sql "SELECT CAST('haha' AS INT)",the value the function CAST returns 
> will be parsed  into NOT NULL, because when parsing, the type CAST returns is 
> from the INT and the nullable attribute is from the 'haha', which doesn't 
> consider the condition that parsing a string to an int could be invalid and 
> return NULL values.
> I think there are two ways to improve this question:
>  * One is to change the value of the CAST function to be nullable, which 
> avoids the invalid parsing.
>  * The other way is to introduce a function named TRY_CAST, which is used in 
> SQL Server.If the parsing fails, TRY_CAST will return NULL instead of throws 
> exception that a NOT NULL field will be set with our unexpected value NULL.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-4771) change the value of the CAST function to be nullable

2023-03-07 Thread Zou (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697751#comment-17697751
 ] 

Zou commented on CALCITE-4771:
--

I'd like to take this ticket

> change the value of the CAST function to be nullable 
> -
>
> Key: CALCITE-4771
> URL: https://issues.apache.org/jira/browse/CALCITE-4771
> Project: Calcite
>  Issue Type: Improvement
>Reporter: xuyang
>Priority: Major
> Attachments: image-2021-09-16-11-43-55-743.png
>
>
> In the sql "SELECT CAST('haha' AS INT)",the value the function CAST returns 
> will be parsed  into NOT NULL, because when parsing, the type CAST returns is 
> from the INT and the nullable attribute is from the 'haha', which doesn't 
> consider the condition that parsing a string to an int could be invalid and 
> return NULL values.
> I think there are two ways to improve this question:
>  * One is to change the value of the CAST function to be nullable, which 
> avoids the invalid parsing.
>  * The other way is to introduce a function named TRY_CAST, which is used in 
> SQL Server.If the parsing fails, TRY_CAST will return NULL instead of throws 
> exception that a NOT NULL field will be set with our unexpected value NULL.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5563) Reduce loops to improve RelSubset

2023-03-07 Thread asdfgh19 (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697694#comment-17697694
 ] 

asdfgh19 commented on CALCITE-5563:
---

[~julianhyde] Reduce loops to improve RelSubset, how about this

> Reduce loops to improve RelSubset
> -
>
> Key: CALCITE-5563
> URL: https://issues.apache.org/jira/browse/CALCITE-5563
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: asdfgh19
>Assignee: asdfgh19
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.34.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>  
> {code:java}
> /**
>  * Returns the collection of RelNodes one of whose inputs is in this
>  * subset.
>  */
> Set getParents() {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   // see usage of this method in propagateCostImprovements0()
>   if (rel == this) {
> list.add(parent);
> break;
>   }
> }
>   }
>   return list;
> }
> /**
>  * Returns the collection of distinct subsets that contain a RelNode one
>  * of whose inputs is in this subset.
>  */
> Set getParentSubsets(VolcanoPlanner planner) {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   if (rel.set == set && rel.getTraitSet().equals(traitSet)) {
> list.add(planner.getSubsetNonNull(parent));
> break;
>   }
> }
>   }
>   return list;
> }{code}
>  
> Once we have found a matching Relsubset from its parent input, the subsequent 
> inner loop is unnecessary，so we can immediately end the inner loop to save a 
> little time. This is just a minor improment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-5563) Reduce loops to improve RelSubset

2023-03-07 Thread asdfgh19 (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

asdfgh19 updated CALCITE-5563:
--
Summary: Reduce loops to improve RelSubset  (was: Add a break to the inner 
loop of RelSubset#getParents and RelSubset#getParentSubsets after we find a 
matching Relsubset from its parent input)

> Reduce loops to improve RelSubset
> -
>
> Key: CALCITE-5563
> URL: https://issues.apache.org/jira/browse/CALCITE-5563
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: asdfgh19
>Assignee: asdfgh19
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.34.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> /**
>  * Returns the collection of RelNodes one of whose inputs is in this
>  * subset.
>  */
> Set getParents() {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   // see usage of this method in propagateCostImprovements0()
>   if (rel == this) {
> list.add(parent);
> break;
>   }
> }
>   }
>   return list;
> }
> /**
>  * Returns the collection of distinct subsets that contain a RelNode one
>  * of whose inputs is in this subset.
>  */
> Set getParentSubsets(VolcanoPlanner planner) {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   if (rel.set == set && rel.getTraitSet().equals(traitSet)) {
> list.add(planner.getSubsetNonNull(parent));
> break;
>   }
> }
>   }
>   return list;
> }{code}
>  
> Once we have found a matching Relsubset from its parent input, the subsequent 
> inner loop is unnecessary，so we can immediately end the inner loop to save a 
> little time. This is just a minor improment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-5565) Implement BigQuery LOG function

2023-03-07 Thread Tanner Clary (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanner Clary updated CALCITE-5565:
--
Description: 
Calcite currently supports the {{LN}} and {{LOG10}} function. BigQuery offers 
the {{LOG}} function which allows for the base of the logarithm to be specified 
as a second argument. If the second argument is not provided, the base is 
defaulted to e (making it identical to the {{LN}} function). 

Example: {{LOG(64, 10)}} would return {{2}}.
Example 2: {{LOG(10)}} would return the same as {{LN(10)}}.

[BigQuery 
docs|https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#log]

  was:
Calcite currently supports the {{LN}} and {{LOG10}} function. BigQuery offers 
the {{LOG}} function which allows for the base of the logarithm to be specified 
as a second argument. If the second argument is not provided, the base is 
defaulted to e (making it identical to the {{LN}} function. 

Example: {{LOG(64, 10)}} would return {{2}}.
Example 2: {{LOG(10)}} would return the same as {{LN(10)}}.

[BigQuery 
docs|https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#log]


> Implement BigQuery LOG function
> ---
>
> Key: CALCITE-5565
> URL: https://issues.apache.org/jira/browse/CALCITE-5565
> Project: Calcite
>  Issue Type: Task
>Reporter: Tanner Clary
>Assignee: Tanner Clary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Calcite currently supports the {{LN}} and {{LOG10}} function. BigQuery offers 
> the {{LOG}} function which allows for the base of the logarithm to be 
> specified as a second argument. If the second argument is not provided, the 
> base is defaulted to e (making it identical to the {{LN}} function). 
> Example: {{LOG(64, 10)}} would return {{2}}.
> Example 2: {{LOG(10)}} would return the same as {{LN(10)}}.
> [BigQuery 
> docs|https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#log]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-5565) Implement BigQuery LOG function

2023-03-07 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-5565:

Labels: pull-request-available  (was: )

> Implement BigQuery LOG function
> ---
>
> Key: CALCITE-5565
> URL: https://issues.apache.org/jira/browse/CALCITE-5565
> Project: Calcite
>  Issue Type: Task
>Reporter: Tanner Clary
>Assignee: Tanner Clary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Calcite currently supports the {{LN}} and {{LOG10}} function. BigQuery offers 
> the {{LOG}} function which allows for the base of the logarithm to be 
> specified as a second argument. If the second argument is not provided, the 
> base is defaulted to e (making it identical to the {{LN}} function. 
> Example: {{LOG(64, 10)}} would return {{2}}.
> Example 2: {{LOG(10)}} would return the same as {{LN(10)}}.
> [BigQuery 
> docs|https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#log]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (CALCITE-5565) Implement BigQuery LOG function

2023-03-07 Thread Tanner Clary (Jira)

Tanner Clary created CALCITE-5565:
-

 Summary: Implement BigQuery LOG function
 Key: CALCITE-5565
 URL: https://issues.apache.org/jira/browse/CALCITE-5565
 Project: Calcite
  Issue Type: Task
Reporter: Tanner Clary
Assignee: Tanner Clary


Calcite currently supports the {{LN}} and {{LOG10}} function. BigQuery offers 
the {{LOG}} function which allows for the base of the logarithm to be specified 
as a second argument. If the second argument is not provided, the base is 
defaulted to e (making it identical to the {{LN}} function. 

Example: {{LOG(64, 10)}} would return {{2}}.
Example 2: {{LOG(10)}} would return the same as {{LN(10)}}.

[BigQuery 
docs|https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#log]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5563) Add a break to the inner loop of RelSubset#getParents and RelSubset#getParentSubsets after we find a matching Relsubset from its parent input

2023-03-07 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697668#comment-17697668
 ] 

Julian Hyde commented on CALCITE-5563:
--

It makes sense, but you need to provide a summary that is suitable for the 
release notes.

> Add a break to the inner loop of RelSubset#getParents and 
> RelSubset#getParentSubsets after we find a matching Relsubset from its parent 
> input
> -
>
> Key: CALCITE-5563
> URL: https://issues.apache.org/jira/browse/CALCITE-5563
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: asdfgh19
>Assignee: asdfgh19
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.34.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> /**
>  * Returns the collection of RelNodes one of whose inputs is in this
>  * subset.
>  */
> Set getParents() {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   // see usage of this method in propagateCostImprovements0()
>   if (rel == this) {
> list.add(parent);
> break;
>   }
> }
>   }
>   return list;
> }
> /**
>  * Returns the collection of distinct subsets that contain a RelNode one
>  * of whose inputs is in this subset.
>  */
> Set getParentSubsets(VolcanoPlanner planner) {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   if (rel.set == set && rel.getTraitSet().equals(traitSet)) {
> list.add(planner.getSubsetNonNull(parent));
> break;
>   }
> }
>   }
>   return list;
> }{code}
>  
> Once we have found a matching Relsubset from its parent input, the subsequent 
> inner loop is unnecessary，so we can immediately end the inner loop to save a 
> little time. This is just a minor improment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5552) Returned timestamp is incorrect after the 100th row

2023-03-07 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697667#comment-17697667
 ] 

Julian Hyde commented on CALCITE-5552:
--

I agree. CALCITE-5488 is related. My hunch is that we should do this first, and 
then attack 5488 with what we learn from that fix. We need to answer the 
fundamental question "Do I need to run an Avatica server in UTC time zone? If 
not, what should be its behavior?"

> Returned timestamp is incorrect after the 100th row
> ---
>
> Key: CALCITE-5552
> URL: https://issues.apache.org/jira/browse/CALCITE-5552
> Project: Calcite
>  Issue Type: Bug
>  Components: avatica
>Affects Versions: avatica-1.23.0
>Reporter: Magnus Mogren
>Priority: Critical
> Attachments: TSTAMPS.csv
>
>
> When fetching data that contains timestamps the returned timestamp after row 
> 100 is incorrect.
> This can be reproduced using the CSV adapter (calcite-csv) using the 
> TSTAMPS.csv file attached. It contains 101 timestamps with the value 
> 1900-01-01 00:00:00. The first 100 is returned correctly, but number 101 in 
> the result has the value 1899-12-31 23:00:00 instead.
> Marking this bug as critical since not beeing able to trust the values 
> returned by calcite is as bad as it gets in my opinion.
> I do not know if the bug is in calcite or avatica.
> I have created a project that reproduces the issue. You can find that here: 
> [nytro77/calcite-timestamp-bug: Showcase bug in Calcite or Avatica that 
> causes faulty timestamps to be returned 
> (github.com)|https://github.com/nytro77/calcite-timestamp-bug]
> It starts an AvaticaServer that serves the attached file as a table using 
> calcite-csv and a unit tests that connects to the server and fetches the data.
> Run it with 
> {code:java}
> ./mvnw test{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5563) Add a break to the inner loop of RelSubset#getParents and RelSubset#getParentSubsets after we find a matching Relsubset from its parent input

2023-03-07 Thread asdfgh19 (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697663#comment-17697663
 ] 

asdfgh19 commented on CALCITE-5563:
---

[~julianhyde] It is not a bug, just a minor improvement to save a little time 
when there are multiple child nodes in the parent node. If it doesn't make any 
sense, I will close it. Thanks!

> Add a break to the inner loop of RelSubset#getParents and 
> RelSubset#getParentSubsets after we find a matching Relsubset from its parent 
> input
> -
>
> Key: CALCITE-5563
> URL: https://issues.apache.org/jira/browse/CALCITE-5563
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: asdfgh19
>Assignee: asdfgh19
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.34.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> /**
>  * Returns the collection of RelNodes one of whose inputs is in this
>  * subset.
>  */
> Set getParents() {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   // see usage of this method in propagateCostImprovements0()
>   if (rel == this) {
> list.add(parent);
> break;
>   }
> }
>   }
>   return list;
> }
> /**
>  * Returns the collection of distinct subsets that contain a RelNode one
>  * of whose inputs is in this subset.
>  */
> Set getParentSubsets(VolcanoPlanner planner) {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   if (rel.set == set && rel.getTraitSet().equals(traitSet)) {
> list.add(planner.getSubsetNonNull(parent));
> break;
>   }
> }
>   }
>   return list;
> }{code}
>  
> Once we have found a matching Relsubset from its parent input, the subsequent 
> inner loop is unnecessary，so we can immediately end the inner loop to save a 
> little time. This is just a minor improment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-5563) Add a break to the inner loop of RelSubset#getParents and RelSubset#getParentSubsets after we find a matching Relsubset from its parent input

2023-03-07 Thread asdfgh19 (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

asdfgh19 updated CALCITE-5563:
--
Description: 
 
{code:java}
/**
 * Returns the collection of RelNodes one of whose inputs is in this
 * subset.
 */
Set getParents() {
  final Set list = new LinkedHashSet<>();
  for (RelNode parent : set.getParentRels()) {
for (RelSubset rel : inputSubsets(parent)) {
  // see usage of this method in propagateCostImprovements0()
  if (rel == this) {
list.add(parent);
break;
  }
}
  }
  return list;
}

/**
 * Returns the collection of distinct subsets that contain a RelNode one
 * of whose inputs is in this subset.
 */
Set getParentSubsets(VolcanoPlanner planner) {
  final Set list = new LinkedHashSet<>();
  for (RelNode parent : set.getParentRels()) {
for (RelSubset rel : inputSubsets(parent)) {
  if (rel.set == set && rel.getTraitSet().equals(traitSet)) {
list.add(planner.getSubsetNonNull(parent));
break;
  }
}
  }
  return list;
}{code}
 

Once we have found a matching Relsubset from its parent input, the subsequent 
inner loop is unnecessary，so we can immediately end the inner loop to save a 
little time. This is just a minor improment.

  was:
 
{code:java}
/**
 * Returns the collection of RelNodes one of whose inputs is in this
 * subset.
 */
Set getParents() {
  final Set list = new LinkedHashSet<>();
  for (RelNode parent : set.getParentRels()) {
for (RelSubset rel : inputSubsets(parent)) {
  // see usage of this method in propagateCostImprovements0()
  if (rel == this) {
list.add(parent);
break;
  }
}
  }
  return list;
}

/**
 * Returns the collection of distinct subsets that contain a RelNode one
 * of whose inputs is in this subset.
 */
Set getParentSubsets(VolcanoPlanner planner) {
  final Set list = new LinkedHashSet<>();
  for (RelNode parent : set.getParentRels()) {
for (RelSubset rel : inputSubsets(parent)) {
  if (rel.set == set && rel.getTraitSet().equals(traitSet)) {
list.add(planner.getSubsetNonNull(parent));
break;
  }
}
  }
  return list;
}{code}
 

Once we have found a matching Relsubset from its parent input, the subsequent 
inner loop is unnecessary，so we can immediately end the inner loop.to save a 
little time. This is just a minor improment.


> Add a break to the inner loop of RelSubset#getParents and 
> RelSubset#getParentSubsets after we find a matching Relsubset from its parent 
> input
> -
>
> Key: CALCITE-5563
> URL: https://issues.apache.org/jira/browse/CALCITE-5563
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: asdfgh19
>Assignee: asdfgh19
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.34.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> /**
>  * Returns the collection of RelNodes one of whose inputs is in this
>  * subset.
>  */
> Set getParents() {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   // see usage of this method in propagateCostImprovements0()
>   if (rel == this) {
> list.add(parent);
> break;
>   }
> }
>   }
>   return list;
> }
> /**
>  * Returns the collection of distinct subsets that contain a RelNode one
>  * of whose inputs is in this subset.
>  */
> Set getParentSubsets(VolcanoPlanner planner) {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   if (rel.set == set && rel.getTraitSet().equals(traitSet)) {
> list.add(planner.getSubsetNonNull(parent));
> break;
>   }
> }
>   }
>   return list;
> }{code}
>  
> Once we have found a matching Relsubset from its parent input, the subsequent 
> inner loop is unnecessary，so we can immediately end the inner loop to save a 
> little time. This is just a minor improment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-5563) Add a break to the inner loop of RelSubset#getParents and RelSubset#getParentSubsets after we find a matching Relsubset from its parent input

2023-03-07 Thread asdfgh19 (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

asdfgh19 updated CALCITE-5563:
--
Description: 
 
{code:java}
/**
 * Returns the collection of RelNodes one of whose inputs is in this
 * subset.
 */
Set getParents() {
  final Set list = new LinkedHashSet<>();
  for (RelNode parent : set.getParentRels()) {
for (RelSubset rel : inputSubsets(parent)) {
  // see usage of this method in propagateCostImprovements0()
  if (rel == this) {
list.add(parent);
break;
  }
}
  }
  return list;
}

/**
 * Returns the collection of distinct subsets that contain a RelNode one
 * of whose inputs is in this subset.
 */
Set getParentSubsets(VolcanoPlanner planner) {
  final Set list = new LinkedHashSet<>();
  for (RelNode parent : set.getParentRels()) {
for (RelSubset rel : inputSubsets(parent)) {
  if (rel.set == set && rel.getTraitSet().equals(traitSet)) {
list.add(planner.getSubsetNonNull(parent));
break;
  }
}
  }
  return list;
}{code}
 

Once we have found a matching Relsubset from its parent input, the subsequent 
inner loop is unnecessary，so we can immediately end the inner loop.to save a 
little time. This is just a minor improment.

  was:
 
{code:java}
/**
 * Returns the collection of RelNodes one of whose inputs is in this
 * subset.
 */
Set getParents() {
  final Set list = new LinkedHashSet<>();
  for (RelNode parent : set.getParentRels()) {
for (RelSubset rel : inputSubsets(parent)) {
  // see usage of this method in propagateCostImprovements0()
  if (rel == this) {
list.add(parent);
break;
  }
}
  }
  return list;
}

/**
 * Returns the collection of distinct subsets that contain a RelNode one
 * of whose inputs is in this subset.
 */
Set getParentSubsets(VolcanoPlanner planner) {
  final Set list = new LinkedHashSet<>();
  for (RelNode parent : set.getParentRels()) {
for (RelSubset rel : inputSubsets(parent)) {
  if (rel.set == set && rel.getTraitSet().equals(traitSet)) {
list.add(planner.getSubsetNonNull(parent));
break;
  }
}
  }
  return list;
}{code}
 

Once we have found a matching Relsubset from its parent input, we can 
immediately end the inner loop.


> Add a break to the inner loop of RelSubset#getParents and 
> RelSubset#getParentSubsets after we find a matching Relsubset from its parent 
> input
> -
>
> Key: CALCITE-5563
> URL: https://issues.apache.org/jira/browse/CALCITE-5563
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: asdfgh19
>Assignee: asdfgh19
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.34.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> /**
>  * Returns the collection of RelNodes one of whose inputs is in this
>  * subset.
>  */
> Set getParents() {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   // see usage of this method in propagateCostImprovements0()
>   if (rel == this) {
> list.add(parent);
> break;
>   }
> }
>   }
>   return list;
> }
> /**
>  * Returns the collection of distinct subsets that contain a RelNode one
>  * of whose inputs is in this subset.
>  */
> Set getParentSubsets(VolcanoPlanner planner) {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   if (rel.set == set && rel.getTraitSet().equals(traitSet)) {
> list.add(planner.getSubsetNonNull(parent));
> break;
>   }
> }
>   }
>   return list;
> }{code}
>  
> Once we have found a matching Relsubset from its parent input, the subsequent 
> inner loop is unnecessary，so we can immediately end the inner loop.to save a 
> little time. This is just a minor improment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5552) Returned timestamp is incorrect after the 100th row

2023-03-07 Thread Will Noble (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697659#comment-17697659
 ] 

Will Noble commented on CALCITE-5552:
-

Sounds like this is similar in nature to, though probably distinct from, 
CALCITE-5488 (just my initial hunch)

> Returned timestamp is incorrect after the 100th row
> ---
>
> Key: CALCITE-5552
> URL: https://issues.apache.org/jira/browse/CALCITE-5552
> Project: Calcite
>  Issue Type: Bug
>  Components: avatica
>Affects Versions: avatica-1.23.0
>Reporter: Magnus Mogren
>Priority: Critical
> Attachments: TSTAMPS.csv
>
>
> When fetching data that contains timestamps the returned timestamp after row 
> 100 is incorrect.
> This can be reproduced using the CSV adapter (calcite-csv) using the 
> TSTAMPS.csv file attached. It contains 101 timestamps with the value 
> 1900-01-01 00:00:00. The first 100 is returned correctly, but number 101 in 
> the result has the value 1899-12-31 23:00:00 instead.
> Marking this bug as critical since not beeing able to trust the values 
> returned by calcite is as bad as it gets in my opinion.
> I do not know if the bug is in calcite or avatica.
> I have created a project that reproduces the issue. You can find that here: 
> [nytro77/calcite-timestamp-bug: Showcase bug in Calcite or Avatica that 
> causes faulty timestamps to be returned 
> (github.com)|https://github.com/nytro77/calcite-timestamp-bug]
> It starts an AvaticaServer that serves the attached file as a table using 
> calcite-csv and a unit tests that connects to the server and fetches the data.
> Run it with 
> {code:java}
> ./mvnw test{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5559) Improve RepeatUnion by discarding duplicates at TableSpool level

2023-03-07 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697593#comment-17697593
 ] 

Julian Hyde commented on CALCITE-5559:
--

Sorry, coming late to this, and haven't read everything, but let me just check 
some things at a high level. RepeatUnion implements seminaive evaluation, 
right? And that means detecting whether this iteration found some values that 
previous iterations did not. That is, we compute the difference. If the 
semantics of the particular query allows this do be done using set-difference 
(as opposed to multiset-difference) then this would often (maybe always) seem 
to be a win.

My hunch is that there should be an 'eliminate duplicates' flag that applies to 
the deltas. Maybe it is the same flag that applies to the collections of 
records at each iteration, or maybe it is a different flag.

> Improve RepeatUnion by discarding duplicates at TableSpool level
> 
>
> Key: CALCITE-5559
> URL: https://issues.apache.org/jira/browse/CALCITE-5559
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Ruben Q L
>Assignee: Ruben Q L
>Priority: Major
>
> Currently, RepeatUnion operator with all=false keeps track of the elements 
> that it has returned in order to discard duplicates. However, the TableSpool 
> operators that are right below it do not have such control. In certain 
> scenarios, duplicates are returned by the TableSpool current iteration, 
> discarded by the RepeatUnion, but have been already "fed back" by the 
> TableSpool into the next iteration, causing unnecessary processing.
> We can optimize this scenario by keeping track of the duplicates 
> inside/before the TableSpool too (note: we still need to keep track of 
> duplicates at RepeatUnion level, because that is the only place where we can 
> detect a potential "global duplicate" of an element: returned by the LHS and 
> then also by the RHS, or by two different iterations of the RHS).
> A PoC testing this improvement on a downstream project showed that certain 
> queries can go from ~40s down to ~1s.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5558) avatica protobuf update_count and other signed ints should not be unsigned

2023-03-07 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697586#comment-17697586
 ] 

Julian Hyde commented on CALCITE-5558:
--

Would this be a breaking change?

Update count never needs to be negative (except perhaps to store a 'not 
specified' value), so I don't feel strongly about whether it is stored in a 
signed or unsigned. I do acknowledge that it maps to a Java {{long}} (64 bit 
signed integer).

> avatica protobuf update_count and other signed ints should not be unsigned
> --
>
> Key: CALCITE-5558
> URL: https://issues.apache.org/jira/browse/CALCITE-5558
> Project: Calcite
>  Issue Type: Bug
>  Components: avatica
>Affects Versions: avatica-1.23.0
>Reporter: Martin Jonsson
>Priority: Minor
>
> the update_count field of result set response in avatica is defined as uint64 
> which by default takes negative value -1.
> The protobuf specification states that negative values should use int or 
> sint, not uint, since uint is... unsigned. For this reason many protobuf 
> implementations can't handle uints that are negative. This might not be a 
> terrible if you are building a client but I'm building a server based on 
> avatica protobuf protocol and I would like it to work not just with standard 
> java implementation and c++ implementation which seems to be the ones that 
> handles this odd case.
> Would it be possible to change update_count from uint64 to int64? and same 
> for all other possible negative ints currently defined as unsigned?
>  
> Many thanks
> Martin 
>   



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5557) Add SAFE_CAST (BigQuery compatibility)

2023-03-07 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697581#comment-17697581
 ] 

Julian Hyde commented on CALCITE-5557:
--

I don't know. It's a judgment call as to what causes the least amount of chaos 
in the RexToLixTranslator. It's possible that you need a toIntSafe method (and 
corresponding methods for every data type), but that seems like a lot to 
maintain (because there are a lot of data types). It's possible that {{toInt}} 
could somehow declare the exceptions that it might throw and RexToLixTranslator 
could generate a "catch (Ex1 | Ex2) \{ return null; }" clause.

Take the time to understand how {{implementSafe}} works, and how we can use 
methods with primitive arguments to implement SQL calls with nullable 
arguments. (Note that "safe" in that context means something different.) Maybe 
there is a similar trick that can be used here.

> Add SAFE_CAST (BigQuery compatibility)
> --
>
> Key: CALCITE-5557
> URL: https://issues.apache.org/jira/browse/CALCITE-5557
> Project: Calcite
>  Issue Type: New Feature
>Reporter: Oliver Lee
>Assignee: Oliver Lee
>Priority: Minor
>
> Implement SAFE_CAST per BigQuery specifications
> [https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#safe_casting]
> Equivalent to CAST(), except it will return {{null}} instead of throwing an 
> exception if the casting fails.
> {quote}SAFE_CAST('1' as INT) -> 1
> SAFE_CAST('a' as INT) -> NULL
> SAFE_CAST("2022-12-12" as DATE) -> DATE('2022-12-12')
> SAFE_CAST(1 AS BOOLEAN) -> true
> SAFE_CAST('A' AS BOOLEAN) -> null
> {quote}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5554) Add DAYOFWEEK and DAYOFYEAR as valid synonyms of DOW, DOY

2023-03-07 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697578#comment-17697578
 ] 

Julian Hyde commented on CALCITE-5554:
--

That sounds reasonable, given that {{EXTRACT}} is implemented differently from 
other 

I see that [DATE_PART on 
Snowflake|https://docs.snowflake.com/en/sql-reference/functions-date-time#label-supported-date-time-parts]
 allows weekday, dow, dw as synonyms for dayofweek, and yearday, doy, dy as 
synonyms for dayofyear. So, if we revisit {{DATE_PART}} we can also revisit 
{{EXTRACT}}.

Re. enabling tests. The point of CALCITE-2539 is that {{checkFails}} is a 
no-op. (I.e. the test will pass even if the expression does not fail.) So some 
of those tests would actually succeed if you change {{checkFails}} to an 
appropriate {{checkScalar}}. Change as many as you can to {{checkScalar}}, and 
leave the other {{checkFails}} inside an {{if}}.

> Add DAYOFWEEK and DAYOFYEAR as valid synonyms of DOW, DOY
> -
>
> Key: CALCITE-5554
> URL: https://issues.apache.org/jira/browse/CALCITE-5554
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Tanner Clary
>Assignee: Tanner Clary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, a query such as: {{EXTRACT(DAYOFWEEK FROM DATE '2008-12-25');}} is 
> invalid while {{EXTRACT(DOW FROM DATE '2008-12-25')}} is acceptable. BigQuery 
> (and perhaps other dialects) accept the non-abbreviated version as a valid 
> time unit. After this change, DAYOFWEEK and DAYOFYEAR would both be synonyms 
> of DOW and DOY, respectively. 
> [Relevant BigQuery 
> Docs|https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions#extract]
> Any comments, questions, or suggestions for the linked PR are highly 
> appreciated. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Closed] (CALCITE-5562) The result data is disordered when u tried to get data from an ordered set

2023-03-07 Thread Julian Hyde (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde closed CALCITE-5562.

Resolution: Not A Bug

Not a bug. ORDER BY in a subquery is ignored.

This seems to be a question. In future please ask questions on the dev list.

> The result data is disordered when u tried to get data from an ordered set
> --
>
> Key: CALCITE-5562
> URL: https://issues.apache.org/jira/browse/CALCITE-5562
> Project: Calcite
>  Issue Type: Bug
>Reporter: Yuxin Wu
>Priority: Major
>
> *It's based on CSV-TEST-MODEL. The same as {{{}lex=MYSQL{}}}.*
> {quote}!connect jdbc:calcite:model=src/test/resources/mode
> {quote}
> 
> Step1: You need to get a table named 'emps' with data as below;
> {panel}
> 0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps;
> |NAME|EMPID|
> |Fred|30  |
> |Eric|3    |
> |John|2    |
> |Wilma|1    |
> |Alice|2    |
>  
> {panel}
> Step2: Get an ordered set by using {_}{color:#172b4d}*{{select name, empid 
> from emps order by empid}}*{color}{_};
> {panel}
> 0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps 
> order by empid;
> |NAME|EMPID|
> |Wilma|1    |
> |John|2    |
> |Alice|2    |
> |Eric|3    |
> |Fred|30  |
> {panel}
>  
> Step3: Using the statement in step2 as a sub query like this:
> {panel}
> 0: jdbc:calcite:model=src/test/resources/mode> select * from (select name, 
> empid from emps order by empid);
> |NAME|EMPID|
> |Fred|30  |
> |Eric|3    |
> |John|2    |
> |Wilma|1    |
> |Alice|2    |
> {panel}
>  
> *THE QUESTION IS:*
> *The result data is disordered while it's supposed to be the same as it is in 
> step2.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-5564) Support 2-argument PERCENTILE_CONT, PERCENTILE_DISC aggregate functions (as in BigQuery)

2023-03-07 Thread Julian Hyde (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated CALCITE-5564:
-
Summary: Support 2-argument PERCENTILE_CONT, PERCENTILE_DISC aggregate 
functions (as in BigQuery)  (was: Add 2-argument support for 
PERCENTILE_CONT/DISC)

> Support 2-argument PERCENTILE_CONT, PERCENTILE_DISC aggregate functions (as 
> in BigQuery)
> 
>
> Key: CALCITE-5564
> URL: https://issues.apache.org/jira/browse/CALCITE-5564
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Tanner Clary
>Assignee: Tanner Clary
>Priority: Major
>
> Calcite currently has implementations for the {{PERCENTILE_CONT}} and 
> {{PERCENTILE_DISC}} functions. Their syntax may be found 
> [here|https://learn.microsoft.com/en-us/sql/t-sql/functions/percentile-cont-transact-sql?view=sql-server-ver16].
>  
> BigQuery offers these functions as well, but the syntax is slightly 
> different, and may be found 
> [here|https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#percentile_cont].
>  The main difference is that instead of using a {{WITHIN GROUP}} clause, the 
> array is passed in directly as the first argument to the function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5563) Add a break to the inner loop of RelSubset#getParents and RelSubset#getParentSubsets after we find a matching Relsubset from its parent input

2023-03-07 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697573#comment-17697573
 ] 

Julian Hyde commented on CALCITE-5563:
--

What's the bug? Change your summary to describe the problem, not the solution.

> Add a break to the inner loop of RelSubset#getParents and 
> RelSubset#getParentSubsets after we find a matching Relsubset from its parent 
> input
> -
>
> Key: CALCITE-5563
> URL: https://issues.apache.org/jira/browse/CALCITE-5563
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: asdfgh19
>Assignee: asdfgh19
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.34.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> /**
>  * Returns the collection of RelNodes one of whose inputs is in this
>  * subset.
>  */
> Set getParents() {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   // see usage of this method in propagateCostImprovements0()
>   if (rel == this) {
> list.add(parent);
> break;
>   }
> }
>   }
>   return list;
> }
> /**
>  * Returns the collection of distinct subsets that contain a RelNode one
>  * of whose inputs is in this subset.
>  */
> Set getParentSubsets(VolcanoPlanner planner) {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   if (rel.set == set && rel.getTraitSet().equals(traitSet)) {
> list.add(planner.getSubsetNonNull(parent));
> break;
>   }
> }
>   }
>   return list;
> }{code}
>  
> Once we have found a matching Relsubset from its parent input, we can 
> immediately end the inner loop.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5564) Add 2-argument support for PERCENTILE_CONT/DISC

2023-03-07 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697571#comment-17697571
 ] 

Julian Hyde commented on CALCITE-5564:
--

It seems to me that the main difference is that BigQuery's {{PERCENTILE_CONT}} 
(and \{{PERCENTILE_DISC}}) uses the {{OVER}} clause where the standard (e.g. 
Postgres) version uses the {{{}WITHIN GROUP{}}}. There are a few implications:
 * The BigQuery version looks like a windowed aggregate function even when 
you're using it as an aggregate function (e.g. {{{}SELECT x, PERCENTILE_CONT(y, 
PERCENTILE 50) OVER (ORDER BY z) FROM t GROUP BY a{}}})
 * We'll need to be careful how we determine whether a query is an aggregate 
query. Is {{SELECT x, PERCENTILE_CONT(y, PERCENTILE 50) OVER (ORDER BY z) FROM 
t}} an aggregate query? In Postgres no, in BigQuery maybe?
 * In Postgres and Calcite) I suspect that it is valid (and makes sense) to 
have both an OVER and a WITHIN GROUP. For example, {{select 
percentile_cont(0.6) within group (order by sal) over (order by hiredate 
partition by deptno rows 2 preceding) from emp}} (there are queries similar to 
this in 
[redshift.iq|https://github.com/apache/calcite/blob/main/babel/src/test/resources/sql/redshift.iq]).

I think you need to explore the semantics in BigQuery and Postgres. Are there 
any queries that are valid in both, and have different semantics?

> Add 2-argument support for PERCENTILE_CONT/DISC
> ---
>
> Key: CALCITE-5564
> URL: https://issues.apache.org/jira/browse/CALCITE-5564
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Tanner Clary
>Assignee: Tanner Clary
>Priority: Major
>
> Calcite currently has implementations for the {{PERCENTILE_CONT}} and 
> {{PERCENTILE_DISC}} functions. Their syntax may be found 
> [here|https://learn.microsoft.com/en-us/sql/t-sql/functions/percentile-cont-transact-sql?view=sql-server-ver16].
>  
> BigQuery offers these functions as well, but the syntax is slightly 
> different, and may be found 
> [here|https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#percentile_cont].
>  The main difference is that instead of using a {{WITHIN GROUP}} clause, the 
> array is passed in directly as the first argument to the function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (CALCITE-5564) Add 2-argument support for PERCENTILE_CONT/DISC

2023-03-07 Thread Tanner Clary (Jira)

Tanner Clary created CALCITE-5564:
-

 Summary: Add 2-argument support for PERCENTILE_CONT/DISC
 Key: CALCITE-5564
 URL: https://issues.apache.org/jira/browse/CALCITE-5564
 Project: Calcite
  Issue Type: Improvement
Reporter: Tanner Clary
Assignee: Tanner Clary


Calcite currently has implementations for the {{PERCENTILE_CONT}} and 
{{PERCENTILE_DISC}} functions. Their syntax may be found 
[here|https://learn.microsoft.com/en-us/sql/t-sql/functions/percentile-cont-transact-sql?view=sql-server-ver16].
 

BigQuery offers these functions as well, but the syntax is slightly different, 
and may be found 
[here|https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#percentile_cont].
 The main difference is that instead of using a {{WITHIN GROUP}} clause, the 
array is passed in directly as the first argument to the function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-5390) RelDecorrelator throws NullPointerException

2023-03-07 Thread Sergey Nuyanzin (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Nuyanzin updated CALCITE-5390:
-
Description: 
The current query throws NullPointerException
{code:java}
SELECT
  (SELECT 1 FROM emp d WHERE d.job = a.job LIMIT 1) AS t1,
  (SELECT a.job = 'PRESIDENT' FROM emp s LIMIT 1) as t2
FROM emp a;
{code}
Test case - 
[https://github.com/apache/calcite/commit/46fe9bc456f2d34cf7dccd29829c9e85abe69d5f]

Logical plan before it fails:
{code:java}
LogicalProject(T1=[$8], T2=[$9])
  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], $f0=[$8], $f09=[$9])
    LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], $f0=[$8], $f00=[$10])
      LogicalCorrelate(correlation=[$cor0], joinType=[left], 
requiredColumns=[{9}])
        LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], 
HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], $f0=[$8], $f9=[=($2, 
'PRESIDENT')])
          LogicalCorrelate(correlation=[$cor0], joinType=[left], 
requiredColumns=[{2}])
            LogicalTableScan(table=[[scott, EMP]])
            LogicalAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)])
              LogicalSort(fetch=[1])
                LogicalProject(EXPR$0=[1])
                  LogicalFilter(condition=[=($2, $cor0.JOB)])
                    LogicalTableScan(table=[[scott, EMP]])
        LogicalAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)])
          LogicalSort(fetch=[1])
            LogicalProject(EXPR$0=[$cor0.$f9])
              LogicalTableScan(table=[[scott, EMP]]) {code}
Stack trace:
{code:java}
 Caused by: java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
at 
org.apache.calcite.sql2rel.RelDecorrelator.createValueGenerator(RelDecorrelator.java:833)
at 
org.apache.calcite.sql2rel.RelDecorrelator.decorrelateInputWithValueGenerator(RelDecorrelator.java:1028)
at 
org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:764)
at 
org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:738)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
at 
org.apache.calcite.sql2rel.RelDecorrelator.getInvoke(RelDecorrelator.java:707)
at 
org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:464)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
at 
org.apache.calcite.sql2rel.RelDecorrelator.getInvoke(RelDecorrelator.java:707)
at 
org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:512)
at 
org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:495)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
at 
org.apache.calcite.sql2rel.RelDecorrelator.getInvoke(RelDecorrelator.java:707)
at 
org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:1187)
at 
org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:1169)
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
at 
org.apache.calcite.sql2rel.RelDecorrelator.getInvoke(RelDecorrelator.java:707)
at 
org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:749)
at 
org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:738)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
at

[jira] [Commented] (CALCITE-5390) RelDecorrelator throws NullPointerException

2023-03-07 Thread Sergey Nuyanzin (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697487#comment-17697487
 ] 

Sergey Nuyanzin commented on CALCITE-5390:
--

Thanks for the quick response.

I will try to see how hard is to handle this...

Here are some more findings:

there are 2 more queries failing with same exception (I put them in description 
as well)
{code:sql}
SELECT * FROM emps e WHERE e.name NOT IN (SELECT d.name FROM depts d WHERE 
e.deptno = d.deptno OR e.gender ='M');

SELECT city 
  FROM emps e 
WHERE ((CASE WHEN name NOT IN (SELECT name FROM sdepts s WHERE s.deptno = 
e.deptno) THEN '1' ELSE  '2' END) 
 NOT IN (SELECT name FROM depts d WHERE e.deptno = d.deptno));
{code}

Also I noticed that these queries starts failing with NPE with this change 
https://issues.apache.org/jira/browse/CALCITE-4560.
In a commit before that change there is no such issue

> RelDecorrelator throws NullPointerException
> ---
>
> Key: CALCITE-5390
> URL: https://issues.apache.org/jira/browse/CALCITE-5390
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Dmitry Sysolyatin
>Assignee: Zou
>Priority: Major
>
> The current query throws NullPointerException
> {code:java}
> SELECT
>   (SELECT 1 FROM emp d WHERE d.job = a.job LIMIT 1) AS t1,
>   (SELECT a.job = 'PRESIDENT' FROM emp s LIMIT 1) as t2
> FROM emp a;
> {code}
> Test case - 
> [https://github.com/apache/calcite/commit/46fe9bc456f2d34cf7dccd29829c9e85abe69d5f]
> Logical plan before it fails:
> {code:java}
> LogicalProject(T1=[$8], T2=[$9])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], $f0=[$8], $f09=[$9])
>     LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], $f0=[$8], $f00=[$10])
>       LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{9}])
>         LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], 
> HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], $f0=[$8], $f9=[=($2, 
> 'PRESIDENT')])
>           LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{2}])
>             LogicalTableScan(table=[[scott, EMP]])
>             LogicalAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)])
>               LogicalSort(fetch=[1])
>                 LogicalProject(EXPR$0=[1])
>                   LogicalFilter(condition=[=($2, $cor0.JOB)])
>                     LogicalTableScan(table=[[scott, EMP]])
>         LogicalAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)])
>           LogicalSort(fetch=[1])
>             LogicalProject(EXPR$0=[$cor0.$f9])
>               LogicalTableScan(table=[[scott, EMP]]) {code}
> Stack trace:
> {code:java}
>  Caused by: java.lang.NullPointerException
>   at java.util.Objects.requireNonNull(Objects.java:203)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.createValueGenerator(RelDecorrelator.java:833)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateInputWithValueGenerator(RelDecorrelator.java:1028)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:764)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:738)
>   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.getInvoke(RelDecorrelator.java:707)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:464)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.getInvoke(RelDecorrelator.java:707)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:512)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:495)
>   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
>   at 
>

[jira] [Updated] (CALCITE-5563) Add a break to the inner loop of RelSubset#getParents and RelSubset#getParentSubsets after we find a matching Relsubset from its parent input

2023-03-07 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-5563:

Labels: pull-request-available  (was: )

> Add a break to the inner loop of RelSubset#getParents and 
> RelSubset#getParentSubsets after we find a matching Relsubset from its parent 
> input
> -
>
> Key: CALCITE-5563
> URL: https://issues.apache.org/jira/browse/CALCITE-5563
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: asdfgh19
>Assignee: asdfgh19
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.34.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> /**
>  * Returns the collection of RelNodes one of whose inputs is in this
>  * subset.
>  */
> Set getParents() {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   // see usage of this method in propagateCostImprovements0()
>   if (rel == this) {
> list.add(parent);
> break;
>   }
> }
>   }
>   return list;
> }
> /**
>  * Returns the collection of distinct subsets that contain a RelNode one
>  * of whose inputs is in this subset.
>  */
> Set getParentSubsets(VolcanoPlanner planner) {
>   final Set list = new LinkedHashSet<>();
>   for (RelNode parent : set.getParentRels()) {
> for (RelSubset rel : inputSubsets(parent)) {
>   if (rel.set == set && rel.getTraitSet().equals(traitSet)) {
> list.add(planner.getSubsetNonNull(parent));
> break;
>   }
> }
>   }
>   return list;
> }{code}
>  
> Once we have found a matching Relsubset from its parent input, we can 
> immediately end the inner loop.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (CALCITE-5563) Add a break to the inner loop of RelSubset#getParents and RelSubset#getParentSubsets after we find a matching Relsubset from its parent input

2023-03-07 Thread asdfgh19 (Jira)

asdfgh19 created CALCITE-5563:
-

 Summary: Add a break to the inner loop of RelSubset#getParents and 
RelSubset#getParentSubsets after we find a matching Relsubset from its parent 
input
 Key: CALCITE-5563
 URL: https://issues.apache.org/jira/browse/CALCITE-5563
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: asdfgh19
Assignee: asdfgh19
 Fix For: 1.34.0


 
{code:java}
/**
 * Returns the collection of RelNodes one of whose inputs is in this
 * subset.
 */
Set getParents() {
  final Set list = new LinkedHashSet<>();
  for (RelNode parent : set.getParentRels()) {
for (RelSubset rel : inputSubsets(parent)) {
  // see usage of this method in propagateCostImprovements0()
  if (rel == this) {
list.add(parent);
break;
  }
}
  }
  return list;
}

/**
 * Returns the collection of distinct subsets that contain a RelNode one
 * of whose inputs is in this subset.
 */
Set getParentSubsets(VolcanoPlanner planner) {
  final Set list = new LinkedHashSet<>();
  for (RelNode parent : set.getParentRels()) {
for (RelSubset rel : inputSubsets(parent)) {
  if (rel.set == set && rel.getTraitSet().equals(traitSet)) {
list.add(planner.getSubsetNonNull(parent));
break;
  }
}
  }
  return list;
}{code}
 

Once we have found a matching Relsubset from its parent input, we can 
immediately end the inner loop.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5403) Babel parser should parse PostgreSQL's SET, RESET, BEGIN, SHOW, ROLLBACK, COMMIT commands

2023-03-07 Thread Dmitry Sysolyatin (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697403#comment-17697403
 ] 

Dmitry Sysolyatin commented on CALCITE-5403:


[~julianhyde] I have changed summary to "Babel parser should parse PostgreSQL's 
SET, RESET, BEGIN, SHOW, ROLLBACK, COMMIT commands". I hope the updated summary 
is good enough now

> Babel parser should parse PostgreSQL's SET, RESET, BEGIN, SHOW, ROLLBACK, 
> COMMIT commands
> -
>
> Key: CALCITE-5403
> URL: https://issues.apache.org/jira/browse/CALCITE-5403
> Project: Calcite
>  Issue Type: Bug
>  Components: babel
>Affects Versions: 1.34.0
>Reporter: Dmitry Sysolyatin
>Assignee: Dmitry Sysolyatin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Babel parser can not parse BEGIN [1], SHOW [2], ROLLBACK [3], COMMIT [4] 
> commands at the moment.
> It can parse SET [5] and RESET [6], but not completely. For instance, it can 
> not parse the following statement:
> {code:java}
> SET SCHEMA public,public,"$user"
> {code}
> The syntax for these commands is as follows:
>  * BEGIN [1]:
> {code:java}
> BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] 
> where transaction_mode is one of: ISOLATION LEVEL { SERIALIZABLE | REPEATABLE 
> READ | READ COMMITTED | READ UNCOMMITTED } READ WRITE | READ ONLY [ NOT ] 
> DEFERRABLE
> {code}
>  * SHOW [2]:
> {code:java}
> SHOW ( | )
> {code}
>  * ROLLBACK [3]:
> {code:java}
> ROLLBACK [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
> {code}
>  * COMMIT [4]:
> {code:java}
> COMMIT [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
> {code}
>  * SET [5]
> {code:java}
> SET [ SESSION | LOCAL ] configuration_parameter { TO | = } { value | 'value' 
> | DEFAULT }
> SET [ SESSION | LOCAL ] TIME ZONE { value | 'value' | LOCAL | DEFAULT }
> SET [ SESSION | LOCAL ]  { SCHEMA | SEED | NAMES } value
> Values can be specified as string constants, identifiers, numbers, or 
> comma-separated lists of these
> {code}
>  * RESET [6]
> {code:java}
> RESET ;
> {code}
> [1] [https://www.postgresql.org/docs/current/sql-begin.html]
> [2] [https://www.postgresql.org/docs/current/sql-show.html]
> [3] [https://www.postgresql.org/docs/current/sql-rollback.html]
> [4] [https://www.postgresql.org/docs/current/sql-commit.html]
> [5] [https://www.postgresql.org/docs/current/sql-set.html]
> [6] [https://www.postgresql.org/docs/current/sql-reset.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5390) RelDecorrelator throws NullPointerException

2023-03-07 Thread Zou (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697401#comment-17697401
 ] 

Zou commented on CALCITE-5390:
--

[~Sergey Nuyanzin] I am sorry that I haven't started this work yet as I am busy 
recently. You are free to take it if you are interested in it.

> RelDecorrelator throws NullPointerException
> ---
>
> Key: CALCITE-5390
> URL: https://issues.apache.org/jira/browse/CALCITE-5390
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Dmitry Sysolyatin
>Assignee: Zou
>Priority: Major
>
> The current query throws NullPointerException
> {code:java}
> SELECT
>   (SELECT 1 FROM emp d WHERE d.job = a.job LIMIT 1) AS t1,
>   (SELECT a.job = 'PRESIDENT' FROM emp s LIMIT 1) as t2
> FROM emp a;
> {code}
> Test case - 
> [https://github.com/apache/calcite/commit/46fe9bc456f2d34cf7dccd29829c9e85abe69d5f]
> Logical plan before it fails:
> {code:java}
> LogicalProject(T1=[$8], T2=[$9])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], $f0=[$8], $f09=[$9])
>     LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], $f0=[$8], $f00=[$10])
>       LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{9}])
>         LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], 
> HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], $f0=[$8], $f9=[=($2, 
> 'PRESIDENT')])
>           LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{2}])
>             LogicalTableScan(table=[[scott, EMP]])
>             LogicalAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)])
>               LogicalSort(fetch=[1])
>                 LogicalProject(EXPR$0=[1])
>                   LogicalFilter(condition=[=($2, $cor0.JOB)])
>                     LogicalTableScan(table=[[scott, EMP]])
>         LogicalAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)])
>           LogicalSort(fetch=[1])
>             LogicalProject(EXPR$0=[$cor0.$f9])
>               LogicalTableScan(table=[[scott, EMP]]) {code}
> Stack trace:
> {code:java}
>  Caused by: java.lang.NullPointerException
>   at java.util.Objects.requireNonNull(Objects.java:203)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.createValueGenerator(RelDecorrelator.java:833)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateInputWithValueGenerator(RelDecorrelator.java:1028)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:764)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:738)
>   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.getInvoke(RelDecorrelator.java:707)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:464)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.getInvoke(RelDecorrelator.java:707)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:512)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:495)
>   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.getInvoke(RelDecorrelator.java:707)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:1187)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:1169)
>   at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
>   at 
>

[jira] [Updated] (CALCITE-5403) Babel parser should parse PostgreSQL's SET, RESET, BEGIN, SHOW, ROLLBACK, COMMIT commands

2023-03-07 Thread Dmitry Sysolyatin (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sysolyatin updated CALCITE-5403:
---
Summary: Babel parser should parse PostgreSQL's SET, RESET, BEGIN, SHOW, 
ROLLBACK, COMMIT commands  (was: PostgreSQL dialect should support SET, RESET, 
BEGIN, SHOW, ROLLBACK, COMMIT)

> Babel parser should parse PostgreSQL's SET, RESET, BEGIN, SHOW, ROLLBACK, 
> COMMIT commands
> -
>
> Key: CALCITE-5403
> URL: https://issues.apache.org/jira/browse/CALCITE-5403
> Project: Calcite
>  Issue Type: Bug
>  Components: babel
>Affects Versions: 1.34.0
>Reporter: Dmitry Sysolyatin
>Assignee: Dmitry Sysolyatin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Babel parser can not parse BEGIN [1], SHOW [2], ROLLBACK [3], COMMIT [4] 
> commands at the moment.
> It can parse SET [5] and RESET [6], but not completely. For instance, it can 
> not parse the following statement:
> {code:java}
> SET SCHEMA public,public,"$user"
> {code}
> The syntax for these commands is as follows:
>  * BEGIN [1]:
> {code:java}
> BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] 
> where transaction_mode is one of: ISOLATION LEVEL { SERIALIZABLE | REPEATABLE 
> READ | READ COMMITTED | READ UNCOMMITTED } READ WRITE | READ ONLY [ NOT ] 
> DEFERRABLE
> {code}
>  * SHOW [2]:
> {code:java}
> SHOW ( | )
> {code}
>  * ROLLBACK [3]:
> {code:java}
> ROLLBACK [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
> {code}
>  * COMMIT [4]:
> {code:java}
> COMMIT [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
> {code}
>  * SET [5]
> {code:java}
> SET [ SESSION | LOCAL ] configuration_parameter { TO | = } { value | 'value' 
> | DEFAULT }
> SET [ SESSION | LOCAL ] TIME ZONE { value | 'value' | LOCAL | DEFAULT }
> SET [ SESSION | LOCAL ]  { SCHEMA | SEED | NAMES } value
> Values can be specified as string constants, identifiers, numbers, or 
> comma-separated lists of these
> {code}
>  * RESET [6]
> {code:java}
> RESET ;
> {code}
> [1] [https://www.postgresql.org/docs/current/sql-begin.html]
> [2] [https://www.postgresql.org/docs/current/sql-show.html]
> [3] [https://www.postgresql.org/docs/current/sql-rollback.html]
> [4] [https://www.postgresql.org/docs/current/sql-commit.html]
> [5] [https://www.postgresql.org/docs/current/sql-set.html]
> [6] [https://www.postgresql.org/docs/current/sql-reset.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5390) RelDecorrelator throws NullPointerException

2023-03-07 Thread Sergey Nuyanzin (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697394#comment-17697394
 ] 

Sergey Nuyanzin commented on CALCITE-5390:
--

Hi [~FrankZou] , do you have any plans to work on this issue or do you need 
some help?

I'm asking since also interested in this issue as faced it in Flink

> RelDecorrelator throws NullPointerException
> ---
>
> Key: CALCITE-5390
> URL: https://issues.apache.org/jira/browse/CALCITE-5390
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Dmitry Sysolyatin
>Assignee: Zou
>Priority: Major
>
> The current query throws NullPointerException
> {code:java}
> SELECT
>   (SELECT 1 FROM emp d WHERE d.job = a.job LIMIT 1) AS t1,
>   (SELECT a.job = 'PRESIDENT' FROM emp s LIMIT 1) as t2
> FROM emp a;
> {code}
> Test case - 
> [https://github.com/apache/calcite/commit/46fe9bc456f2d34cf7dccd29829c9e85abe69d5f]
> Logical plan before it fails:
> {code:java}
> LogicalProject(T1=[$8], T2=[$9])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], $f0=[$8], $f09=[$9])
>     LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], $f0=[$8], $f00=[$10])
>       LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{9}])
>         LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], 
> HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], $f0=[$8], $f9=[=($2, 
> 'PRESIDENT')])
>           LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{2}])
>             LogicalTableScan(table=[[scott, EMP]])
>             LogicalAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)])
>               LogicalSort(fetch=[1])
>                 LogicalProject(EXPR$0=[1])
>                   LogicalFilter(condition=[=($2, $cor0.JOB)])
>                     LogicalTableScan(table=[[scott, EMP]])
>         LogicalAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)])
>           LogicalSort(fetch=[1])
>             LogicalProject(EXPR$0=[$cor0.$f9])
>               LogicalTableScan(table=[[scott, EMP]]) {code}
> Stack trace:
> {code:java}
>  Caused by: java.lang.NullPointerException
>   at java.util.Objects.requireNonNull(Objects.java:203)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.createValueGenerator(RelDecorrelator.java:833)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateInputWithValueGenerator(RelDecorrelator.java:1028)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:764)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:738)
>   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.getInvoke(RelDecorrelator.java:707)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:464)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.getInvoke(RelDecorrelator.java:707)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:512)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:495)
>   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.getInvoke(RelDecorrelator.java:707)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:1187)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:1169)
>   at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at

[jira] [Commented] (CALCITE-5559) Improve RepeatUnion by discarding duplicates at TableSpool level

2023-03-07 Thread Stamatis Zampetakis (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697334#comment-17697334
 ] 

Stamatis Zampetakis commented on CALCITE-5559:
--

I am leaning more towards *B* even though it looks a bit more complicated.

If I understood well  we cannot have an {{Aggregate}} below a {{TableSpool}} 
operator due to underlying limitations of the respective {{Enumerable}} 
operators. This also means that if we wanted to represent a recursive SQL query 
with aggregation this wouldn't work at this point in {{EnumerableConvention}}. 
Changing/Enhancing the physical implementation of {{Aggregate}} has the 
additional benefit that these kind of queries would be supported.

I know that many popular DBMS impose some limitations around the usage of GROUP 
BY, HAVING, DISTINCT, etc., in recursive queries. It may be worth trying to 
understand the reason behind these limitations before moving further with *B* 
to avoid hitting a wall later on. We don't really need to adhere or impose the 
same limitations with other DBMS, cause algebra is more powerful than SQL, but 
it may help in taking a more informed decision.

Regarding the eager vs. lazy evaluation of aggregations there is always the 
option of keeping both and let the optimizer/rules decide which one to peek. I 
prefer having to maintain a single implementation but if for some reason 
(performance, backwards compatibility, etc.) we want to keep both we could.

> Improve RepeatUnion by discarding duplicates at TableSpool level
> 
>
> Key: CALCITE-5559
> URL: https://issues.apache.org/jira/browse/CALCITE-5559
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Ruben Q L
>Assignee: Ruben Q L
>Priority: Major
>
> Currently, RepeatUnion operator with all=false keeps track of the elements 
> that it has returned in order to discard duplicates. However, the TableSpool 
> operators that are right below it do not have such control. In certain 
> scenarios, duplicates are returned by the TableSpool current iteration, 
> discarded by the RepeatUnion, but have been already "fed back" by the 
> TableSpool into the next iteration, causing unnecessary processing.
> We can optimize this scenario by keeping track of the duplicates 
> inside/before the TableSpool too (note: we still need to keep track of 
> duplicates at RepeatUnion level, because that is the only place where we can 
> detect a potential "global duplicate" of an element: returned by the LHS and 
> then also by the RHS, or by two different iterations of the RHS).
> A PoC testing this improvement on a downstream project showed that certain 
> queries can go from ~40s down to ~1s.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-5562) The result data is disordered when u tried to get data from an ordered set

2023-03-07 Thread Yuxin Wu (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxin Wu updated CALCITE-5562:
--
Description: 
*It's based on CSV-TEST-MODEL. The same as {{{}lex=MYSQL{}}}.*
{quote}!connect jdbc:calcite:model=src/test/resources/mode
{quote}

Step1: You need to get a table named 'emps' with data as below;
{panel}
0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps;
|NAME|EMPID|
|Fred|30  |
|Eric|3    |
|John|2    |
|Wilma|1    |
|Alice|2    |

 
{panel}
Step2: Get an ordered set by using {_}{color:#172b4d}*{{select name, empid from 
emps order by empid}}*{color}{_};
{panel}
0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps 
order by empid;
|NAME|EMPID|
|Wilma|1    |
|John|2    |
|Alice|2    |
|Eric|3    |
|Fred|30  |
{panel}
 
Step3: Using the statement in step2 as a sub query like this:
{panel}
0: jdbc:calcite:model=src/test/resources/mode> select * from (select name, 
empid from emps order by empid);
|NAME|EMPID|
|Fred|30  |
|Eric|3    |
|John|2    |
|Wilma|1    |
|Alice|2    |
{panel}
 
*THE QUESTION IS:*

*The result data is disordered while it's supposed to be the same as it is in 
step2.*

  was:
*It's based on CSV-TEST-MODEL. The same as {{{}lex=MYSQL{}}}.*
{quote}!connect jdbc:calcite:model=src/test/resources/mode
{quote}

Step1: You need to get a table named 'emps' with data as below;
0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps;
|NAME|EMPID|
|Fred|30  |
|Eric|3    |
|John|2    |
|Wilma|1    |
|Alice|2    |


Step2: Get an ordered set by using {_}{color:#172b4d}*{{select name, empid from 
emps order by empid}}*{color}{_};
0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps 
order by empid;
|NAME|EMPID|
|Wilma|1    |
|John|2    |
|Alice|2    |
|Eric|3    |
|Fred|30  |


Step3: Using the statement in step2 as a sub query like this:
0: jdbc:calcite:model=src/test/resources/mode> select * from (select name, 
empid from emps order by empid);
|NAME|EMPID|
|Fred|30  |
|Eric|3    |
|John|2    |
|Wilma|1    |
|Alice|2    |


*THE QUESTION IS:*

*The result data is disordered while it's supposed to be the same as it is in 
step2.*


> The result data is disordered when u tried to get data from an ordered set
> --
>
> Key: CALCITE-5562
> URL: https://issues.apache.org/jira/browse/CALCITE-5562
> Project: Calcite
>  Issue Type: Bug
>Reporter: Yuxin Wu
>Priority: Major
>
> *It's based on CSV-TEST-MODEL. The same as {{{}lex=MYSQL{}}}.*
> {quote}!connect jdbc:calcite:model=src/test/resources/mode
> {quote}
> 
> Step1: You need to get a table named 'emps' with data as below;
> {panel}
> 0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps;
> |NAME|EMPID|
> |Fred|30  |
> |Eric|3    |
> |John|2    |
> |Wilma|1    |
> |Alice|2    |
>  
> {panel}
> Step2: Get an ordered set by using {_}{color:#172b4d}*{{select name, empid 
> from emps order by empid}}*{color}{_};
> {panel}
> 0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps 
> order by empid;
> |NAME|EMPID|
> |Wilma|1    |
> |John|2    |
> |Alice|2    |
> |Eric|3    |
> |Fred|30  |
> {panel}
>  
> Step3: Using the statement in step2 as a sub query like this:
> {panel}
> 0: jdbc:calcite:model=src/test/resources/mode> select * from (select name, 
> empid from emps order by empid);
> |NAME|EMPID|
> |Fred|30  |
> |Eric|3    |
> |John|2    |
> |Wilma|1    |
> |Alice|2    |
> {panel}
>  
> *THE QUESTION IS:*
> *The result data is disordered while it's supposed to be the same as it is in 
> step2.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-5562) The result data is disordered when u tried to get data from an ordered set

2023-03-07 Thread Yuxin Wu (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxin Wu updated CALCITE-5562:
--
Description: 
*It's based on CSV-TEST-MODEL. The same as {{{}lex=MYSQL{}}}.*
{quote}!connect jdbc:calcite:model=src/test/resources/mode
{quote}

Step1: You need to get a table named 'emps' with data as below;
0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps;
|NAME|EMPID|
|Fred|30  |
|Eric|3    |
|John|2    |
|Wilma|1    |
|Alice|2    |


Step2: Get an ordered set by using {_}{color:#172b4d}*{{select name, empid from 
emps order by empid}}*{color}{_};
0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps 
order by empid;
|NAME|EMPID|
|Wilma|1    |
|John|2    |
|Alice|2    |
|Eric|3    |
|Fred|30  |


Step3: Using the statement in step2 as a sub query like this:
0: jdbc:calcite:model=src/test/resources/mode> select * from (select name, 
empid from emps order by empid);
|NAME|EMPID|
|Fred|30  |
|Eric|3    |
|John|2    |
|Wilma|1    |
|Alice|2    |


*THE QUESTION IS:*

*The result data is disordered while it's supposed to be the same as it is in 
step2.*

  was:
*It's based on CSV-TEST-MODEL. The same as {{lex=MYSQL}}.*
{quote}!connect jdbc:calcite:model=src/test/resources/mode
{quote}

Step1: You need to get a table named 'emps' with data as below;
0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps;
+---+---+
| NAME  | EMPID |
+---+---+
| Fred  | 30    |
| Eric  | 3     |
| John  | 2     |
| Wilma | 1     |
| Alice | 2     |
+---+---+
Step2: Get an ordered set by using {_}{color:#172b4d}*{{select name, empid from 
emps order by empid}}*{color}{_};
0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps 
order by empid;
+---+---+
| NAME  | EMPID |
+---+---+
| Wilma | 1     |
| John  | 2     |
| Alice | 2     |
| Eric  | 3     |
| Fred  | 30    |
+---+---+
Step3: Using the statement in step2 as a sub query like this:
0: jdbc:calcite:model=src/test/resources/mode> select * from (select name, 
empid from emps order by empid);
+---+---+
| NAME  | EMPID |
+---+---+
| Fred  | 30    |
| Eric  | 3     |
| John  | 2     |
| Wilma | 1     |
| Alice | 2     |
+---+---+
*THE QUESTION IS:* 

*The result data is disordered while it's supposed to be the same as it is in 
step2.*


> The result data is disordered when u tried to get data from an ordered set
> --
>
> Key: CALCITE-5562
> URL: https://issues.apache.org/jira/browse/CALCITE-5562
> Project: Calcite
>  Issue Type: Bug
>Reporter: Yuxin Wu
>Priority: Major
>
> *It's based on CSV-TEST-MODEL. The same as {{{}lex=MYSQL{}}}.*
> {quote}!connect jdbc:calcite:model=src/test/resources/mode
> {quote}
> 
> Step1: You need to get a table named 'emps' with data as below;
> 0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps;
> |NAME|EMPID|
> |Fred|30  |
> |Eric|3    |
> |John|2    |
> |Wilma|1    |
> |Alice|2    |
> Step2: Get an ordered set by using {_}{color:#172b4d}*{{select name, empid 
> from emps order by empid}}*{color}{_};
> 0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps 
> order by empid;
> |NAME|EMPID|
> |Wilma|1    |
> |John|2    |
> |Alice|2    |
> |Eric|3    |
> |Fred|30  |
> Step3: Using the statement in step2 as a sub query like this:
> 0: jdbc:calcite:model=src/test/resources/mode> select * from (select name, 
> empid from emps order by empid);
> |NAME|EMPID|
> |Fred|30  |
> |Eric|3    |
> |John|2    |
> |Wilma|1    |
> |Alice|2    |
> *THE QUESTION IS:*
> *The result data is disordered while it's supposed to be the same as it is in 
> step2.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (CALCITE-5562) The result data is disordered when u tried to get data from an ordered set

2023-03-07 Thread Yuxin Wu (Jira)

Yuxin Wu created CALCITE-5562:
-

 Summary: The result data is disordered when u tried to get data 
from an ordered set
 Key: CALCITE-5562
 URL: https://issues.apache.org/jira/browse/CALCITE-5562
 Project: Calcite
  Issue Type: Bug
Reporter: Yuxin Wu


*It's based on CSV-TEST-MODEL. The same as {{lex=MYSQL}}.*
{quote}!connect jdbc:calcite:model=src/test/resources/mode
{quote}

Step1: You need to get a table named 'emps' with data as below;
0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps;
+---+---+
| NAME  | EMPID |
+---+---+
| Fred  | 30    |
| Eric  | 3     |
| John  | 2     |
| Wilma | 1     |
| Alice | 2     |
+---+---+
Step2: Get an ordered set by using {_}{color:#172b4d}*{{select name, empid from 
emps order by empid}}*{color}{_};
0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps 
order by empid;
+---+---+
| NAME  | EMPID |
+---+---+
| Wilma | 1     |
| John  | 2     |
| Alice | 2     |
| Eric  | 3     |
| Fred  | 30    |
+---+---+
Step3: Using the statement in step2 as a sub query like this:
0: jdbc:calcite:model=src/test/resources/mode> select * from (select name, 
empid from emps order by empid);
+---+---+
| NAME  | EMPID |
+---+---+
| Fred  | 30    |
| Eric  | 3     |
| John  | 2     |
| Wilma | 1     |
| Alice | 2     |
+---+---+
*THE QUESTION IS:* 

*The result data is disordered while it's supposed to be the same as it is in 
step2.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5559) Improve RepeatUnion by discarding duplicates at TableSpool level

2023-03-07 Thread Ruben Q L (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697311#comment-17697311
 ] 

Ruben Q L commented on CALCITE-5559:


I have tried both approaches:

A) Deal with duplicates inside TableSpool. This would require either enlarging 
the existing Logical & Enumerable TableSpool operators to support a flag à la 
Union (e.g. all=true/false, default true) to discard duplicates; or creating a 
new operator (SetTableSpool?) to support this duplicate removal. Either way, 
when creating a RepeatUnion with all=false, the "optimized" TableSpool should 
be used to discard duplicates earlier. This would work, but seems a rather 
ad-hoc solution, it does not seem like a "natural responsibility" of a 
TableSpool to deal with duplicates.

B) Use aggregations (distinct) before each TableSpool, i.e. in 
{{RelBuilder#repeatUnion}} do something like:
{code:java}
  public RelBuilder repeatUnion(String tableName, boolean all, int 
iterationLimit) {
...
RelNode iterative = tableSpool(Spool.Type.LAZY, Spool.Type.LAZY, 
finder.relOptTable, all).build();
RelNode seed = tableSpool(Spool.Type.LAZY, Spool.Type.LAZY, 
finder.relOptTable, all).build();
RelNode repeatUnion = struct.repeatUnionFactory.createRepeatUnion(seed, 
iterative, all, iterationLimit, finder.relOptTable);
==>
  public RelBuilder repeatUnion(String tableName, boolean all, int 
iterationLimit) {
...
if (!all)
  this.distinct();
RelNode iterative = tableSpool(Spool.Type.LAZY, Spool.Type.LAZY, 
finder.relOptTable, all).build();
if (!all)
  this.distinct();
RelNode seed = tableSpool(Spool.Type.LAZY, Spool.Type.LAZY, 
finder.relOptTable, all).build();
RelNode repeatUnion = struct.repeatUnionFactory.createRepeatUnion(seed, 
iterative, all, iterationLimit, finder.relOptTable);
{code}

This seems a more logical approach, where we reuse existing operators, we use 
the "proper operator" (aggregate) to deal with duplicates and we get the 
benefits of it too (e.g. if the subplan before the {{distinct}} contains a 
projection of the PK field of a table, it can be removed by the existing 
Calcite logic since it is unnecessary).
However, the big issue with this approach B is that currently Aggregate is not 
"compatible" with RepeatUnion's iterative mechanism (where each iteration 
requires the "re-evaluation" of the input enumerator), since this enumerator 
gets eagerly computed on creation, so it will always have the same content, 
unaware of the fact that the underlying source (transient scan) has changed. 
Basically we have the same issue as we used to have with Sort operator's 
implementation in {{EnumerableDefaults#orderBy}} which was changed via 
(CALCITE-3820), see more info about this problem in that ticket's description.
So, if we decided to go with approach B, a pre-requisite would be changing all 
distinct / gruopBy implementations in EnumerableDefaults from eagerly 
evaluation to lazily evaluated (and re-evaluated if required), e.g.:
{code}
  public static  Enumerable> groupBy(
  final Enumerable enumerable,
  final Function1 keySelector) {
return enumerable.toLookup(keySelector);
  }
==>
  public static  Enumerable> groupBy(
  final Enumerable enumerable,
  final Function1 keySelector) {
return new AbstractEnumerable>() {
  @Override public Enumerator> enumerator() {
return enumerable.toLookup(keySelector).enumerator();
  }
};
  }
{code}

The advantages of this change (as described in CALCITE-3820) would be avoiding 
computation in some cases where it is not required; and make the enumerator 
"re-evaluable" and hence compatible with RepeatUnion. Also, quoting Julian Hyde 
"When you create an Enumerable, the work (in particular calling an input's 
enumerator() method) should not happen until Enumerable.enumerator() is called".
The main disadvantage is that the current code computes the enumerator 
(eagerly) just once and then re-uses it, so in the (rare?) scenario of 
accessing the Aggregate enumerator multiple times, the new code would need to 
re-compute it on each time. However, since this change was approved for Sort 
(CALCITE-3820), I guess the same could be applied to Aggregate.

So, if we decide to go with B, I guess we would need a separate ticket (that 
would block the current one) to move EnumerableDefaults' distinct and groupBy 
methods from eager to lazy enumerator computation (and maybe also other 
operator's methods in the same situation such as union, except, intersect, 
others?).

> Improve RepeatUnion by discarding duplicates at TableSpool level
> 
>
> Key: CALCITE-5559
> URL: https://issues.apache.org/jira/browse/CALCITE-5559
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Ruben Q L
>

[jira] [Commented] (CALCITE-5403) PostgreSQL dialect should support SET, RESET, BEGIN, SHOW, ROLLBACK, COMMIT

2023-03-07 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697295#comment-17697295
 ] 

Julian Hyde commented on CALCITE-5403:
--

The word “dialect” in generally refers to the SQL that Calcite generates, not 
what it parses. Can you adjust the summary?

> PostgreSQL dialect should support SET, RESET, BEGIN, SHOW, ROLLBACK, COMMIT
> ---
>
> Key: CALCITE-5403
> URL: https://issues.apache.org/jira/browse/CALCITE-5403
> Project: Calcite
>  Issue Type: Bug
>  Components: babel
>Affects Versions: 1.34.0
>Reporter: Dmitry Sysolyatin
>Assignee: Dmitry Sysolyatin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Babel parser can not parse BEGIN [1], SHOW [2], ROLLBACK [3], COMMIT [4] 
> commands at the moment.
> It can parse SET [5] and RESET [6], but not completely. For instance, it can 
> not parse the following statement:
> {code:java}
> SET SCHEMA public,public,"$user"
> {code}
> The syntax for these commands is as follows:
>  * BEGIN [1]:
> {code:java}
> BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] 
> where transaction_mode is one of: ISOLATION LEVEL { SERIALIZABLE | REPEATABLE 
> READ | READ COMMITTED | READ UNCOMMITTED } READ WRITE | READ ONLY [ NOT ] 
> DEFERRABLE
> {code}
>  * SHOW [2]:
> {code:java}
> SHOW ( | )
> {code}
>  * ROLLBACK [3]:
> {code:java}
> ROLLBACK [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
> {code}
>  * COMMIT [4]:
> {code:java}
> COMMIT [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
> {code}
>  * SET [5]
> {code:java}
> SET [ SESSION | LOCAL ] configuration_parameter { TO | = } { value | 'value' 
> | DEFAULT }
> SET [ SESSION | LOCAL ] TIME ZONE { value | 'value' | LOCAL | DEFAULT }
> SET [ SESSION | LOCAL ]  { SCHEMA | SEED | NAMES } value
> Values can be specified as string constants, identifiers, numbers, or 
> comma-separated lists of these
> {code}
>  * RESET [6]
> {code:java}
> RESET ;
> {code}
> [1] [https://www.postgresql.org/docs/current/sql-begin.html]
> [2] [https://www.postgresql.org/docs/current/sql-show.html]
> [3] [https://www.postgresql.org/docs/current/sql-rollback.html]
> [4] [https://www.postgresql.org/docs/current/sql-commit.html]
> [5] [https://www.postgresql.org/docs/current/sql-set.html]
> [6] [https://www.postgresql.org/docs/current/sql-reset.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-5559) Improve RepeatUnion by discarding duplicates at TableSpool level

2023-03-07 Thread Ruben Q L (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruben Q L updated CALCITE-5559:
---
Description: 
Currently, RepeatUnion operator with all=false keeps track of the elements that 
it has returned in order to discard duplicates. However, the TableSpool 
operators that are right below it do not have such control. In certain 
scenarios, duplicates are returned by the TableSpool current iteration, 
discarded by the RepeatUnion, but have been already "fed back" by the 
TableSpool into the next iteration, causing unnecessary processing.
We can optimize this scenario by keeping track of the duplicates inside/before 
the TableSpool too (note: we still need to keep track of duplicates at 
RepeatUnion level, because that is the only place where we can detect a 
potential "global duplicate" of an element: returned by the LHS and then also 
by the RHS, or by two different iterations of the RHS).

A PoC testing this improvement on a downstream project showed that certain 
queries can go from ~40s down to ~1s.

  was:
Currently, RepeatUnion operator with all=false keeps track of the elements that 
it has returned in order to discard duplicates. However, the TableSpool 
operators that are right below it do not have such control. In certain 
scenarios, duplicates are returned by the TableSpool current iteration, 
discarded by the RepeatUnion, but have been already "fed back" by the 
TableSpool into the next iteration, causing unnecessary processing.
We can optimize this scenario by keeping track of the duplicates inside the 
TableSpool too (note: we still need to keep track of duplicates at RepeatUnion 
level, because that is the only place where we can detect a potential "global 
duplicate" of an element: returned by the LHS and then also by the RHS, or by 
two different iterations of the RHS).

A PoC testing this improvement on a downstream project showed that certain 
queries can go from ~40s down to ~1s.


> Improve RepeatUnion by discarding duplicates at TableSpool level
> 
>
> Key: CALCITE-5559
> URL: https://issues.apache.org/jira/browse/CALCITE-5559
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Ruben Q L
>Assignee: Ruben Q L
>Priority: Major
>
> Currently, RepeatUnion operator with all=false keeps track of the elements 
> that it has returned in order to discard duplicates. However, the TableSpool 
> operators that are right below it do not have such control. In certain 
> scenarios, duplicates are returned by the TableSpool current iteration, 
> discarded by the RepeatUnion, but have been already "fed back" by the 
> TableSpool into the next iteration, causing unnecessary processing.
> We can optimize this scenario by keeping track of the duplicates 
> inside/before the TableSpool too (note: we still need to keep track of 
> duplicates at RepeatUnion level, because that is the only place where we can 
> detect a potential "global duplicate" of an element: returned by the LHS and 
> then also by the RHS, or by two different iterations of the RHS).
> A PoC testing this improvement on a downstream project showed that certain 
> queries can go from ~40s down to ~1s.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

37 matches

Mail list logo