[Question] Long Runtime for SubqueryRemoveRule, vs Expanding Subqueries

2024-06-11 Thread JinxTheKid
Hi Calcite community,

I'm running into some challenges using SubqueryRemoveRule and was hoping
the community could help me understand any gaps in my understanding.

I am attempting to remove subqueries from row expressions in a relational
plan. My motivation for doing this is that I want to call metadata queries
such as GetColumnOrigins on my relational plan, but these tools don't seem
to work unless subqueries are removed. To remove subqueries, I am...
- Using SqlToRelConverter to convert from SqlNode to RelRoot.
- After obtaining the RelRoot, using a HepProgram with subquery remove
rules to remove the subqueries.

For small programs, this process works fine. However, when testing against
larger queries (~2MB), removing subqueries takes about as much time as
converting a SqlNode to a RelRoot, effectively doubling the time to analyze
a query. I've tried using a deprecated setting,
SqlToRelConverter.Config.isExpand, which seems to provide results similar
to what I want with minimal performance impact.

Since SqlToRelConverter.Config.isExpand is deprecated, I assumed that the
non-deprecated way to expand subqueries would perform as well or better,
but I am seeing worse performance. This leads me to believe that I'm using
the APIs incorrectly, but I'm not sure what I am missing.

Q: Is there a more efficient way in Calcite to remove subqueries than what
I've described above? Is there risk to using
SqlToRelConverter.Config.isExpand even if it's deprecated?

Q: Are there any other resources besides Java docs and GitHub projects I
could refer to to learn more about my problem?

Thanks for any help,
Logan


[jira] [Created] (CALCITE-6434) Specify identifier quoting for HiveSqlDialect and SparkSqlDialect

2024-06-11 Thread xiong duan (Jira)
xiong duan created CALCITE-6434:
---

         Summary: Specify identifier quoting for HiveSqlDialect and SparkSqlDialect
             Key: CALCITE-6434
             URL: https://issues.apache.org/jira/browse/CALCITE-6434
         Project: Calcite
      Issue Type: Improvement
      Components: core
Affects Versions: 1.37.0
        Reporter: xiong duan
        Assignee: xiong duan
         Fix For: 1.38.0


The SQL:
{code:java}
SELECT product.product_class_id C
FROM foodmart.product
LEFT JOIN (SELECT CASE COUNT(*)
                    WHEN 0 THEN NULL
                    WHEN 1 THEN MIN(product_class_id)
                    ELSE (SELECT NULL
                          UNION ALL
                          SELECT NULL)
                  END $f0
           FROM foodmart.product) t0 ON TRUE
WHERE product.net_weight > t0.$f0
{code}
This SQL is generated for the SINGLE_VALUE aggregate function.

Spark fails to parse this SQL unless the identifiers are quoted, for example
`t0`.`$f0`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6433) SUBSTRING can return incorrect empty result for some parameters

2024-06-11 Thread Iurii Gerzhedovich (Jira)
Iurii Gerzhedovich created CALCITE-6433:
---

         Summary: SUBSTRING can return incorrect empty result for some parameters
             Key: CALCITE-6433
             URL: https://issues.apache.org/jira/browse/CALCITE-6433
         Project: Calcite
      Issue Type: Improvement
      Components: core
Affects Versions: 1.37.0
        Reporter: Iurii Gerzhedovich


When the third parameter (length) of the SUBSTRING function is greater than
Integer.MAX_VALUE, the function can return an empty result, because the code
clamps that value so that it can no longer exceed Integer.MAX_VALUE.
A simple way to reproduce: add something like the following to
*SqlOperatorTest*:
{noformat}
f.checkScalar(
    String.format("{fn SUBSTRING('abcdef', %d, %d)}",
        Integer.MIN_VALUE, 10L + Integer.MAX_VALUE),
    "abcdef",
    "VARCHAR(6) NOT NULL");
{noformat}
This happens because the range check runs after the length has been clamped:
{noformat}
public static String substring(String c, int s, int l) {
  
  long e = (long) s + (long) l; // l was already clamped, so e can be incorrect here
  .
  
  if (s > lc || e < 1L) {
    return "";
  }
{noformat}
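To make the arithmetic concrete, here is a minimal standalone sketch of the effect described above. It is not Calcite's actual implementation: the class `SubstringClampDemo` and the methods `substringClampedFirst`, `substringLongEnd` and `slice` are hypothetical names, and the clamping step is an assumption modelled on the description in this issue. It only contrasts computing the end position from a clamped length with computing it from the original long values, and it does not handle negative lengths.

```java
public class SubstringClampDemo {

  // Hypothetical stand-in for the reported behavior: start and length are
  // clamped to the int range first, and the end position is computed afterwards.
  static String substringClampedFirst(String c, long start, long length) {
    int s = (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, start));
    int l = (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, length));
    long e = (long) s + (long) l; // end position derived from the clamped length
    return slice(c, s, e);
  }

  // For comparison: the end position is derived from the unclamped values.
  // Assumes start + length does not overflow a long, which holds for the
  // int-range starts used in this example.
  static String substringLongEnd(String c, long start, long length) {
    long e = start + length;
    return slice(c, start, e);
  }

  // Shared 1-based slicing: returns the characters at positions [s, e).
  private static String slice(String c, long s, long e) {
    int lc = c.length();
    if (s > lc || e < 1L) {
      return "";
    }
    int from = (int) Math.max(s, 1L);
    int to = (int) Math.min(e, (long) lc + 1L);
    return from >= to ? "" : c.substring(from - 1, to - 1);
  }

  public static void main(String[] args) {
    long length = 10L + Integer.MAX_VALUE;
    // Clamped: e = Integer.MIN_VALUE + Integer.MAX_VALUE = -1, so the result is empty.
    System.out.println("[" + substringClampedFirst("abcdef", Integer.MIN_VALUE, length) + "]");
    // Unclamped: e = 9, so the whole string is returned.
    System.out.println("[" + substringLongEnd("abcdef", Integer.MIN_VALUE, length) + "]");
  }
}
```

With the parameters from the test above, the first call prints `[]` and the second prints `[abcdef]`, which matches the expected value in the `checkScalar` call.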
 

 


