Re: How to get the used columns (categorized by tables) from a SQL

2020-04-16 Thread XING JIN
You can use SqlValidator.getFieldOrigins for a SqlNode, or use
RelMetadataQuery.getColumnOrigins for a RelNode.

- Jin


Julian Hyde  于2020年4月17日周五 上午7:29写道:

> At the RelNode level, you can use RelMetadataQuery.getColumnOrigins.
>
> But I’m sure that there’s something available at the SqlNode level, after
> validation. PreparedExplain.getFieldOrigins() is what the JDBC driver uses
> to provide ResultSetMetadata.getColumnName(int) etc.
>
> Julian
>
>
>
> > On Apr 16, 2020, at 4:17 PM, Feng Zhu  wrote:
> >
> > You can customize a RelVisitor to visit the RelNode, and collect the
> > columns from TableScan's rowtype.
> >
> > 王驰  于2020年4月16日周四 下午11:23写道:
> >
> >> Hi folks,
> >>
> >>
> >> We're using CalCite in our project for SQL parsing and validating. We
> have
> >> a question: is there any way to get all used columns from a SQL?
> >>
> >>
> >> Consider the following case:
> >>
> >> we have two tables `user` and `user_region`. And the SQL is like
> >>
> >>
> >> ```
> >>
> >> SELECT
> >>
> >>id, name, age, country, province, city
> >>
> >> FROM
> >>
> >>user
> >>
> >>INNER JOIN user_region ON user.id = user_region.user_id
> >>
> >> WHERE age > 18;
> >>
> >> ```
> >>
> >>
> >> The result will be a Map with two keys:
> >>
> >>
> >>   - 'user' --> List('id', 'name', 'age')
> >>   - 'user_region' --> List('country', 'province', 'city', 'user_id')
> >>
> >>
> >> ==
> >>
> >> I've tried with SqlValidator (along with SqlValidatorScope,
> >> SqlValidatorNamespace) but found no easy ways to do this.
> >> I also tried to replay the validate process in our code but it seems
> >> impossible since most classes used are package private.
> >>
> >>
> >> Could you please give us some suggestions? Thanks!
> >>
> >>
>
>


Generating SQL with non-ASCII characters in string literals

2020-04-16 Thread Julian Hyde
Someone just logged https://issues.apache.org/jira/browse/CALCITE-3933 
 "Incorrect SQL Emitted for 
Unicode for Several Dialects”.

Consider the literal ‘schön’ (German for ‘beautiful’). It seems that we 
generate SQL standard syntax, u&’sch\00f6n’, which works on PostgreSQL, but 
several databases (including MySQL, Redshift, BigQuery) do not support it. What 
should we be generating? Is there an easy solution that will work across many 
databases.

I was surprised to see that there are no tests for generating unicode literals. 
That’s part of the problem, I think.

Julian



[jira] [Created] (CALCITE-3933) Incorrect SQL Emitted for Unicode for Several Dialects

2020-04-16 Thread Aryeh Hillman (Jira)
Aryeh Hillman created CALCITE-3933:
--

 Summary: Incorrect SQL Emitted for Unicode for Several Dialects
 Key: CALCITE-3933
 URL: https://issues.apache.org/jira/browse/CALCITE-3933
 Project: Calcite
  Issue Type: Bug
 Environment: master with latest commit on April 15 (

dfb842e55e1fa7037c8a731341010ed1c0cfb6f7)
Reporter: Aryeh Hillman


A string literal like "schön" should emit "schön" in SQL for most dialects 
(BigQuery, MySQL, Redshift and a few others), but instead emits 
"u&'sch\\00f6n'" (ISO-8859-1 ASCII).  It's possible that some dialects may 
support ISO-8859, but in my tests with BigQuery Standard SQL, MySQL, and 
Redshift engines, "select u&'sch\\00f6n';` fails but "select 'schön';` succeeds.

Test that demonstrates (add to 
`org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java` and run from there):
{code:java}
@Test void testBigQueryUnicode() {
  final Function relFn = b ->
  b.scan("EMP")
  .filter(
  b.call(SqlStdOperatorTable.IN, b.field("ENAME"),
  b.literal("schön")))
  .build();
  final String expectedSql = "SELECT *\n"
  + "FROM \"scott\".\"EMP\"\n"
  + "WHERE \"ENAME\" IN ('schön')";
  relFn(relFn).withBigQuery().ok(expectedSql);
}
{code}
  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: How to get the used columns (categorized by tables) from a SQL

2020-04-16 Thread Julian Hyde
At the RelNode level, you can use RelMetadataQuery.getColumnOrigins.

But I’m sure that there’s something available at the SqlNode level, after 
validation. PreparedExplain.getFieldOrigins() is what the JDBC driver uses to 
provide ResultSetMetadata.getColumnName(int) etc.

Julian



> On Apr 16, 2020, at 4:17 PM, Feng Zhu  wrote:
> 
> You can customize a RelVisitor to visit the RelNode, and collect the
> columns from TableScan's rowtype.
> 
> 王驰  于2020年4月16日周四 下午11:23写道:
> 
>> Hi folks,
>> 
>> 
>> We're using CalCite in our project for SQL parsing and validating. We have
>> a question: is there any way to get all used columns from a SQL?
>> 
>> 
>> Consider the following case:
>> 
>> we have two tables `user` and `user_region`. And the SQL is like
>> 
>> 
>> ```
>> 
>> SELECT
>> 
>>id, name, age, country, province, city
>> 
>> FROM
>> 
>>user
>> 
>>INNER JOIN user_region ON user.id = user_region.user_id
>> 
>> WHERE age > 18;
>> 
>> ```
>> 
>> 
>> The result will be a Map with two keys:
>> 
>> 
>>   - 'user' --> List('id', 'name', 'age')
>>   - 'user_region' --> List('country', 'province', 'city', 'user_id')
>> 
>> 
>> ==
>> 
>> I've tried with SqlValidator (along with SqlValidatorScope,
>> SqlValidatorNamespace) but found no easy ways to do this.
>> I also tried to replay the validate process in our code but it seems
>> impossible since most classes used are package private.
>> 
>> 
>> Could you please give us some suggestions? Thanks!
>> 
>> 



Re: How to get the used columns (categorized by tables) from a SQL

2020-04-16 Thread Feng Zhu
You can customize a RelVisitor to visit the RelNode, and collect the
columns from TableScan's rowtype.

王驰  于2020年4月16日周四 下午11:23写道:

> Hi folks,
>
>
> We're using CalCite in our project for SQL parsing and validating. We have
> a question: is there any way to get all used columns from a SQL?
>
>
> Consider the following case:
>
> we have two tables `user` and `user_region`. And the SQL is like
>
>
> ```
>
> SELECT
>
> id, name, age, country, province, city
>
> FROM
>
> user
>
> INNER JOIN user_region ON user.id = user_region.user_id
>
> WHERE age > 18;
>
> ```
>
>
> The result will be a Map with two keys:
>
>
>- 'user' --> List('id', 'name', 'age')
>- 'user_region' --> List('country', 'province', 'city', 'user_id')
>
>
> ==
>
> I've tried with SqlValidator (along with SqlValidatorScope,
> SqlValidatorNamespace) but found no easy ways to do this.
> I also tried to replay the validate process in our code but it seems
> impossible since most classes used are package private.
>
>
> Could you please give us some suggestions? Thanks!
>
>


Re: Re: [DISCUSS] Refactor how planner rules are parameterized

2020-04-16 Thread Julian Hyde
I have now pushed a dev branch with a prototype. Please see
https://issues.apache.org/jira/browse/CALCITE-3923 for details.

Having built the prototype, I believe that this change is beneficial
and we should do it. But I would like to get to consensus on the
design before we pull the trigger.

Julian

On Tue, Apr 14, 2020 at 2:06 PM Julian Hyde  wrote:
>
> Haisheng,
>
> I hear you. I agree that major changes to rules will require new rule
> classes (not merely sub-classes). People should copy-paste, refactor,
> and all that good stuff. But I think there are a lot of cases where we
> need to make minor changes to rules (there are many of these in the
> code base already), and this change will help.
>
> I have logged https://issues.apache.org/jira/browse/CALCITE-3923 and
> am going to start working on a prototype. When we have a prototype we
> will be able to assess how big an impact the API change will have.
> (E.g. whether it will be a breaking change.)
>
> Julian
>
> On Sat, Mar 14, 2020 at 8:22 PM Haisheng Yuan  wrote:
> >
> > I don't think it is worth the refactoring. People who want to customize the 
> > rule, in most cases, won't be satisfied by a different parameter, they most 
> > likely still need to rewrite (copy & paste) the rule with some slightly 
> > their own logic. For many Calcite users, the rule is not reusable even with 
> > flexible configurations.
> >
> > - Haisheng
> >
> > --
> > 发件人:Stamatis Zampetakis
> > 日 期:2020年03月14日 22:54:04
> > 收件人:
> > 主 题:Re: [DISCUSS] Refactor how planner rules are parameterized
> >
> > Hello,
> >
> > Apologies for the late reply but I just realised that I had written the
> > mail and never pressed the send button.
> >
> > I think it is a nice idea and certainly a problem worth addressing. If I
> > understood well you're thinking something like the current constructor of
> > the RelBuilder [1] that accepts a Context parameter. Indeed it seems that
> > with this change even rules that are not designed to be configured can be
> > changed much more gracefully (without adding new constructors and breaking
> > changes).
> >
> > On the other hand, some of the advantages that you mention can also be
> > turned into disadvantages. For instance, copying a rule without knowing the
> > values of the other parameters is a bit risky and might be harder to reason
> > about its correctness. Moreover, private constructors, final classes, etc.,
> > are primarily used for encapsulation purposes so allowing the state of the
> > rule escape somehow breaks the original design of the rule.
> >
> > Another problem with respect to rules is cross convention matching and
> > transformations [2]. Many rules should not fire for operands that are in
> > different conventions; a typical example that comes in my mind is
> > FilterProjectTransposeRule [3]. In the same spirit most rules should not
> > generate mixed convention transformations. Although a different problem, I
> > am mentioning it here since it could affect the design of the new API.
> >
> > Best,
> > Stamatis
> >
> > [1]
> > https://github.com/apache/calcite/blob/f5a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155
> >
> > [2]
> > https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
> > [3]
> > https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57
> >
> > On Thu, Feb 20, 2020 at 9:20 PM Michael Mior  wrote:
> >
> > > This sounds reasonable to me. It also sounds like we could make this
> > > backwards compatible by retaining (but deprecating) the existing
> > > constructors and factory methods that will no longer be needed.
> > > --
> > > Michael Mior
> > > mm...@apache.org
> > >
> > > Le jeu. 20 févr. 2020 à 13:11, Julian Hyde  a écrit :
> > > >
> > > > I have an idea for a refactoring to RelOptRule. I haven’t fully thought
> > > it through, but I’m going to sketch it out here to see whether folks agree
> > > about the problems/solutions.
> > > >
> > > > It will be a breaking change (in the sense that people will have to
> > > change their code in order to get it to compile) but relatively safe (in
> > > that once the code compiles, it will have the same behavior as before).
> > > Also it will give Calcite developers and users a lot more flexibility 
> > > going
> > > forward.
> > > >
> > > > The problem I see is that people often want different variants of
> > > planner rules. An example is FilterJoinRule, which has a 'boolean smart’
> > > parameter, a predicate (which returns whether to pull up filter
> > > conditions), operands (which determine the precise sub-classes of RelNode
> > > that the rule should match) and a relBuilderFactory (which controls the
> > > type of RelNode created by this rule).
> > > >
> > > > Suppose you 

[jira] [Created] (CALCITE-3932) Make data type cache thread local, non-evictable

2020-04-16 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3932:
--

 Summary: Make data type cache thread local, non-evictable
 Key: CALCITE-3932
 URL: https://issues.apache.org/jira/browse/CALCITE-3932
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


Data type can be evicted out of cache, which is global, thread-safe. `=` based 
type comparison may suffer from cache eviction. 

It seems not necessary to cache them globally, because most of them are 
RelRecordType, which is query dependent, not sharable between different queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


How to get the used columns (categorized by tables) from a SQL

2020-04-16 Thread 王驰
Hi folks,


We're using CalCite in our project for SQL parsing and validating. We have a 
question: is there any way to get all used columns from a SQL?


Consider the following case:

we have two tables `user` and `user_region`. And the SQL is like


```

SELECT

id, name, age, country, province, city

FROM

user

INNER JOIN user_region ON user.id = user_region.user_id

WHERE age > 18;

```


The result will be a Map with two keys:

  *   'user' --> List('id', 'name', 'age')
  *   'user_region' --> List('country', 'province', 'city', 'user_id')

==

I've tried with SqlValidator (along with SqlValidatorScope, 
SqlValidatorNamespace) but found no easy ways to do this.
I also tried to replay the validate process in our code but it seems impossible 
since most classes used are package private.
[cid:e62a7598-728a-44d9-8e5c-365bdcde7c45]

Could you please give us some suggestions? Thanks!



[jira] [Created] (CALCITE-3931) Add LOOKAHEAD(2) for methods defined in createStatementParserMethods

2020-04-16 Thread Danny Chen (Jira)
Danny Chen created CALCITE-3931:
---

 Summary: Add LOOKAHEAD(2) for methods defined in 
createStatementParserMethods
 Key: CALCITE-3931
 URL: https://issues.apache.org/jira/browse/CALCITE-3931
 Project: Calcite
  Issue Type: Improvement
  Components: core
Affects Versions: 1.22.0
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 1.23.0


The default LOOKAHEAD is 1 which is very probably to conflict, especially for 
custom parse block like SqlCreate.

Sets the LOOKAHEAD(2) to reduce conflict.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)