SqlPrettyWriter writing SqlNodeList elements enclosed with parentheses?

2022-04-12 Thread Jenny Kwan
Hi,

I'm using `SqlPrettyWriter`:

```
SqlWriter sw = SqlPrettyWriter(
PostgresqlSqlDialect.DEFAULT,
SqlPrettyWriter.config(),
StringBuilder()
)

sw.reset()

sw.format(SqlNodeList(...))
```

The string that is returned has each `SqlNode` in the `SqlNodeList`
formatted as text properly, but surrounded by parentheses and separated by
commas.

I haven't dug into the default `SqlWriterConfig` settings, but this just
seemed like a weird default?

Is there a better way to get the appropriate `SqlWriterConfig` for a given
dialect? Should I be using a different constructor?

Thanks,
Jenny


Re: [DISCUSS] Should DEFAULT be a keyword

2022-04-12 Thread Yanjing Wang
Thanks Gavin and Julian, If The DEFAULT can't be removed from core parser,
I think the quote is a must. Sorry, I don't know why the hive
choose DEFAULT as its default current database.

Julian Hyde  于2022年4月13日周三 02:45写道:

> On reflection, I still think that DEFAULT should be a reserved keyword.
> But I don’t think that wouldn’t prevent us from supporting it as a prefix
> to a table name in the core parser.
>
> Are the deliberations of Hive or Trino (Presto/PrestoDB) public? I’d like
> to know their rationale.
>
> Julian
>
>
> > On Apr 12, 2022, at 11:40 AM, Julian Hyde 
> wrote:
> >
> > DEFAULT is from the SQL standard. You can use it when invoking UDFs with
> named parameters, e.g.
> >
> >  SELECT myFun(p1 => 1, p2 => DEFAULT, p3 => 3)
> >  FROM t
> >
> > and in an INSERT statement:
> >
> >  INSERT INTO t (x, y, z) VALUES (1, DEFAULT, 2)
> >
> > as long as parameter p2 and column y have default values.
> >
> > So yes, DEFAULT should be a reserved keyword. And Hive screwed up by not
> reading the standard.
> >
> > If there is a database or schema called DEFAULT then you can quote it,
> e.g.
> >
> >  SELECT *
> >  FROM “DEFAULT”.t
> >
> > We could consider making DEFAULT non-reserved in the Babel parser.
> >
> > Julian
> >
> >
> >
> >
> >> On Apr 12, 2022, at 9:06 AM, Gavin Ray  wrote:
> >>
> >> The hacky way would be to add a grammar clause like this I think:
> >>
> >> | "DEFAULT"
> >>
> >> To the "TableRef2" node in the parser grammar:
> >>
> https://github.com/apache/calcite/blob/82dd78a14f6aef2eeec2f9c94978d04b4acc5359/core/src/main/codegen/templates/Parser.jj#L2087-L2276
> >>
> >> Not sure all the implications this would have elsewhere though.
> >>
> >> Also for personal opinion -- the notion of a "DEFAULT" database I'm not
> >> sure is a universal thing.
> >> I've never worked with analytical DB's, only Postgres/MySQL -- and in
> these
> >> DB's the default database is implicit in your queries.
> >>
> >> IE if I connect to a database called "mydb", all queries are
> automatically
> >> prefixed implicitly with "mydb."
> >>
> >> MySQL I don't think has a default database. A brand-new MySQL DB has 0
> >> databases.
> >> There is one called "mysql" but that's a system DB, like "pg_catalog" or
> >> "information_schema".
> >>
> >> Confusingly this is different in Postgres, where the "postgres" DB is
> one
> >> that exists by default
> >> and IS intended to be a user-facing DB.
> >>
> >>
> >> On Mon, Apr 11, 2022 at 11:36 PM Yanjing Wang <
> zhuangzixiao...@gmail.com>
> >> wrote:
> >>
> >>> Hi community,
> >>>
> >>> In hive, DEFAULT is a default database, so one can write sql
> >>> SELECT * FROM DEFAULT.t
> >>>
> >>> In trino, one can write
> >>> SELECT * FROM HIVE.DEFAULT.t
> >>>
> >>> But now Calcite treat DEFAULT as a keyword which will be resolved to
> >>> DEFAULT() function.
> >>>
> >>> What should I do to make the parser to support the DEFAULT in the table
> >>> identifier?
> >>>
> >>> I tried to add the DEFAULT to non-keywords but it would cause DEFAULT()
> >>> function fails.
> >>>
> >>> Now I must tell the user wrap the DEFAULT using quotes like this
> >>> SELECT * FROM HIVE."DEFAULT".t
> >>>
> >>> Thanks in advance.
> >>>
> >
>
>


Re: [DISCUSS] Rename 'master' branch to 'main' (part 2)

2022-04-12 Thread Michael Mior
+1

Thanks for resurfacing this Julian. Since main is becoming the default
pretty much everywhere and the tooling has improved, I think several of the
criticisms of this are less relevant. Yes, it still does require work and
some things may break. But this is a one time change and only affects
contributors, not any consumers of Calcite artifacts.

--
Michael Mior
mm...@apache.org


Le mar. 12 avr. 2022 à 15:10, Julian Hyde  a écrit :

> In August 2020 Michael started a discussion[1] about renaming the
> 'master' branch in Git to 'main'. A majority of the community seemed
> to be in favor, although some people did not agree with the change.
>
> If I recall correctly, we did not move forward with the rename at that
> time in part because of tooling: GitHub tooling was not mature enough
> to guarantee that the change would be smooth. I now believe the
> tooling is mature. I have migrated several personal projects, and
> GitHub makes it very easy.
>
> I propose that we move forward with renaming the 'master' branch in
> the calcite, calcite-avatica and calcite-avatica-go repositories. I
> propose that we do calcite-avatica first, because it is less used than
> calcite.
>
> Julian
>
> [1] https://lists.apache.org/thread/7f2kpkf8lrq0mmt46lb2orsh39kq7dbs
>


[DISCUSS] Rename 'master' branch to 'main' (part 2)

2022-04-12 Thread Julian Hyde
In August 2020 Michael started a discussion[1] about renaming the
'master' branch in Git to 'main'. A majority of the community seemed
to be in favor, although some people did not agree with the change.

If I recall correctly, we did not move forward with the rename at that
time in part because of tooling: GitHub tooling was not mature enough
to guarantee that the change would be smooth. I now believe the
tooling is mature. I have migrated several personal projects, and
GitHub makes it very easy.

I propose that we move forward with renaming the 'master' branch in
the calcite, calcite-avatica and calcite-avatica-go repositories. I
propose that we do calcite-avatica first, because it is less used than
calcite.

Julian

[1] https://lists.apache.org/thread/7f2kpkf8lrq0mmt46lb2orsh39kq7dbs


Re: [DISCUSS] Should DEFAULT be a keyword

2022-04-12 Thread Julian Hyde
On reflection, I still think that DEFAULT should be a reserved keyword. But I 
don’t think that wouldn’t prevent us from supporting it as a prefix to a table 
name in the core parser.

Are the deliberations of Hive or Trino (Presto/PrestoDB) public? I’d like to 
know their rationale.

Julian
 

> On Apr 12, 2022, at 11:40 AM, Julian Hyde  wrote:
> 
> DEFAULT is from the SQL standard. You can use it when invoking UDFs with 
> named parameters, e.g.
> 
>  SELECT myFun(p1 => 1, p2 => DEFAULT, p3 => 3)
>  FROM t
> 
> and in an INSERT statement:
> 
>  INSERT INTO t (x, y, z) VALUES (1, DEFAULT, 2)
> 
> as long as parameter p2 and column y have default values.
> 
> So yes, DEFAULT should be a reserved keyword. And Hive screwed up by not 
> reading the standard.
> 
> If there is a database or schema called DEFAULT then you can quote it, e.g.
> 
>  SELECT *
>  FROM “DEFAULT”.t
> 
> We could consider making DEFAULT non-reserved in the Babel parser.
> 
> Julian
> 
> 
> 
> 
>> On Apr 12, 2022, at 9:06 AM, Gavin Ray  wrote:
>> 
>> The hacky way would be to add a grammar clause like this I think:
>> 
>> | "DEFAULT"
>> 
>> To the "TableRef2" node in the parser grammar:
>> https://github.com/apache/calcite/blob/82dd78a14f6aef2eeec2f9c94978d04b4acc5359/core/src/main/codegen/templates/Parser.jj#L2087-L2276
>> 
>> Not sure all the implications this would have elsewhere though.
>> 
>> Also for personal opinion -- the notion of a "DEFAULT" database I'm not
>> sure is a universal thing.
>> I've never worked with analytical DB's, only Postgres/MySQL -- and in these
>> DB's the default database is implicit in your queries.
>> 
>> IE if I connect to a database called "mydb", all queries are automatically
>> prefixed implicitly with "mydb."
>> 
>> MySQL I don't think has a default database. A brand-new MySQL DB has 0
>> databases.
>> There is one called "mysql" but that's a system DB, like "pg_catalog" or
>> "information_schema".
>> 
>> Confusingly this is different in Postgres, where the "postgres" DB is one
>> that exists by default
>> and IS intended to be a user-facing DB.
>> 
>> 
>> On Mon, Apr 11, 2022 at 11:36 PM Yanjing Wang 
>> wrote:
>> 
>>> Hi community,
>>> 
>>> In hive, DEFAULT is a default database, so one can write sql
>>> SELECT * FROM DEFAULT.t
>>> 
>>> In trino, one can write
>>> SELECT * FROM HIVE.DEFAULT.t
>>> 
>>> But now Calcite treat DEFAULT as a keyword which will be resolved to
>>> DEFAULT() function.
>>> 
>>> What should I do to make the parser to support the DEFAULT in the table
>>> identifier?
>>> 
>>> I tried to add the DEFAULT to non-keywords but it would cause DEFAULT()
>>> function fails.
>>> 
>>> Now I must tell the user wrap the DEFAULT using quotes like this
>>> SELECT * FROM HIVE."DEFAULT".t
>>> 
>>> Thanks in advance.
>>> 
> 



Re: [DISCUSS] Should DEFAULT be a keyword

2022-04-12 Thread Julian Hyde
DEFAULT is from the SQL standard. You can use it when invoking UDFs with named 
parameters, e.g.

  SELECT myFun(p1 => 1, p2 => DEFAULT, p3 => 3)
  FROM t

and in an INSERT statement:

  INSERT INTO t (x, y, z) VALUES (1, DEFAULT, 2)

as long as parameter p2 and column y have default values.

So yes, DEFAULT should be a reserved keyword. And Hive screwed up by not 
reading the standard.

If there is a database or schema called DEFAULT then you can quote it, e.g.

  SELECT *
  FROM “DEFAULT”.t

We could consider making DEFAULT non-reserved in the Babel parser.

Julian




> On Apr 12, 2022, at 9:06 AM, Gavin Ray  wrote:
> 
> The hacky way would be to add a grammar clause like this I think:
> 
> | "DEFAULT"
> 
> To the "TableRef2" node in the parser grammar:
> https://github.com/apache/calcite/blob/82dd78a14f6aef2eeec2f9c94978d04b4acc5359/core/src/main/codegen/templates/Parser.jj#L2087-L2276
> 
> Not sure all the implications this would have elsewhere though.
> 
> Also for personal opinion -- the notion of a "DEFAULT" database I'm not
> sure is a universal thing.
> I've never worked with analytical DB's, only Postgres/MySQL -- and in these
> DB's the default database is implicit in your queries.
> 
> IE if I connect to a database called "mydb", all queries are automatically
> prefixed implicitly with "mydb."
> 
> MySQL I don't think has a default database. A brand-new MySQL DB has 0
> databases.
> There is one called "mysql" but that's a system DB, like "pg_catalog" or
> "information_schema".
> 
> Confusingly this is different in Postgres, where the "postgres" DB is one
> that exists by default
> and IS intended to be a user-facing DB.
> 
> 
> On Mon, Apr 11, 2022 at 11:36 PM Yanjing Wang 
> wrote:
> 
>> Hi community,
>> 
>> In hive, DEFAULT is a default database, so one can write sql
>> SELECT * FROM DEFAULT.t
>> 
>> In trino, one can write
>> SELECT * FROM HIVE.DEFAULT.t
>> 
>> But now Calcite treat DEFAULT as a keyword which will be resolved to
>> DEFAULT() function.
>> 
>> What should I do to make the parser to support the DEFAULT in the table
>> identifier?
>> 
>> I tried to add the DEFAULT to non-keywords but it would cause DEFAULT()
>> function fails.
>> 
>> Now I must tell the user wrap the DEFAULT using quotes like this
>> SELECT * FROM HIVE."DEFAULT".t
>> 
>> Thanks in advance.
>> 



Re: Changes to the rule pattern interface

2022-04-12 Thread Thomas Rebele
Hello,

The reasons for the planner rules configuration can be found here:
CALCITE-3923 . See also
the email thread [DISCUSS] Refactor how planner rules are parameterized

.

Cordialement / Best Regards,
*Thomas Rebele, PhD* | R&D Developer | Germany | www.tibco.com


On Tue, Apr 12, 2022 at 6:10 PM Gavin Ray  wrote:

> I don't have any weight behind my opinion or experience,
> but anything that lowers the barrier to entry to Calcite for newcomers is a
> huge win in my mind.
>
> I assume the reason for the changes was because codegen improved
> performance?
>
> Could it make sense to allow both options, the easy/less-performant way for
> people who want to experiment and learn the ropes,
> and the codegen path for productionizing the final rules you come up with?
>
> Or does this make matters worse, trying to support two API's
>
> On Tue, Apr 12, 2022 at 6:25 AM Vladimir Ozerov 
> wrote:
>
> > Hi folks,
> >
> > Rules are an essential part of the Calcite-based query optimizers. A
> > typical optimizer may require dozens of custom rules that are created by
> > extending some Apache Calcite interfaces.
> >
> > During the last two years, there were two major revisions of how rules
> are
> > created:
> >
> >1. In early 1.2x versions, the typical approach was to use
> >RelOptRuleOperand with a set of helper methods in a builder-like
> >pattern.
> >2. Then, we switched to the runtime code generation.
> >3. Finally, we switched to the compile-time code generation with the
> >Immutables framework.
> >
> > Every such change requires the downstream projects to rewrite all their
> > rules. Not only does this require time to understand the new approach,
> but
> > it may also compromise the correctness of the downstream optimizer
> because
> > the regression tracking in query optimizers is not trivial.
> >
> > I had the privilege to try all three approaches, and I cannot get rid of
> > the feeling that every new approach is more complicated than the previous
> > one. I understand that this is a highly subjective statement, but when I
> > just started using Apache Calcite and knew very little about it, I was
> able
> > to write rule patterns by simply looking at the IDE JavaDoc pop-ups and
> > code completion. When the RuleConfig was introduced, every new rule
> always
> > required me to look at some other rule as an example, yet it was doable.
> > Now we also need to configure the project build system to write a single
> > custom rule.
> >
> > At the same time, a significant fraction of the rules are pretty simple.
> > E.g., "operator A on top of operator B". If some additional configuration
> > is required, it could be added via plain rules fields, because at the end
> > of the day the rule instance is not more than a plain Java object.
> >
> > A good example is the FilterProjectTransposeRule. What now takes tens of
> > lines of code in the Config subclass [1] (that you hardly could write
> > without a reference example), and ~500 LOC in the generated code that you
> > get through additional plugin configuration [2] in your build system,
> could
> > have been expressed in a dozen lines of code [3] in Apache Calcite
> 1.22.0.
> >
> > My question is - are we sure we are going in the right direction in terms
> > of complexity and the entry bar for the newcomers? Wouldn't it be better
> to
> > follow the 80/20 rule, when simple rules could be easily created
> > programmatically with no external dependencies, while more advanced
> > facilities like Immutables are used only for the complex rules?
> >
> > Regards,
> > Vladimir.
> >
> > [1]
> >
> >
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L208-L260
> > [2]
> >
> >
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/build.gradle.kts#L215-L224
> > [3]
> >
> >
> https://github.com/apache/calcite/blob/calcite-1.22.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L99-L110
> >
>


Re: Changes to the rule pattern interface

2022-04-12 Thread Gavin Ray
I don't have any weight behind my opinion or experience,
but anything that lowers the barrier to entry to Calcite for newcomers is a
huge win in my mind.

I assume the reason for the changes was because codegen improved
performance?

Could it make sense to allow both options, the easy/less-performant way for
people who want to experiment and learn the ropes,
and the codegen path for productionizing the final rules you come up with?

Or does this make matters worse, trying to support two API's

On Tue, Apr 12, 2022 at 6:25 AM Vladimir Ozerov  wrote:

> Hi folks,
>
> Rules are an essential part of the Calcite-based query optimizers. A
> typical optimizer may require dozens of custom rules that are created by
> extending some Apache Calcite interfaces.
>
> During the last two years, there were two major revisions of how rules are
> created:
>
>1. In early 1.2x versions, the typical approach was to use
>RelOptRuleOperand with a set of helper methods in a builder-like
>pattern.
>2. Then, we switched to the runtime code generation.
>3. Finally, we switched to the compile-time code generation with the
>Immutables framework.
>
> Every such change requires the downstream projects to rewrite all their
> rules. Not only does this require time to understand the new approach, but
> it may also compromise the correctness of the downstream optimizer because
> the regression tracking in query optimizers is not trivial.
>
> I had the privilege to try all three approaches, and I cannot get rid of
> the feeling that every new approach is more complicated than the previous
> one. I understand that this is a highly subjective statement, but when I
> just started using Apache Calcite and knew very little about it, I was able
> to write rule patterns by simply looking at the IDE JavaDoc pop-ups and
> code completion. When the RuleConfig was introduced, every new rule always
> required me to look at some other rule as an example, yet it was doable.
> Now we also need to configure the project build system to write a single
> custom rule.
>
> At the same time, a significant fraction of the rules are pretty simple.
> E.g., "operator A on top of operator B". If some additional configuration
> is required, it could be added via plain rules fields, because at the end
> of the day the rule instance is not more than a plain Java object.
>
> A good example is the FilterProjectTransposeRule. What now takes tens of
> lines of code in the Config subclass [1] (that you hardly could write
> without a reference example), and ~500 LOC in the generated code that you
> get through additional plugin configuration [2] in your build system, could
> have been expressed in a dozen lines of code [3] in Apache Calcite 1.22.0.
>
> My question is - are we sure we are going in the right direction in terms
> of complexity and the entry bar for the newcomers? Wouldn't it be better to
> follow the 80/20 rule, when simple rules could be easily created
> programmatically with no external dependencies, while more advanced
> facilities like Immutables are used only for the complex rules?
>
> Regards,
> Vladimir.
>
> [1]
>
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L208-L260
> [2]
>
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/build.gradle.kts#L215-L224
> [3]
>
> https://github.com/apache/calcite/blob/calcite-1.22.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L99-L110
>


Re: [DISCUSS] Should DEFAULT be a keyword

2022-04-12 Thread Gavin Ray
The hacky way would be to add a grammar clause like this I think:

| "DEFAULT"

To the "TableRef2" node in the parser grammar:
https://github.com/apache/calcite/blob/82dd78a14f6aef2eeec2f9c94978d04b4acc5359/core/src/main/codegen/templates/Parser.jj#L2087-L2276

Not sure all the implications this would have elsewhere though.

Also for personal opinion -- the notion of a "DEFAULT" database I'm not
sure is a universal thing.
I've never worked with analytical DB's, only Postgres/MySQL -- and in these
DB's the default database is implicit in your queries.

IE if I connect to a database called "mydb", all queries are automatically
prefixed implicitly with "mydb."

MySQL I don't think has a default database. A brand-new MySQL DB has 0
databases.
There is one called "mysql" but that's a system DB, like "pg_catalog" or
"information_schema".

Confusingly this is different in Postgres, where the "postgres" DB is one
that exists by default
and IS intended to be a user-facing DB.


On Mon, Apr 11, 2022 at 11:36 PM Yanjing Wang 
wrote:

> Hi community,
>
> In hive, DEFAULT is a default database, so one can write sql
> SELECT * FROM DEFAULT.t
>
> In trino, one can write
> SELECT * FROM HIVE.DEFAULT.t
>
> But now Calcite treat DEFAULT as a keyword which will be resolved to
> DEFAULT() function.
>
> What should I do to make the parser to support the DEFAULT in the table
> identifier?
>
> I tried to add the DEFAULT to non-keywords but it would cause DEFAULT()
> function fails.
>
> Now I must tell the user wrap the DEFAULT using quotes like this
> SELECT * FROM HIVE."DEFAULT".t
>
> Thanks in advance.
>


Changes to the rule pattern interface

2022-04-12 Thread Vladimir Ozerov
Hi folks,

Rules are an essential part of the Calcite-based query optimizers. A
typical optimizer may require dozens of custom rules that are created by
extending some Apache Calcite interfaces.

During the last two years, there were two major revisions of how rules are
created:

   1. In early 1.2x versions, the typical approach was to use
   RelOptRuleOperand with a set of helper methods in a builder-like
   pattern.
   2. Then, we switched to the runtime code generation.
   3. Finally, we switched to the compile-time code generation with the
   Immutables framework.

Every such change requires the downstream projects to rewrite all their
rules. Not only does this require time to understand the new approach, but
it may also compromise the correctness of the downstream optimizer because
the regression tracking in query optimizers is not trivial.

I had the privilege to try all three approaches, and I cannot get rid of
the feeling that every new approach is more complicated than the previous
one. I understand that this is a highly subjective statement, but when I
just started using Apache Calcite and knew very little about it, I was able
to write rule patterns by simply looking at the IDE JavaDoc pop-ups and
code completion. When the RuleConfig was introduced, every new rule always
required me to look at some other rule as an example, yet it was doable.
Now we also need to configure the project build system to write a single
custom rule.

At the same time, a significant fraction of the rules are pretty simple.
E.g., "operator A on top of operator B". If some additional configuration
is required, it could be added via plain rules fields, because at the end
of the day the rule instance is not more than a plain Java object.

A good example is the FilterProjectTransposeRule. What now takes tens of
lines of code in the Config subclass [1] (that you hardly could write
without a reference example), and ~500 LOC in the generated code that you
get through additional plugin configuration [2] in your build system, could
have been expressed in a dozen lines of code [3] in Apache Calcite 1.22.0.

My question is - are we sure we are going in the right direction in terms
of complexity and the entry bar for the newcomers? Wouldn't it be better to
follow the 80/20 rule, when simple rules could be easily created
programmatically with no external dependencies, while more advanced
facilities like Immutables are used only for the complex rules?

Regards,
Vladimir.

[1]
https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L208-L260
[2]
https://github.com/apache/calcite/blob/calcite-1.30.0/core/build.gradle.kts#L215-L224
[3]
https://github.com/apache/calcite/blob/calcite-1.22.0/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L99-L110