Re: How to PARSER the SPARK SQL

2024-05-30 Thread Walaa Eldin Moustafa
Hi Yujia,

You might take inspiration from Coral https://github.com/linkedin/coral. It
is based on Calcite but uses the Hive parser (which is compatible with
Spark SQL) to generate the SqlNode and RelNode trees. There is a PR that
uses the native Spark parser as well: https://github.com/linkedin/coral/pull/339.
Merging it is a work in progress.

Thanks,
Walaa.


On Thu, May 30, 2024 at 9:07 AM Mihai Budiu  wrote:

> The SQL language has several sublanguages: the query language, the data
> definition language, and the data manipulation language. The core of
> Calcite is mostly about the query language, but there are Calcite
> components that deal with the other languages as well (e.g., server, babel).
>
> Both these components also show how the Calcite parser can be customized.
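As an illustration, a minimal, untested sketch of selecting the Babel parser through the parser factory; it assumes the calcite-babel artifact is on the classpath and the newer SqlParser.config() style of configuration:

```
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.sql.parser.babel.SqlBabelParserImpl;

public class BabelParseSketch {
  // Parse a statement with the Babel parser instead of the core parser.
  public static SqlNode parse(String sql) throws Exception {
    SqlParser parser = SqlParser.create(sql,
        SqlParser.config().withParserFactory(SqlBabelParserImpl.FACTORY));
    return parser.parseStmt();
  }
}
```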
>
> In our project we have also extended the parser. You can see, for example,
> our PR which makes a minimal change to the Calcite parser:
> https://github.com/feldera/feldera/pull/210; it could be a useful
> guideline for your needs. We use Maven in our build, so you can see how the
> build has to be structured. The config.fmpp file has some "metadata" about
> how the changes are integrated into the existing parser, while the *.ftl
> files contain the actual parser code, written in the JavaCC parser generator
> language.
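For reference, a minimal, untested sketch of the kind of SqlCall subclass such a grammar extension typically returns; the class and operator names here are hypothetical:

```
import java.util.List;
import com.google.common.collect.ImmutableList;
import org.apache.calcite.sql.SqlCall;
import org.apache.calcite.sql.SqlIdentifier;
import org.apache.calcite.sql.SqlKind;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.SqlOperator;
import org.apache.calcite.sql.SqlSpecialOperator;
import org.apache.calcite.sql.SqlWriter;
import org.apache.calcite.sql.parser.SqlParserPos;

public class SqlCreateTempTable extends SqlCall {
  private static final SqlOperator OPERATOR =
      new SqlSpecialOperator("CREATE TEMPORARY TABLE", SqlKind.OTHER_DDL);

  private final SqlIdentifier name;

  public SqlCreateTempTable(SqlParserPos pos, SqlIdentifier name) {
    super(pos);
    this.name = name;
  }

  @Override public SqlOperator getOperator() {
    return OPERATOR;
  }

  @Override public List<SqlNode> getOperandList() {
    return ImmutableList.of(name);
  }

  @Override public void unparse(SqlWriter writer, int leftPrec, int rightPrec) {
    writer.keyword("CREATE TEMPORARY TABLE");
    name.unparse(writer, leftPrec, rightPrec);
  }
}
```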
>
> Mihai
> 
> From: 奚钰佳 
> Sent: Wednesday, May 29, 2024 7:01 PM
> To: dev 
> Subject: How to PARSER the SPARK SQL
>
> Hello calcite team,
>
> I am Yujia, and I want to parse Spark SQL with Calcite, but some keywords
> are not supported by Calcite.
> Here is my question in stackoverflow:
> https://stackoverflow.com/questions/78547328/how-can-i-parse-the-spark-sql-by-calcite-sqlparser-like-create-temporary-table
>
>
>
> I have two questions:
>
> 1. How can I parse Spark SQL with Calcite?
>
> 2. How can I extend the syntax on demand, e.g. for CREATE TEMPORARY TABLE?
>
> For question 1: Is there any way to parse Spark SQL with a parser that
> supports Spark SQL?
>
>
> For question 2:
> Also, I tried to extend the syntax with the steps below, but hit a
> strange compile error with the copied Parser.jj file.
> Steps:
>
> 1. Copy the Parser.jj file from Calcite.
>
> 2. Add SqlNode SqlCreateTempTable() : ... in parserImpls.ftl.
>
> 3. Add a class CreateTempTable that extends SqlCall to
> define getOperator, getOperandList, and unparse.
>
> 4. Add the related config in config.fmpp.
>
> 5. Run mvn generate-sources to generate the TestSqlParserImpl that is
> defined in step 4.
>
> At step 5, I get the error below:
> FMPP processing session failed. [ERROR] Caused by:
> freemarker.core.InvalidReferenceException: The following has evaluated to
> null or missing: [ERROR] == default [in template "Parser.jj" at
> line 1124, column 43] [ERROR]  [ERROR]  [ERROR] Tip: If the failing
> expression is known to legally refer to something that's sometimes null or
> missing, either specify a default value like myOptionalVar!myDefault, or
> use <#if myOptionalVar??>when-present<#else>when-missing</#if>. (These only cover the last step of the expression; to cover the whole
> expression, use parenthesis: (myOptionalVar.foo)!myDefault,
> (myOptionalVar.foo)?? [ERROR]  [ERROR]  [ERROR]  [ERROR] FTL stack
> trace ("~" means nesting-related): [ERROR] -
> Failed at: #if (parser.createStatementParserMeth... [in template
> "Parser.jj" at line 1124, column 1]
> Are my steps right? And why does the same Parser.jj file give an error when compiled?
>
>
>
>
> Thanks,
>
> Yujia
>


Re: Dynamic Parameters Handling in RelNode

2024-05-02 Thread Walaa Eldin Moustafa
You should be able to do it at the SqlNode stage. You can create a
SqlShuttle that replaces your placeholder table name with another table
name.
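
A minimal, untested sketch of such a shuttle; the placeholder and table names are hypothetical:

```
import org.apache.calcite.sql.SqlIdentifier;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.util.SqlShuttle;

public class TableNameReplacer extends SqlShuttle {
  private final String placeholder;
  private final String actualTable;

  public TableNameReplacer(String placeholder, String actualTable) {
    this.placeholder = placeholder;
    this.actualTable = actualTable;
  }

  @Override public SqlNode visit(SqlIdentifier id) {
    // Swap the placeholder identifier for the concrete table name.
    if (id.isSimple() && id.getSimple().equals(placeholder)) {
      return new SqlIdentifier(actualTable, id.getParserPosition());
    }
    return id;
  }
}

// Usage (hypothetical): sqlNode.accept(new TableNameReplacer("table_name", "employees"));
```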


Re: Dynamic Parameters Handling in RelNode

2024-05-02 Thread Walaa Eldin Moustafa
Hi Ravi,

RelNodes depend on specific table objects with known schemas. Unless we
know that table's schema there is no way to generate the RelNode in the
first place. Maybe if you know the schema (data types of the table
columns), it is possible to get around the lack of a name, but I do not see
how this works if there is no table object or schema in the first place.

Thanks,
Walaa.


On Thu, May 2, 2024 at 10:23 PM Ravi Kapoor  wrote:

> Hello Team,
> Is there any way in the calcite framework to use runtime variables in Rel
> Node in order to get the same variable back in the translated query?
> Basically I want the same variable to be used back as the runtime parameter
> after query translation.
>
> Here I am talking specifically about using dynamic parameters in
> table/column identifiers and not dynamic rex nodes.
>
> Let's say in Teradata I can write below query
> *TD*:  select employee_id, TRIM(employee_name) from :*table_name*
>
> The translated sql should give back to me the same parameter in the query.
> *say Trim is mapped to LTRIM*
> *BigQuery*: select employee_id, LTRIM(employee_name) from :*table_name*
>
> I believe we can only have static values in SchemaPlus to
> create tables/columns. But what about such cases?
> Need to handle this gracefully with no string manipulation.
>
> --
> Thanks,
> Ravi
>


Re: Can calcite translate Oracle SQL to MySQL

2023-11-27 Thread Walaa Eldin Moustafa
Calcite can do this with some additional effort and Coral
 leverages Calcite to do exactly that,
although it currently supports Hive, Spark, and Trino conversions. There
are public APIs to extend to other dialects on the input and output sides.
There is also a UI to play with the translations and visualize them:
https://github.com/linkedin/coral#coral-service-ui. The intermediate
representation is optionally at AST level (SqlNode) or Logical Plan level
(RelNode).
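
For the SqlNode-level route in plain Calcite, a minimal, untested sketch that handles syntax only (none of the semantic differences discussed below) and assumes the newer SqlParser.config() API:

```
import org.apache.calcite.config.Lex;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.dialect.MysqlSqlDialect;
import org.apache.calcite.sql.parser.SqlParser;

public class OracleToMysqlSketch {
  public static String translate(String oracleSql) throws Exception {
    // Parse with Oracle-ish lexical conventions, then re-emit as MySQL.
    SqlParser parser = SqlParser.create(oracleSql,
        SqlParser.config().withLex(Lex.ORACLE));
    SqlNode ast = parser.parseQuery();
    return ast.toSqlString(MysqlSqlDialect.DEFAULT).getSql();
  }
}
```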

Thanks,
Walaa.


On Mon, Nov 27, 2023 at 12:02 PM Gavin Ray  wrote:

> If you're looking for a "translate X-to-Y dialect of SQL" tool, jOOQ does
> this fairly well.
>
> What you have to do is parse the string of SQL into a jOOQ query AST, then
> render it using a different SQL dialect.
> You can experiment with the results of this online here:
>
> Format, pretty print, and translate your SQL from one dialect to another
> (jooq.org) 
>
> jOOQ requires a paid license for commercial SQL dialects like Oracle and
> MS-SQL, though.
>
> Hope this helps.
>
> On Sun, Nov 26, 2023 at 4:29 PM Julian Hyde  wrote:
>
> > Creating a high-quality Oracle to MySQL translator is a large task.
> > Calcite doesn't do it. In fact there is only one way to do it: spend a
> > considerable effort writing a large suite of compliance tests. Once
> > you have built that suite, getting the translator to pass that suite
> > is relatively easy.
> >
> > I do believe that Calcite is a very good basis for that translator. We
> > have decomposed Calcite's functionality into areas such as lexical
> > differences (backticks versus double quotes around identifiers),
> > differences in type system (what does the TIMESTAMP type mean?),
> > differences in operator set (is there an NVL function?), and
> > differences in code generator (what syntax for LIMIT-OFFSET does this
> > DBMS support?). The net result is that when you fix a bug in your
> > Oracle-to-MySQL translator, you likely fix a bug in my
> > Oracle-to-Postgres translator, or at least you don't make it worse.
> >
> > Therefore we can pool our efforts. You just need to write that test
> > suite, because no one is going to write it for you.
> >
> > Julian
> >
> >
> > On Fri, Nov 24, 2023 at 3:05 PM Mihai Budiu  wrote:
> > >
> > > This is a complicated question.
> > >
> > > Calcite is a framework with many components which you can assemble to
> > build various translators, including source-to-source translators. But
> some
> > components make program transformations that assume a certain semantics,
> > which may not coincide with either Oracle or MySQL. For example, Calcite
> > assumes that division with zero will cause a runtime exception, while
> MySQL
> > and Oracle both assume that it returns NULL. This is part of the type
> > inference rules of Calcite, and it's a fairly basic component.
> > >
> > > So if you use Calcite for compiling arithmetic expressions with
> division
> > of non-nullable arguments, the result is non-nullable in Calcite, but
> > nullable in either of these frameworks. Moreover, if you use Calcite's
> > optimizer to simplify expressions that divide by 0, you may produce
> > programs that give you different results. (There are ways you can
> override
> > this behavior of Calcite if you really want to)
> > >
> > > This is just one example, but the problem with SQL dialects is that
> they
> > have very subtle different semantics, so it may be actually very
> difficult
> > to translate a program from one database dialect to another while
> > preserving bit-exact results. I am assuming that this is the goal:
> produce
> > the same results given the same input data and queries. If this is not
> the
> > goal, you have to specify what relationship you expect between the
> original
> > program and the translated program.
> > >
> > > There may be a nice well-behaved subset of SQL on which this is
> > possible, but I am very skeptical, since the SQL standard does not even
> > specify the meaning of very basic operations, like how rounding is done
> > when converting decimals to integers.
> > >
> > > Mihai
> > >
> > > 
> > > From: Julian Hyde 
> > > Sent: Friday, November 24, 2023 1:39 PM
> > > To: dev@calcite.apache.org 
> > > Subject: Re: Can calcite translate Oracle SQL to MySQL
> > >
> > > In general, yes. But I don't think there's any code to handle Oracle's
> > > ROWNUM system column. Please log a jira case requesting to translate
> > > ROWNUM.
> > >
> > > Cases https://issues.apache.org/jira/browse/CALCITE-5723 and
> > > https://issues.apache.org/jira/browse/CALCITE-1786 are related, but
> > > not what you are asking for.
> > >
> > > On Fri, Nov 24, 2023 at 12:40 AM Louis Hust 
> > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Can calcite translate oracle sql to mysql ?
> > > >
> > > > Such as Oracle: select * from t1 where rownum < 10
> > > > To MySQL: select * from t1 limit 9;
> >
>


Re: Help adding Snowflake variant syntax

2022-06-02 Thread Walaa Eldin Moustafa
Hi Julian,

Thanks for asking those questions! I was providing this material as an
educational reference since the use case is very similar to the resources I
shared. I found those references to help enrich the Calcite community in
the past. For example, Gavin Ray has written an article [1] on converting
GraphQL to Calcite RelNodes after sharing similar resources, which I think
is a pretty nice contribution to Calcite (while not implemented in either
code base).

I might not be the best person to answer the IP question since there is
dedicated staff to address this type of question at LinkedIn, but I'm
happy to route that question to the appropriate people.

In addition to the IP aspect, forking might not be productive since Coral
is quickly evolving, especially around its core APIs and workflows to
translate between input and output dialects, but I am sure there are ways
to achieve mutual leverage, starting from sharing knowledge and experiences.

[1] https://datalore.jetbrains.com/view/notebook/JYTVfn90xYSmv6U5f2NIQR

Thanks,
Walaa.


On Wed, Jun 1, 2022 at 12:36 PM Julian Hyde  wrote:

> Let’s suppose that there is code/tests in Coral that would allow Calcite to
> implement this feature. Under the terms of Coral’s license (BSD2) someone
> COULD copy code from Coral to Calcite, but SHOULD they?
>
> As an Apache project, and as individuals participating in an open source
> community beyond the ASF, we must be respectful of other projects’ IP and
> the “do not fork” principle.
>
> I would love to hear from Walaa or other Coral committers suggestions how
> we can share functionality between the projects.
>
> Julian
>
>
> > On Jun 1, 2022, at 7:47 AM, Walaa Eldin Moustafa 
> wrote:
> >
> > Hi Erik,
> >
> > Coral [1, 2] which is based on Calcite might be a good starting point for
> > this type of work. There are a couple of dialects it supports right now:
> > Hive (or Spark) and Trino (or Presto). You can check out the
> corresponding
> > test cases [3, 4] for these two dialects. coral-common [5] is a module
> that
> > contains all Calcite-related infra required to support any dialect. It is
> > extended by both coral-hive and coral-trino.
> >
> > [1] https://github.com/linkedin/coral
> > [2] https://engineering.linkedin.com/blog/2020/coral
> > [3]
> >
> https://github.com/linkedin/coral/blob/master/coral-hive/src/test/java/com/linkedin/coral/hive/hive2rel/HiveToRelConverterTest.java
> > [4]
> >
> https://github.com/linkedin/coral/blob/master/coral-trino/src/test/java/com/linkedin/coral/trino/trino2rel/TrinoToRelConverterTest.java
> > [5] https://github.com/linkedin/coral/tree/master/coral-common
> >
> > Thanks,
> > Walaa.
> >
> >
> > On Wed, Jun 1, 2022 at 6:44 AM Erik Goldman 
> wrote:
> >
> >> I'm Erik <https://www.linkedin.com/in/erik-goldman/> and I'm currently
> >> working on a project that requires a patch to Calcite as described in
> this
> >> email <
> https://www.mail-archive.com/dev@calcite.apache.org/msg17417.html>
> >> (a
> >> related Jira task is here
> >> <https://issues.apache.org/jira/browse/CALCITE-4919>)
> >>
> >> My current work is here
> >> <
> >>
> https://github.com/ErikGoldman/calcite-snowflake/commits/snowflake-progress
> >>>
> >> but
> >> it needs much more work and understanding of the system to be
> successful.
> >>
> >> I'm very early to Calcite and I think it would be better to work with
> >> someone who understands the system before doing more work on this.
> >>
> >> Would anyone be open to offering some advice or helping with the work?
> I am
> >> happy to pay a competitive hourly rate for the assistance.
> >>
> >> Thank you!
> >>
>
>


Re: Help adding Snowflake variant syntax

2022-06-01 Thread Walaa Eldin Moustafa
Hi Erik,

Coral [1, 2] which is based on Calcite might be a good starting point for
this type of work. There are a couple of dialects it supports right now:
Hive (or Spark) and Trino (or Presto). You can check out the corresponding
test cases [3, 4] for these two dialects. coral-common [5] is a module that
contains all Calcite-related infra required to support any dialect. It is
extended by both coral-hive and coral-trino.

[1] https://github.com/linkedin/coral
[2] https://engineering.linkedin.com/blog/2020/coral
[3]
https://github.com/linkedin/coral/blob/master/coral-hive/src/test/java/com/linkedin/coral/hive/hive2rel/HiveToRelConverterTest.java
[4]
https://github.com/linkedin/coral/blob/master/coral-trino/src/test/java/com/linkedin/coral/trino/trino2rel/TrinoToRelConverterTest.java
[5] https://github.com/linkedin/coral/tree/master/coral-common

Thanks,
Walaa.


On Wed, Jun 1, 2022 at 6:44 AM Erik Goldman  wrote:

> I'm Erik  and I'm currently
> working on a project that requires a patch to Calcite as described in this
> email 
> (a
> related Jira task is here
> )
>
> My current work is here
> <
> https://github.com/ErikGoldman/calcite-snowflake/commits/snowflake-progress
> >
> but
> it needs much more work and understanding of the system to be successful.
>
> I'm very early to Calcite and I think it would be better to work with
> someone who understands the system before doing more work on this.
>
> Would anyone be open to offering some advice or helping with the work? I am
> happy to pay a competitive hourly rate for the assistance.
>
> Thank you!
>


Re: I wrote a guide to building new query language frontends in Calcite, as an interactive Jupyter Notebook (GraphQL used as example)

2022-01-23 Thread Walaa Eldin Moustafa
Nice article indeed. In Coral [1], we have simplified this process for the
end user; those steps can be done by implementing a small number
of abstract methods, listed here [2].

[1] https://github.com/linkedin/coral
[2]
https://github.com/linkedin/coral/blob/master/coral-common/src/main/java/com/linkedin/coral/common/ToRelConverter.java

On Sun, Jan 23, 2022 at 12:59 PM Stamatis Zampetakis 
wrote:

> Thanks for sharing this Gavin, very nice article!
>
> On Sun, Jan 23, 2022 at 7:00 PM Gavin Ray  wrote:
>
> > I thought that someone else might find it useful, since there was no
> > existing guide covering building new frontends.
> > It looks best on Jetbrains Datalore, but you can also view it using the
> > ".ipynb" in the Github repo:
> >
> > https://datalore.jetbrains.com/view/notebook/JYTVfn90xYSmv6U5f2NIQR
> >
> >
> https://github.com/GavinRay97/calcite-new-frontend-tutorial/blob/master/Calcite%20Frontend%20Guide.ipynb
> >
>


Re: Feedback on a generic return type version of RelShuttle?

2021-12-02 Thread Walaa Eldin Moustafa
You are right! I have always used RexShuttle directly. To make it parallel, we
could have named the new Rel class RelVisitor, but RelVisitor is already
taken.

On Thu, Dec 2, 2021 at 1:16 AM Ruben Q L  wrote:

> @Walaa Eldin Moustafa , for RexNodes there seems
> to be already this "generic return type visitor pattern" via RexVisitor; in
> fact, RexShuttle is just an implementation of RexVisitor.
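A small, untested illustration of that generic-return pattern on the Rex side; the visitor name is hypothetical:

```
import org.apache.calcite.rex.RexCall;
import org.apache.calcite.rex.RexInputRef;
import org.apache.calcite.rex.RexVisitorImpl;

// Returns Boolean rather than RexNode: does the expression reference any input column?
public class ReferencesInputVisitor extends RexVisitorImpl<Boolean> {
  public ReferencesInputVisitor() {
    super(true); // visit operands recursively
  }

  @Override public Boolean visitInputRef(RexInputRef inputRef) {
    return true;
  }

  @Override public Boolean visitCall(RexCall call) {
    return call.getOperands().stream()
        .anyMatch(op -> Boolean.TRUE.equals(op.accept(this)));
  }
}
```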
>
>
>
>
> On Thu, Dec 2, 2021 at 9:06 AM Walaa Eldin Moustafa 
> wrote:
>
>> +1 while maintaining backward compatibility. Is there a plan for
>> RexShuttle
>> too?
>>
>> On Thu, Dec 2, 2021 at 1:03 AM Ruben Q L  wrote:
>>
>> > Sounds like a good idea, as long as it does not break backwards
>> > compatibility.
>> >
>> > Regards,
>> > Ruben
>> >
>> > On Thu, Dec 2, 2021 at 7:22 AM Jay Narale 
>> wrote:
>> >
>> > > +1  re-writing  to other types would be easier with this
>> > >
>> > > On Thu, Dec 2, 2021 at 16:18 Alessandro Solimando <
>> > > alessandro.solima...@gmail.com> wrote:
>> > >
>> > > > Hi Jacques,
>> > > > I have faced the same issue recently and I think it's a good idea to
>> > > have a
>> > > > generic version of it and have RelShuttle be one of its concrete
>> > > > implementations.
>> > > >
>> > > > Best regards,
>> > > > Alessandro
>> > > >
>> > > >
>> > > > Il Gio 2 Dic 2021, 06:02 Jacques Nadeau  ha
>> > scritto:
>> > > >
>> > > > > Right now, RelNode's accept method and RelShuttle forces a user to
>> > > return
>> > > > > RelNodes. This makes tree traversal/conversion extra painful in
>> many
>> > > > > circumstances (you want to have more specific relnode classes
>> > explicit,
>> > > > you
>> > > > > want to rewrite to non-relnodes, etc). I've worked around this for
>> > many
>> > > > > years but really think it should be resolved inside Calcite. I
>> wanted
>> > > to
>> > > > > get some feedback on introducing a more generic pattern (that
>> > > RelShuttle
>> > > > > becomes a concrete variation of). If people are on board with the
>> > > change,
>> > > > > I'll pull it through the codebase.
>> > > > >
>> > > > > You can see the rough idea in this wip patch:
>> > > > > https://github.com/apache/calcite/pull/2625
>> > > > >
>> > > > > Basically, introduce a new parameterized visitor that people can
>> use.
>> > > > This
>> > > > > doesn't change the existing behavior of RelShuttle (it's now just
>> a
>> > > > > specific version of this generic version) but it does allow other
>> > > rewrite
>> > > > > patterns.
>> > > > >
>> > > > > Thoughts?
>> > > > >
>> > > >
>> > > --
>> > > Warm Regards,
>> > >
>> > > Jay Narale
>> > >
>> >
>>
>


Re: Feedback on a generic return type version of RelShuttle?

2021-12-02 Thread Walaa Eldin Moustafa
+1 while maintaining backward compatibility. Is there a plan for RexShuttle
too?

On Thu, Dec 2, 2021 at 1:03 AM Ruben Q L  wrote:

> Sounds like a good idea, as long as it does not break backwards
> compatibility.
>
> Regards,
> Ruben
>
> On Thu, Dec 2, 2021 at 7:22 AM Jay Narale  wrote:
>
> > +1  re-writing  to other types would be easier with this
> >
> > On Thu, Dec 2, 2021 at 16:18 Alessandro Solimando <
> > alessandro.solima...@gmail.com> wrote:
> >
> > > Hi Jacques,
> > > I have faced the same issue recently and I think it's a good idea to
> > have a
> > > generic version of it and have RelShuttle be one of its concrete
> > > implementations.
> > >
> > > Best regards,
> > > Alessandro
> > >
> > >
> > > Il Gio 2 Dic 2021, 06:02 Jacques Nadeau  ha
> scritto:
> > >
> > > > Right now, RelNode's accept method and RelShuttle forces a user to
> > return
> > > > RelNodes. This makes tree traversal/conversion extra painful in many
> > > > circumstances (you want to have more specific relnode classes
> explicit,
> > > you
> > > > want to rewrite to non-relnodes, etc). I've worked around this for
> many
> > > > years but really think it should be resolved inside Calcite. I wanted
> > to
> > > > get some feedback on introducing a more generic pattern (that
> > RelShuttle
> > > > becomes a concrete variation of). If people are on board with the
> > change,
> > > > I'll pull it through the codebase.
> > > >
> > > > You can see the rough idea in this wip patch:
> > > > https://github.com/apache/calcite/pull/2625
> > > >
> > > > Basically, introduce a new parameterized visitor that people can use.
> > > This
> > > > doesn't change the existing behavior of RelShuttle (it's now just a
> > > > specific version of this generic version) but it does allow other
> > rewrite
> > > > patterns.
> > > >
> > > > Thoughts?
> > > >
> > >
> > --
> > Warm Regards,
> >
> > Jay Narale
> >
>


Re: Federated Query

2021-11-01 Thread Walaa Eldin Moustafa
Coral [1] is based on Calcite and can be used to run HiveQL on Spark and
Presto/Trino. We are in the process of adding Presto/Trino SQL as an input
language so it can run on Hive/Spark as well. You can read more on how it
interacts with Calcite in this article [2].

[1] https://github.com/linkedin/coral
[2] https://engineering.linkedin.com/blog/2020/coral

Thanks,
Walaa.
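
For plain Calcite (without Coral), a rough, untested sketch of federating multiple JDBC sources through one Calcite connection; the URLs, drivers, and schema names are made up:

```
import java.sql.Connection;
import java.sql.DriverManager;
import javax.sql.DataSource;
import org.apache.calcite.adapter.jdbc.JdbcSchema;
import org.apache.calcite.jdbc.CalciteConnection;
import org.apache.calcite.schema.SchemaPlus;

public class FederationSketch {
  public static void main(String[] args) throws Exception {
    Connection connection = DriverManager.getConnection("jdbc:calcite:");
    CalciteConnection calcite = connection.unwrap(CalciteConnection.class);
    SchemaPlus root = calcite.getRootSchema();

    // Register each backend database as a sub-schema.
    DataSource salesDs = JdbcSchema.dataSource(
        "jdbc:mysql://host1/sales", "com.mysql.cj.jdbc.Driver", "user", "pass");
    root.add("SALES", JdbcSchema.create(root, "SALES", salesDs, null, null));

    DataSource hrDs = JdbcSchema.dataSource(
        "jdbc:postgresql://host2/hr", "org.postgresql.Driver", "user", "pass");
    root.add("HR", JdbcSchema.create(root, "HR", hrDs, null, null));

    // Queries on `connection` can now join SALES.* tables with HR.* tables.
  }
}
```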


On Mon, Nov 1, 2021 at 12:38 PM Yogendra Sharma  wrote:

> Hi Team,
>
> I am exploring Apache Calcite to run federated queries on three different
> databases. I could not find a working example anywhere on internet.
>
>  Can you guys confirm if it is possible and if yes, is there an example
> that you can point me to?
>
> Thanks,
> Yogendra
>
>


Re: Using Calcite at LinkedIn

2020-12-12 Thread Walaa Eldin Moustafa
Hi Zhaohui,

Thanks for sharing. Materialized views are crucial to performance
optimization in the data lake. We mentioned that in the blog post too as a
future work. Would love to discuss this further.

Thanks,
Walaa.


On Sat, Dec 12, 2020 at 11:49 AM 953396112 <13282155...@qq.com> wrote:

> Hi Walaa and Tao:
>   I am very happy to see your sharing. Our team is also
> working on SQL rewriting, analysis, and optimization. We use Calcite's
> materialized view recognition capabilities to speed up user queries; many
> materialized view recognition algorithms [1] and normalization algorithms [2]
> have been implemented, and a lot of work has been done on preprocessing
> available materialized views, which performs very well. We welcome
> discussion of issues related to materialized view identification.
>
>
> [1]https://github.com/apache/calcite/pull/2094
> [2] https://github.com/apache/calcite/pull/2262
>
>
> Regards!
> Zhaohui Xu
>
>
> -- Original Message --
> From: "dev" <taojia...@gmail.com>
> Sent: Saturday, December 12, 2020, 9:23 PM
> To: "dev"
> Subject: Re: Using Calcite at LinkedIn
>
>
>
> Hi Walaa
> Very happy to see this, our team basically do the same thing, a unified SQL
> layer:
> 1. Spark: RelNode -> Spark DataFrame plan
> 2. Presto: RelNode -> In string SQL
> 3. Clickhouse: RelNode -> Serialized RelNode
> 4. Flink -> TBD (with DataStream API or Table API)
>
>
> I do point 1 both in my previous company and current company, maybe I can
> participate in this part: analyze and translate Spark Catalyst plans.
>
>
> Regards!
>
> Aron Tao
>
>
> Walaa Eldin Moustafa 
>  Hi Calcite community,
> 
>  I wanted to share a recently published LinkedIn's blog series article
> [1]
>  on how Calcite helps us build a smarter data lake using Coral [2].
> Hope you
>  find it interesting. Also, if you want to discuss with our team and
> the
>  data lake + Calcite community, please feel free to join our Coral
> Slack
>  workspace [3].
> 
>  [1] https://engineering.linkedin.com/blog/2020/coral
>  [2] https://github.com/linkedin/coral
>  [3]
> 
> 
> https://join.slack.com/t/coral-sql/shared_invite/zt-j9jw5idg-mkt3fjA~wgoUEMXXZqMr6g
> 
>  Thanks,
>  Walaa.
> 


Re: Using Calcite at LinkedIn

2020-12-12 Thread Walaa Eldin Moustafa
Hi JiaTao,

That sounds interesting. A few questions:

1- When you go from RelNode to "Spark DataFrame plan", did you mean you go
to:
  * Spark SQL
  * Spark DataFrame Scala code
  * In-memory Spark Catalyst Plan
  * Human readable string representation of Spark plan (e.g., similar to
DataFrame.explain)
  * Some serialization of the in-memory Spark plan (similar to human
readable, but more ser/de friendly without necessarily being human
readable)?

2- In your Presto and Spark conversions mentioned below, you stated you
start from a RelNode. Could you clarify where the RelNode is originally
coming from? What is the use case in both?

3- That would be awesome if you could contribute to coral-spark-plan [1].
Currently its objective is to convert human readable Spark plan (output of
DataFrame.explain) to RelNode. Right now it can do basic conversions (see
test cases [2]). This module can help with:
** Analyzing Spark jobs (we have used it to figure out which Spark jobs in our
history server push complex predicates down, as complex predicates are
not supported on DataSource V2 [3])
** Converting arbitrary Spark logic to other platforms (e.g., Spark
Catalyst plan to Presto), since even Scala code ends up being represented
in the plan string in a structured way.
** Converting Spark Scala code back to SQL

[1] https://github.com/linkedin/coral/tree/master/coral-spark-plan
[2]
https://github.com/linkedin/coral/blob/master/coral-spark-plan/src/test/java/com/linkedin/coral/sparkplan/SparkPlanToIRRelConverterTest.java
[3]
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-SparkStrategy-DataSourceStrategy.html

Thanks,
Walaa.


On Sat, Dec 12, 2020 at 5:24 AM JiaTao Tao  wrote:

> Hi Walaa
> Very happy to see this, our team basically do the same thing, a unified SQL
> layer:
> 1. Spark: RelNode -> Spark DataFrame plan
> 2. Presto: RelNode -> In string SQL
> 3. Clickhouse: RelNode -> Serialized RelNode
> 4. Flink -> TBD(with datastream API or table API)
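Regarding point 2 above (RelNode to a SQL string), a rough, untested sketch of the plain-Calcite route via RelToSqlConverter; visitRoot/asStatement are the method names in recent Calcite versions:

```
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rel2sql.RelToSqlConverter;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.dialect.PrestoSqlDialect;

public class RelToPrestoSql {
  // Unparse a RelNode back into a Presto-flavored SQL string.
  public static String toSql(RelNode rel) {
    RelToSqlConverter converter = new RelToSqlConverter(PrestoSqlDialect.DEFAULT);
    SqlNode sqlNode = converter.visitRoot(rel).asStatement();
    return sqlNode.toSqlString(PrestoSqlDialect.DEFAULT).getSql();
  }
}
```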
>
>
> I do point 1 both in my previous company and current company, maybe I can
> participate in this part:  analyze and translate Spark Catalyst plans.
>
>
> Regards!
>
> Aron Tao
>
>
> Walaa Eldin Moustafa wrote on Sat, Dec 12, 2020 at 5:34 AM:
>
> > Hi Calcite community,
> >
> > I wanted to share a recently published LinkedIn's blog series article [1]
> > on how Calcite helps us build a smarter data lake using Coral [2]. Hope
> you
> > find it interesting. Also, if you want to discuss with our team and the
> > data lake + Calcite community, please feel free to join our Coral Slack
> > workspace [3].
> >
> > [1] https://engineering.linkedin.com/blog/2020/coral
> > [2] https://github.com/linkedin/coral
> > [3]
> >
> >
> https://join.slack.com/t/coral-sql/shared_invite/zt-j9jw5idg-mkt3fjA~wgoUEMXXZqMr6g
> >
> > Thanks,
> > Walaa.
> >
>


Using Calcite at LinkedIn

2020-12-11 Thread Walaa Eldin Moustafa
Hi Calcite community,

I wanted to share a recently published LinkedIn's blog series article [1]
on how Calcite helps us build a smarter data lake using Coral [2]. Hope you
find it interesting. Also, if you want to discuss with our team and the
data lake + Calcite community, please feel free to join our Coral Slack
workspace [3].

[1] https://engineering.linkedin.com/blog/2020/coral
[2] https://github.com/linkedin/coral
[3]
https://join.slack.com/t/coral-sql/shared_invite/zt-j9jw5idg-mkt3fjA~wgoUEMXXZqMr6g

Thanks,
Walaa.


Re: How to trace a column back to its original column

2020-02-18 Thread Walaa Eldin Moustafa
You might check out this class [1].

[1] 
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdExpressionLineage.java
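
For the simpler table/column lineage case, a minimal, untested sketch using the metadata query API:

```
import java.util.Set;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.metadata.RelColumnOrigin;
import org.apache.calcite.rel.metadata.RelMetadataQuery;

public class ColumnLineageSketch {
  // Print the origin table/column of one output column of a RelNode.
  public static void printOrigins(RelNode rel, int columnIndex) {
    RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
    Set<RelColumnOrigin> origins = mq.getColumnOrigins(rel, columnIndex);
    if (origins != null) {
      for (RelColumnOrigin origin : origins) {
        System.out.println(origin.getOriginTable().getQualifiedName()
            + " #" + origin.getOriginColumnOrdinal());
      }
    }
  }
}
```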

On Tue, Feb 18, 2020 at 10:31 PM Seliverstov Igor  wrote:
>
> You can use their origins (they say where a column came from).
>
> It's accessible from SqlValidatorImpl or the JDBC result set
>
> On Wed, Feb 19, 2020 at 9:25, JiaTao Tao wrote:
>
> > What I really need is to collect every part of the SQL: which columns are
> > used as filters, which are used as projections (columns on the source
> > table). But
> > the existence of nested subqueries complicates the issue.
> >
> >
> > Regards!
> >
> > Aron Tao
> >
> >
> > JiaTao Tao wrote on Wed, Feb 19, 2019 at 2:17 PM:
> >
> > > SQL like this:
> > > ```
> > > SELECT T1.NID,
> > >T1.NAME,
> > >T2.COMPANY
> > > FROM
> > >   (SELECT (U.ID +100) AS NID,
> > >   U.NAME AS NAME
> > >FROM USERS U) T1
> > > JOIN
> > >   (SELECT (J.ID +100) AS NID,
> > >   J.COMPANY
> > >FROM JOBS J) T2 ON T1.NID = T2.NID
> > > ```
> > > What I want is to know that T1.NID comes from USERS.ID, and T1.NAME comes from
> > > USERS.NAME.
> > >
> > > Has anyone done similar work? Is there a ready-made example to refer to?
> > >
> > >
> > > Regards!
> > >
> > > Aron Tao
> > >
> >


Re: Talk about Calcite at University of Cergy-Pontoise (France) on 3 March 2020

2020-02-16 Thread Walaa Eldin Moustafa
Will it be recorded? It might be worth sharing the recording after the talk.

Thanks,
Walaa.

On Sun, Feb 16, 2020 at 5:57 PM Julian Hyde  wrote:

> I tweeted about this talk:
> https://twitter.com/ApacheCalcite/status/1228821315005497344 <
> https://twitter.com/ApacheCalcite/status/1228821315005497344>
>
> > On Feb 15, 2020, at 3:14 AM, Danny Chan  wrote:
> >
> > Thanks for the share, look forward to the talk!
> >
> > Stamatis Zampetakis wrote on Sat, Feb 15, 2020 at 5:39 PM:
> >
> >> Hello,
> >>
> >> In case somebody is interested, I will give an introductory talk about
> >> query optimization and Calcite for master students at the University of
> >> Cergy-Pontoise in France, on 3 March 2020 [1].
> >>
> >> The talk is open to everybody. I don't plan to enter into many details
> or
> >> advanced usages of Calcite but it could be a good opportunity to meetup
> and
> >> discuss for more advanced scenarios before/after the event if somebody
> else
> >> is going to join.
> >>
> >> Best,
> >> Stamatis
> >>
> >> [1] https://depinfo.u-cergy.fr/actualites/73-journee-master-iisc
> >>
>
>


Re: [Discuss] (CALCITE-2812) Add algebraic operators to allow expressing recursive queries

2019-04-18 Thread Walaa Eldin Moustafa
Are you considering refactoring the UNION ALL approach?

On Thu, Apr 18, 2019 at 6:04 AM Ruben Q L  wrote:

> Hello,
>
> Stamatis and I have been discussing this topic, and currently we have
> two main approaches on how to represent the logical plan.
>
> Consider the following basic example: a SQL recursive union query that
> generates numbers 1,2,..10:
>
>WITH RECURSIVE t(n) AS (
>  VALUES (1)
>  UNION ALL
>  SELECT n+1 FROM t WHERE n < 10
>)
>SELECT * FROM t
>
> *Version 1*
>
> LogicalRepeatUnionV1(table="t", maxRep=-1, all=true)
>   LogicalValues({1})
>   LogicalProject($0=$0+1)
> LogicalFilter($0<10)
>   LogicalTableScan(table="t")
>
> The operator LogicalRepeatUnionV1 will be a pure translation of the
> SQL recursive union mechanism (it could be named LogicalRecursiveUnion
> indeed).
> Its behavior (as described e.g. here
> <https://www.postgresql.org/docs/9.1/queries-with.html>) will be:
>   - Evaluate the left input once, place the results in a temporary
> working table ("t" in our example)
>   - Evaluate the right input, saving its results in an intermediate
> location. When no more results are returned, the content of "t" will
> be replaced with the results from the intermediate location, and the
> right input will be re-evaluated again. This process will repeat over
> and over, until it produces no more results (or until an optional
> value maxRep is reached, the default value being -1, i.e. no limit).
>
> *Version 2*
>
> LogicalRepeatUnionV2 (maxRep=-1, all=true)
>   LogicalTableSpool(table="t")
> LogicalValues({1})
>   LogicalTableSpool(table="t")
> LogicalProject($0=$0+1)
>   LogicalFilter($0<10)
> LogicalTableScan(table="t")
>
> The operator LogicalRepeatUnionV2 will perform the following tasks:
>   - Evaluate the left input once.
>   - Evaluate the right input over and over until it produces no more
> results (or until an optional value maxRep is reached, the default
> value being -1, i.e. no limit).
> As we can see, LogicalRepeatUnionV2 will be in charge of the iteration
> only; it will not be responsible for storing the results for the next
> iteration that is needed for the SQL recursive union. This logic will
> be delegated to a LogicalTableSpool.
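For context, a hedged sketch of how the 1..10 example might be expressed through RelBuilder in the Version 2 style; repeatUnion/transientScan are the method names that eventually landed in Calcite, so treat them as tentative here:

```
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.tools.RelBuilder;

public class RecursiveUnionSketch {
  public static RelNode buildOneToTen(RelBuilder builder) {
    return builder
        .values(new String[] {"n"}, 1)           // seed: VALUES (1)
        .transientScan("t")                      // iterative side reads the working table
        .filter(builder.call(SqlStdOperatorTable.LESS_THAN,
            builder.field(0), builder.literal(10)))
        .project(builder.call(SqlStdOperatorTable.PLUS,
            builder.field(0), builder.literal(1)))
        .repeatUnion("t", true)                  // all = true, no iteration limit
        .build();
  }
}
```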
>
> Both versions will satisfy our requirements in order to implement the
> SQL recursive union, the main difference being:
> - Version1 will produce a simpler and more concise logical plan, but
> the LogicalRepeatUnionV1 operator will only fit this purpose and could
> not be reused in the future in other contexts.
> - Version2 requires a slightly more complicated logical plan, with the
> addition of the table spools. However, its LogicalRepeatUnionV2
> operator is more "purpose-agnostic", and can be reused in the future
> for other plans that would require repetition. For instance, it could
> be used to implement a for loop: let us say that we need to build a
> plan where we must run a certain UserDefinedFunction ten times
> (because each time its results will vary depending on whatever
> factor). A possible way to implement that will be using
> LogicalRepeatUnionV2, with maxRep=10, an empty values as left input
> (no results actually needed from that side) and the
> UserDefinedFunction as right input:
>
> LogicalRepeatUnionV2(maxRep=10, all=true)
>   Values({})
>   UserDefinedFunction(...)
>
> We prefer Version 2, which seems to be more useful in the long term, so
> we are ready to go ahead with this approach if there are no objections
> (keeping in mind that this is still an "experimental" feature).
>
> How does it look?
>
> Best regards,
> Ruben
>
>
>
>
>
> On Tue, Apr 9, 2019 at 01:07, Walaa Eldin Moustafa
> wrote:
>
> > Stamatis, here is some recent work we did on that topic:
> > http://www.sysnet.ucsd.edu/sysnet/miscpapers/datalog-icbd16.pdf
> >
> > References section has more pointers to other related work.
> >
> > Thanks,
> > Walaa.
> >
> > On Sun, Apr 7, 2019 at 11:15 PM Stamatis Zampetakis 
> > wrote:
> > >
> > > Hello,
> > >
> > > @Julian: It looks like an interesting project, looking forward to see
> > what
> > > comes out of it.
> > >
> > > @Walaa: The current implementation takes care of UNION ALL but it seems
> > > relatively easy to extend it for UNION. At least with a naive
> > > implementation. Research-wise, I am not very familiar with the domain
> so
> > if
> > > you could share a few papers it would be nice :)
>

Re: Help in understanding entities and abstractions in Calcite

2019-04-17 Thread Walaa Eldin Moustafa
You might also read this paper: https://arxiv.org/pdf/1802.10233.pdf
There is a SIGMOD version of it as well.

Thanks,
Walaa.

On Tue, Apr 16, 2019 at 9:22 AM Chunwei Lei  wrote:
>
> Hi, Naveen
>
> You can find some helpful docs in
> https://calcite.apache.org/docs/algebra.html. Wish it can help you.
>
>
> Best,
> Chunwei
>
> Naveen Kumar wrote on Tue, Apr 16, 2019 at 8:05 PM:
> >
> > Hi,
> >
> > I am an engineer at Flipkart; we are building SQL over our stream processing
> > platform using Calcite.
> > I am finding it hard to develop intuition for the abstractions and entities in
> > Apache Calcite. Is there a book or documentation that walks through them?
> >
> > I would love if I could chat with one of you and ask pointed questions.
> >
> > Thanks,
> > Naveen


Re: Join, SemiJoin, Correlate

2019-04-14 Thread Walaa Eldin Moustafa
Agreed, but I am not sure what the best way to do it would be without
making the code very confusing.

On Sat, Apr 13, 2019 at 2:46 PM Haisheng Yuan  wrote:
>
> I share the same concern with you.
>
>
>
>
>
> Thanks~
> Haisheng Yuan
> --
> From: Stamatis Zampetakis
> Date: 2019-04-14 05:37:29
> To:
> Subject: Re: Join, SemiJoin, Correlate
>
> Hi Danny,
>
> Thanks a lot for taking this on, it is a great start!
>
> I didn't look thoroughly through the PR but I noticed that there are many
> renamings/refactorings of public APIs. I am not sure if we should introduce
> so many breaking changes without prior notice. A more conservative approach
> would be to keep existing classes/methods, mark them as deprecated, and
> then remove them in one of the coming releases. I am not sure if that is
> the right way to go, so let's see what the others have to say.
>
> Best,
> Stamatis
>
> On Fri, Apr 12, 2019 at 9:18 AM Yuzhao Chen  wrote:
>
> > Hi, @Haisheng Yuan, @Julian Hyde, @Stamatis Zampetakis,
> > @Walaa Eldin Moustafa
> >
> > I have done the work for this discussion, and look forward to your
> > suggestions.
> >
> >
> > ### Diff
> > - Deprecate SemiJoin, EquiJoin, EnumerableSemiJoin, SemiJoinType,
> > EnumerableSemiJoinRule, EnumerableThetaJoin
> > - Make EnumerableMergeJoin extends Join instead of EquiJoin
> > - Add SEMI and ANTI join type to JoinRelType, add method
> > returnsJustFirstInput() to decide if the join only outputs left side
> > - Correlate use JoinRelType instead of SemiJoinType
> > - Rename EnumerableCorrelate to EnumerableNestedLoopJoin and make it
> > extends Join instead of Correlate
> > - Rename EnumerableJoin to EnumerableHashJoin
> > - EnumerableJoinRule will convert semi-join to EnumerableNestedLoopJoin
> > (EnumerableSemiJoin's function is merged into this rule)
> > - Add method isNonCorrelateSemiJoin() in Join.java to indicate whether this
> > join is a semi-join (comes from SemiJoinRule) or comes from
> > decorrelation (SubqueryRemoveRule or RelDecorrelator); a return value of
> > true means the join is a semi-join equivalent to SemiJoin before this patch.
> > - Cache the JoinInfo in Join and use it to get leftKeys and rightKeys,
> > merge the SemiJoin#computeSelfCost to Join#computeSelfCost
> > - RelBuilder removes SemiJoinFactory, method #semiJoin now return a
> > LogicalJoin with JoinRelType#SEMI
> >
> > ### Rules tweak
> > - JoinAddRedundantSemiJoinRule now creates LogicalJoin with
> > JoinRelType#SEMI instead of SemiJoin
> > - JoinToCorrelateRule removes the SEMI instance and changes the match condition
> > to !join.getJoinType().generatesNullsOnLeft(), which also allows ANTI
> > compared to before this patch
> > - SemiJoinRule matches SEMI join specifically
> >
> > ### Metadata tweak
> > - RelMdAllPredicates, RelMdExpressionLineage: Add full rowType to
> > getAllPredicates(Join) cause semi-join only outputs one side
> > - RelMdColumnUniqueness, RelMdSelectivity, RelMdDistinctRowCount,
> > RelMdSize, RelMdUniqueKeys: merge semi-join logic to join
> >
> >
> > ### Test cases change
> > - MaterializationTest#testJoinMaterialization11 can now materialize
> > successfully, because I allow the logical SemiJoin node to match; the original
> > matches SemiJoin via SemiJoin.class.isAssignableFrom(), which I think is
> > wrong because this only matches subclasses of SemiJoin, which is only
> > EnumerableSemiJoin before this patch.
> > - SortRemoveRuleTest#removeSortOverEnumerableCorrelate: because of
> > CALCITE-2018, the final EnumerableSort's cost was cached by the previous
> > EnumerableSort with logical children, so I removed the EnumerableSortRule and
> > the best plan is correct
> > - sub-query.iq has a better plan for the null correlate
> >
> >
> >
> > Best,
> > Danny Chan
> > 在 2019年3月21日 +0800 AM3:07,Julian Hyde ,写道:
> > > I just discovered that Correlate, which is neither a Join nor a
> > SemiJoin, uses SemiJoinType, but SemiJoin does not use SemiJoinType.
> > >
> > > Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought.
> > >
> > > Julian
> > >
> > >
> >
>


Re: [Discuss] (CALCITE-2812) Add algebraic operators to allow expressing recursive queries

2019-04-08 Thread Walaa Eldin Moustafa
Stamatis, here is some recent work we did on that topic:
http://www.sysnet.ucsd.edu/sysnet/miscpapers/datalog-icbd16.pdf

References section has more pointers to other related work.

Thanks,
Walaa.

On Sun, Apr 7, 2019 at 11:15 PM Stamatis Zampetakis  wrote:
>
> Hello,
>
> @Julian: It looks like an interesting project, looking forward to see what
> comes out of it.
>
> @Walaa: The current implementation takes care of UNION ALL but it seems
> relatively easy to extend it for UNION. At least with a naive
> implementation. Research-wise, I am not very familiar with the domain so if
> you could share a few papers it would be nice :)
>
> @Ruben: Thanks for the changes and great work so far. I will have a look
> when I find some time.
>
> Best,
> Stamatis
>
> On Fri, Apr 5, 2019 at 3:48 PM Ruben Q L  wrote:
>
> > Thanks for the feedback.
> > I have pushed <https://github.com/apache/calcite/pull/1020> the latest
> > modifications. As requested, I have flagged all new features as
> > "experimental".
> >
> > Best regards,
> > Ruben
> >
> >
> > Le jeu. 4 avr. 2019 à 20:29, Walaa Eldin Moustafa 
> > a
> > écrit :
> >
> > > +1 to Julian.
> > >
> > > There are open questions on supporting UNION (instead of UNION ALL),
> > > aggregation and negation through recursion. Many of those topics still
> > > have papers written to address their open questions until today. I
> > > think the PR is geared towards addressing the UNION ALL case, and
> > > extending the architecture to cover the more involved cases may be
> > > tricky if we commit to the UNION ALL friendly approach.
> > >
> > > Thanks,
> > > Walaa.
> > >
> > > On Thu, Apr 4, 2019 at 11:23 AM Julian Hyde  wrote:
> > > >
> > > > I guess my concerns would all be met if we could flag this as
> > > “experimental”.
> > > >
> > > > There are more general forms of recursive / deductive queries that I
> > > would like us to be able to tackle someday. If and when we do, I don’t
> > want
> > > to be locked into the structures and nomenclature of this change. But
> > until
> > > then, this change is a huge step forward. If the APIs were labeled
> > > “experimental and subject to change without notice” then we can evolve as
> > > we know more.
> > > >
> > > > By the way, I have personal project where I am experimenting with
> > > melding functional programming with relational algebra. If you have
> > > functions-as-values in relational algebra then you can add a fixed-point
> > > operator and thereby create recursive/deductive queries. You can use
> > > higher-order functions such as map and filter on nested collections, and
> > > exploit the fact that the relational operators (project, filter, join)
> > are
> > > themselves higher-order functions. Rather than adding functional
> > extensions
> > > to a relational language, I started digging the tunnel from the other
> > end:
> > > I started with a small, elegant functional language (ML), wrote an
> > > interpreter in Java, and am extending the language for relational
> > algebra.
> > > If all goes well, there would be some extensions in Calcite’s algebra for
> > > functions-as-values and relational expressions that are defined by
> > > (possibly recursive) functions.
> > > >
> > > > Julian
> > > >
> > > > [1] https://github.com/julianhyde/smlj <
> > > https://github.com/julianhyde/smlj>
> > > >
> > > >
> > > > > On Apr 4, 2019, at 9:31 AM, Stamatis Zampetakis 
> > > wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > The issue has advanced quite a lot and I guess it will be finalised
> > > soon.
> > > > > There is some ongoing discussion about a few naming/refactorings in
> > > the PR
> > > > > [1] but these could be resolved easily.
> > > > >
> > > > > Ruben, had made a proposal which LGTM so I am inclined to merge the
> > PR
> > > as
> > > > > soon as the last changes are made (and no other serious issues
> > appear).
> > > > >
> > > > > I guess it is the right time to jump in to the discusion, especially
> > > if you
> > > > > strongly disagree on the general approach.
> > > > >
> > > > > [1] https://github.com/apache/calcite/pull/1020
> > > > >
> > > > > Best,
> > > > > Stamatis
> > > >
> > >
> >


Re: [Discuss] (CALCITE-2812) Add algebraic operators to allow expressing recursive queries

2019-04-04 Thread Walaa Eldin Moustafa
+1 to Julian.

There are open questions on supporting UNION (instead of UNION ALL),
aggregation, and negation through recursion. Many of those topics still
have papers being written about them to this day. I
think the PR is geared towards addressing the UNION ALL case, and
extending the architecture to cover the more involved cases may be
tricky if we commit to the UNION ALL-friendly approach.

Thanks,
Walaa.

On Thu, Apr 4, 2019 at 11:23 AM Julian Hyde  wrote:
>
> I guess my concerns would all be met if we could flag this as “experimental”.
>
> There are more general forms of recursive / deductive queries that I would 
> like us to be able to tackle someday. If and when we do, I don’t want to be 
> locked into the structures and nomenclature of this change. But until then, 
> this change is a huge step forward. If the APIs were labeled “experimental 
> and subject to change without notice” then we can evolve as we know more.
>
> By the way, I have a personal project where I am experimenting with melding 
> functional programming with relational algebra. If you have 
> functions-as-values in relational algebra then you can add a fixed-point 
> operator and thereby create recursive/deductive queries. You can use 
> higher-order functions such as map and filter on nested collections, and 
> exploit the fact that the relational operators (project, filter, join) are 
> themselves higher-order functions. Rather than adding functional extensions 
> to a relational language, I started digging the tunnel from the other end: I 
> started with a small, elegant functional language (ML), wrote an interpreter 
> in Java, and am extending the language for relational algebra. If all goes 
> well, there would be some extensions in Calcite’s algebra for 
> functions-as-values and relational expressions that are defined by (possibly 
> recursive) functions.
>
> Julian
>
> [1] https://github.com/julianhyde/smlj 
>
>
> > On Apr 4, 2019, at 9:31 AM, Stamatis Zampetakis  wrote:
> >
> > Hello,
> >
> > The issue has advanced quite a lot and I guess it will be finalised soon.
> > There is some ongoing discussion about a few naming/refactorings in the PR
> > [1] but these could be resolved easily.
> >
> > Ruben, had made a proposal which LGTM so I am inclined to merge the PR as
> > soon as the last changes are made (and no other serious issues appear).
> >
> > I guess it is the right time to jump in to the discusion, especially if you
> > strongly disagree on the general approach.
> >
> > [1] https://github.com/apache/calcite/pull/1020
> >
> > Best,
> > Stamatis
>


Re: Join, SemiJoin, Correlate

2019-04-01 Thread Walaa Eldin Moustafa
> > > >
> > > > Thanks ~
> > > > Haisheng Yuan
> > > > --
> > > > From: Stamatis Zampetakis
> > > > Date: 2019-03-23 07:31:35
> > > > To:
> > > > Subject: Re: Join, SemiJoin, Correlate
> > > >
> > > > Since we are discussing this topic I thought it would be could to
> bring
> > > > back
> > > > to the surface a similar discussion [1] that has been done a few
> years
> > > ago
> > > > in this list.
> > > >
> > > > I am leaning towards option 3 where JoinRelType has all necessary
> values:
> > > > Inner, Left, Semi, Anti, and Full.
> > > > With these changes it seems we could remove (deprecate) also
> SemiJoin,
> > > and
> > > > EquiJoin.
> > > >
> > > > On the physical level we could have:
> > > > 1. EnumerableCorrelate or EnumerableNestedLoopJoin;
> > > > 2. EnumerableMergeJoin;
> > > > 3. EnumerableHashJoin (currently EnumerableJoin)
> > > >
> > > > and for the above we could pass the JoinRelType throwing an exception
> > > when
> > > > the specific algorithm cannot be used to implement a specific type of
> > > join.
> > > >
> > > > EnumerableSemiJoin and EnumerableThetaJoin could also be removed and
> > > > covered from the above I think.
> > > >
> > > > Regarding Correlate and LogicalCorrelate, I am not sure what should
> we
> > > do.
> > > > Associating the JoinRelType with it does not seem right, and making
> > > > Correlate also a Join is not very attractive either.
> > > >
> > > > Best,
> > > > Stamatis
> > > >
> > > > [1]
> > > >
> > >
> http://mail-archives.apache.org/mod_mbox/calcite-dev/201411.mbox/%3CCAB%3DJe-H7AWEHbKzjrRHd-YcZgkgWzFORALrz_mMc2k7WDdj54Q%40mail.gmail.com%3E
> > > >
> > > >
> > > > On Thu, Mar 21, 2019 at 10:35 PM, Walaa Eldin Moustafa <
> > > > wa.moust...@gmail.com> wrote:
> > > >
> > > > > > Is your concern with how we have structured the class hierarchy?
> Or
> > > just
> > > > > how we describe Correlate in the documentation?
> > > > >
> > > > > My concern is with both, but mainly the former.
> > > > >
> > > > > > I do agree that Correlate and nested loops joins are not the
> same (one
> > > > > is logical, the other physical). However, they have a lot in
> common, in
> > > > > particular the fact that one input sets variables and the input
> reads
> > > those
> > > > > variables.
> > > > >
> > > > > I think this commonality describes how the query is written, but
> not
> > > > > necessarily what it is logically equivalent to. It also describes
> the
> > > > > "how", and not necessarily the "what". I would say logical
> > > > > representations should be concerned with the "what" part.
> > > > >
> > > > > > I can’t think of any way to represent a nested loops join (e.g.
> for
> > > each
> > > > > department, find all employees in that department) that does not
> use
> > > > > variables to tie together the two inputs. And therefore I am happy
> with
> > > the
> > > > > fact that our Java implementation of nested-loops join is ‘class
> > > > > EnumerableCorrelate extends Correlate’.
> > > > >
> > > > > That is correct. The two variables are required. At the logical
> level
> > > > > they are mapped to the Correlate variables, or the Join keys after
> > > > > decorrelation. After going to physical, we can only have join keys.
> > > > > One of the keys can be the basis for the outer loop and the other
> for
> > > > > the inner loop if needed. That is true for both Correlate and Join
> > > > > operators. Both keys can even be used in another way than forming
> > > > > nested loops such as using them to implement hash or merge joins
> > > > > (again for regular Join or Correlate join after decorrelation).
> > > > >
> > > > > Thanks,
> > > > > Walaa.
> > > > >
> > > > > On Thu, Mar 21, 2019 at 2:08 PM Julian Hyde 
&

Re: Join, SemiJoin, Correlate

2019-03-21 Thread Walaa Eldin Moustafa
> Is your concern with how we have structured the class hierarchy? Or just how 
> we describe Correlate in the documentation?

My concern is with both, but mainly the former.

> I do agree that Correlate and nested loops joins are not the same (one is 
> logical, the other physical). However, they have a lot in common, in 
> particular the fact that one input sets variables and the input reads those 
> variables.

I think this commonality describes how the query is written, but not
necessarily what it is logically equivalent to. It also describes the
"how", and not necessarily the "what". I would say logical
representations should be concerned with the "what" part.

> I can’t think of any way to represent a nested loops join (e.g. for each 
> department, find all employees in that department) that does not use 
> variables to tie together the two inputs. And therefore I am happy with the 
> fact that our Java implementation of nested-loops join is ‘class 
> EnumerableCorrelate extends Correlate’.

That is correct. The two variables are required. At the logical level
they are mapped to the Correlate variables, or the Join keys after
decorrelation. After going to physical, we can only have join keys.
One of the keys can be the basis for the outer loop and the other for
the inner loop if needed. That is true for both Correlate and Join
operators. Both keys can even be used in another way than forming
nested loops such as using them to implement hash or merge joins
(again for regular Join or Correlate join after decorrelation).

Thanks,
Walaa.

On Thu, Mar 21, 2019 at 2:08 PM Julian Hyde  wrote:
>
> > In addition, I would not present Correlate
> > as a nested loops join.
>
>
> Is your concern with how we have structured the class hierarchy? Or just how 
> we describe Correlate in the documentation?
>
> I do agree that Correlate and nested loops joins are not the same (one is 
> logical, the other physical). However, they have a lot in common, in 
> particular the fact that one input sets variables and the input reads those 
> variables.
>
> I can’t think of any way to represent a nested loops join (e.g. for each 
> department, find all employees in that department) that does not use 
> variables to tie together the two inputs. And therefore I am happy with the 
> fact that our Java implementation of nested-loops join is ‘class 
> EnumerableCorrelate extends Correlate’.
>
>
> Julian
>
> > On Mar 21, 2019, at 1:12 PM, Walaa Eldin Moustafa  
> > wrote:
> >
> > I would vote for number 3. In addition, I would not present Correlate
> > as a nested loops join. Moreover, nested loops, hash and merge joins
> > should be able to map to both Join or Correlate logical ones when
> > possible (no inherent correlation between logical join type and
> > physical types).
> >
> > On Thu, Mar 21, 2019 at 11:55 AM Julian Hyde  wrote:
> >>
> >> I have a few ideas for refactorings. (I’m not convinced by any of them, 
> >> but let me know which you like.)
> >>
> >> 1. Get rid of SemiJoinType. It is mis-named (it is not used by SemiJoin, 
> >> it is used by Correlate, but in a field called joinType).
> >>
> >> 2. In Correlate, use org.apache.calcite.linq4j.CorrelateJoinType. It has 
> >> the same set of values as SemiJoinType, but it has a better name.
> >>
> >> 3. Get rid of both SemiJoinType and CorrelateJoinType, and use JoinRelType 
> >> for everything. We would have to add SEMI and ANTI values. Also some 
> >> methods to find out whether the resulting row type contains fields from 
> >> the left and right inputs or just the left input.
> >>
> >> 4. Add “interface JoinLike extends BiRel” and make Join, SemiJoin and 
> >> Correlate implement it. It would have a methods that say whether the LHS 
> >> and RHS generate nulls, and whether the output row type contains columns 
> >> from the right input. This seems attractive because it lets Join, SemiJoin 
> >> and Correlate continue to be structurally different.
> >>
> >> Julian
> >>
> >>
> >>
> >>
> >>> On Mar 20, 2019, at 6:55 PM, Haisheng Yuan  wrote:
> >>>
> >>> SubPlan (in Postgres’ term) is a Postgres physical relational node to 
> >>> evaluate correlated subquery. What I mean is correlated subquery that 
> >>> can’t be decorrelated can’t be implemented by hashjoin or mergejoin. But 
> >>> it is off topic.
> >>>
> >>> Thanks ~
> >>> Haisheng Yuan
> >>> --
> >>> 

Re: Join, SemiJoin, Correlate

2019-03-21 Thread Walaa Eldin Moustafa
I would vote for number 3. In addition, I would not present Correlate
as a nested loops join. Moreover, nested loops, hash and merge joins
should be able to map to both Join or Correlate logical ones when
possible (no inherent correlation between logical join type and
physical types).

On Thu, Mar 21, 2019 at 11:55 AM Julian Hyde  wrote:
>
> I have a few ideas for refactorings. (I’m not convinced by any of them, but 
> let me know which you like.)
>
> 1. Get rid of SemiJoinType. It is mis-named (it is not used by SemiJoin, it 
> is used by Correlate, but in a field called joinType).
>
> 2. In Correlate, use org.apache.calcite.linq4j.CorrelateJoinType. It has the 
> same set of values as SemiJoinType, but it has a better name.
>
> 3. Get rid of both SemiJoinType and CorrelateJoinType, and use JoinRelType 
> for everything. We would have to add SEMI and ANTI values. Also some methods 
> to find out whether the resulting row type contains fields from the left and 
> right inputs or just the left input.
>
> 4. Add “interface JoinLike extends BiRel” and make Join, SemiJoin and 
> Correlate implement it. It would have a methods that say whether the LHS and 
> RHS generate nulls, and whether the output row type contains columns from the 
> right input. This seems attractive because it lets Join, SemiJoin and 
> Correlate continue to be structurally different.
>
> Julian
>
>
>
>
> > On Mar 20, 2019, at 6:55 PM, Haisheng Yuan  wrote:
> >
> > SubPlan (in Postgres’ term) is a Postgres physical relational node to 
> > evaluate correlated subquery. What I mean is correlated subquery that can’t 
> > be decorrelated can’t be implemented by hashjoin or mergejoin. But it is 
> > off topic.
> >
> > Thanks ~
> > Haisheng Yuan
> > --
> > From: Walaa Eldin Moustafa
> > Date: 2019-03-21 09:31:41
> > To: 
> > Cc: Stamatis Zampetakis
> > Subject: Re: Re: Join, SemiJoin, Correlate
> >
> > Agreed with Stamatis. Currently: 1) Correlate is tied to IN, EXISTS,
> > NOT IN, NOT EXISTS, and 2) is used as an equivalent to nested loops
> > join. The issues here are: 1) IN, EXISTS, NOT IN, NOT EXISTS can be
> > rewritten as semi/anti joins, and 2) nested loops join is more of a
> > physical operator.
> >
> > It seems that the minimal set of logical join types is INNER, LEFT,
> > RIGHT, OUTER, SEMI, ANTI.
> >
> > So I think Calcite could have one LogicalJoin operator with an
> > attribute to specify the join type (from the above), and a number of
> > physical join operators (hash, merge, nested loops) whose
> > implementation details depend on the join type.
> >
> > What we lose by this model is the structure of the query (whether
> > there was a sub-plan or not), but I would say that this is actually
> > what is desired from a logical representation -- to abstract away from
> > how the query is written, and how it is structured, as long as there
> > is a canonical representation. There could also be a world where both
> > models coexist (Correlate first then Decorrelate but in the light of a
> > single logical join operator?).
> >
> > @Haisheng, generally, a sub-plan can also be implemented using a
> > variant of hash or merge joins as long as we evaluate the sub-plan
> > independently (without the join predicate), but that is up to the
> > optimizer.
> >
> > Thanks,
> > Walaa.
> >
> > On Wed, Mar 20, 2019 at 5:23 PM Haisheng Yuan  
> > wrote:
> >>
> >> SemiJoinType and its relationship with JoinRelType do confuse me a little 
> >> bit.
> >>
> >> But I don’t think we should not have LogicalCorrelate. It is useful to 
> >> represent the lateral or correlated subquery (aka SubPlan in Postgres 
> >> jargon). The LogicalCorrelate can be implemented as NestLoopJoin in 
> >> Calcite, or SubPlan in Postgres’s terminology, but it can’t be implemented 
> >> as HashJoin or MergeJoin.
> >>
> >> Thanks ~
> >> Haisheng Yuan
> >> --
> >> From: Stamatis Zampetakis
> >> Date: 2019-03-21 07:13:15
> >> To: 
> >> Subject: Re: Join, SemiJoin, Correlate
> >>
> >> I have bumped into this quite a few times and I think we should really try
> >> to improve the design of the join hierarchy.
> >>
> >> From a logical point of view I think it makes sense to have the following
> >> operators:
> >> InnerJoin, LeftOuterJoin, FullOuterJoin, SemiJoin, AntiJoin, (GroupJoin)

Re: Re: Join, SemiJoin, Correlate

2019-03-20 Thread Walaa Eldin Moustafa
Agreed with Stamatis. Currently: 1) Correlate is tied to IN, EXISTS,
NOT IN, NOT EXISTS, and 2) is used as an equivalent to nested loops
join. The issues here are: 1) IN, EXISTS, NOT IN, NOT EXISTS can be
rewritten as semi/anti joins, and 2) nested loops join is more of a
physical operator.

It seems that the minimal set of logical join types is INNER, LEFT,
RIGHT, OUTER, SEMI, ANTI.

So I think Calcite could have one LogicalJoin operator with an
attribute to specify the join type (from the above), and a number of
physical join operators (hash, merge, nested loops) whose
implementation details depend on the join type.
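
For illustration, here is a rough sketch of what that could look like from
RelBuilder's point of view, assuming JoinRelType gained SEMI/ANTI values as
in option 3 (the "emps"/"depts" tables and the "deptno" column are made up):

import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.JoinRelType;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.tools.RelBuilder;

public class SemiJoinSketch {
  /** Roughly: SELECT * FROM emps e WHERE EXISTS
   *  (SELECT 1 FROM depts d WHERE d.deptno = e.deptno).
   *  The builder is assumed to be configured with a schema that contains
   *  "emps" and "depts" tables. */
  static RelNode empsWithExistingDept(RelBuilder b) {
    return b.scan("emps")
        .scan("depts")
        // A single Join node; its join-type attribute (SEMI here) determines
        // whether columns of the right input appear in the output row type.
        .join(JoinRelType.SEMI,
            b.call(SqlStdOperatorTable.EQUALS,
                b.field(2, 0, "deptno"),   // emps.deptno (input 0 of 2)
                b.field(2, 1, "deptno")))  // depts.deptno (input 1 of 2)
        .build();
  }
}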

What we lose by this model is the structure of the query (whether
there was a sub-plan or not), but I would say that this is actually
what is desired from a logical representation -- to abstract away from
how the query is written, and how it is structured, as long as there
is a canonical representation. There could also be a world where both
models coexist (Correlate first then Decorrelate but in the light of a
single logical join operator?).

@Haisheng, generally, a sub-plan can also be implemented using a
variant of hash or merge joins as long as we evaluate the sub-plan
independently (without the join predicate), but that is up to the
optimizer.

Thanks,
Walaa.

On Wed, Mar 20, 2019 at 5:23 PM Haisheng Yuan  wrote:
>
> SemiJoinType and its relationship with JoinRelType do confuse me a little bit.
>
> But I don’t think we should not have LogicalCorrelate. It is useful to 
> represent the lateral or correlated subquery (aka SubPlan in Postgres 
> jargon). The LogicalCorrelate can be implemented as NestLoopJoin in Calcite, 
> or SubPlan in Postgres’s terminology, but it can’t be implemented as HashJoin 
> or MergeJoin.
>
> Thanks ~
> Haisheng Yuan
> --
> From: Stamatis Zampetakis
> Date: 2019-03-21 07:13:15
> To: 
> Subject: Re: Join, SemiJoin, Correlate
>
> I have bumped into this quite a few times and I think we should really try
> to improve the design of the join hierarchy.
>
> From a logical point of view I think it makes sense to have the following
> operators:
> InnerJoin, LeftOuterJoin, FullOuterJoin, SemiJoin, AntiJoin, (GroupJoin)
>
> Yet I have not thought thoroughly what should become a class, and what a
> property of the class (e.g., JoinRelType, SemiJoinType).
>
> Moreover, Correlate as it is right now, is basically a nested loop join (as
> its Javadoc also indicates).
> Nested loop join is most often encountered as a physical operator so I am
> not sure if it should remain as is (in particular the LogicalCorrelate).
> As we do not have HashJoin, MergeJoin, etc., operators at the logical
> level, I think we should not have a NestedLoopJoin (aka., LogicalCorrelate).
> There are valid reasons why Correlate was introduced in the first place but
> I think we should rethink a bit the design and the needs.
>
> @Julian: I do not know to what extend you would like to rethink the
> hierarchy but I have the impression that even small changes can easily
> break backward compatibility.
>
>
> On Wed, Mar 20, 2019 at 8:07 PM, Julian Hyde 
> wrote:
>
> > I just discovered that Correlate, which is neither a Join nor a SemiJoin,
> > uses SemiJoinType, but SemiJoin does not use SemiJoinType.
> >
> > Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought.
> >
> > Julian
> >
> >
> >
>


Re: Ordering table scans

2019-02-22 Thread Walaa Eldin Moustafa
Also, since you are particularly interested in applying filters from
one table on another, that is actually a nested loops join rather than
a filter. The Linq4j implementation has examples of nested loops joins in
the CorrelateJoinTest class.
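
Not the CorrelateJoinTest code itself, but here is a sketch of the same
nested loops pattern written directly against the linq4j API, using the
Movie/User schema from your example (class and method names are mine):

import java.util.List;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;

public class NestedLoopsSketch {
  public static class User { public long id; public String name; }
  public static class Movie { public long id; public String name; public long directorUserId; }

  /** For each movie (outer side), scan users (inner side) with the movie's
   *  director id injected as a filter -- a nested loops join, not a plain filter. */
  static Enumerable<String[]> movieAndDirector(List<Movie> movies, List<User> users) {
    return Linq4j.asEnumerable(movies)
        .selectMany(m -> Linq4j.asEnumerable(users)
            .where(u -> u.id == m.directorUserId)
            .select(u -> new String[] {m.name, u.name}));
  }
}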

Thanks,
Walaa.


Re: Ordering table scans

2019-02-22 Thread Walaa Eldin Moustafa
Hi Gopal,

You may take a look at the FilterableTable API. It allows filtering tables
based on a list of RexNodes. Check
ScannableTableTest.testFilterableTableCooperative() for an example test.
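
A minimal sketch of the contract, using the User table from your example
(the class, column names, and sample data are made up; FilterableTable and
its scan(DataContext, List<RexNode>) signature are the real API):

import java.util.Iterator;
import java.util.List;
import org.apache.calcite.DataContext;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexCall;
import org.apache.calcite.rex.RexInputRef;
import org.apache.calcite.rex.RexLiteral;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.schema.FilterableTable;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.SqlKind;
import org.apache.calcite.sql.type.SqlTypeName;

public class UserTable extends AbstractTable implements FilterableTable {
  private final Object[][] rows = {{0L, "Kathryn Bigelow"}, {1L, "Greta Gerwig"}};

  @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
    return typeFactory.builder()
        .add("id", SqlTypeName.BIGINT)
        .add("name", SqlTypeName.VARCHAR)
        .build();
  }

  @Override public Enumerable<Object[]> scan(DataContext root, List<RexNode> filters) {
    // Consume any "id = <literal>" filter, and remove it from the list so
    // Calcite knows it does not have to re-apply it on top of the scan.
    Long id = null;
    for (Iterator<RexNode> it = filters.iterator(); it.hasNext();) {
      Long v = extractIdEquals(it.next());
      if (v != null) {
        id = v;
        it.remove();
      }
    }
    final Long wanted = id;
    return Linq4j.asEnumerable(rows)
        .where(row -> wanted == null || wanted.equals(row[0]));
  }

  /** Returns the literal if the filter has the shape "id = literal", else null. */
  private static Long extractIdEquals(RexNode filter) {
    if (filter instanceof RexCall && filter.isA(SqlKind.EQUALS)) {
      List<RexNode> operands = ((RexCall) filter).getOperands();
      if (operands.get(0) instanceof RexInputRef
          && ((RexInputRef) operands.get(0)).getIndex() == 0
          && operands.get(1) instanceof RexLiteral) {
        return ((RexLiteral) operands.get(1)).getValueAs(Long.class);
      }
    }
    return null;
  }
}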

Thanks,
Walaa.


On Fri, Feb 22, 2019 at 12:37 PM Gopalakrishna Holla 
wrote:

> Hello,
>
> The problem I'm working on is to order table scans so that one scan can
> insert filters into the another. It seems like it should be a common
> problem so I'm sure I'm missing some context. Taking a concrete example,
> imagine two tables Users and Movies with the following schema
>
> User {
>   id: long
>   name: String
> }
>
> Movie {
>   id : long
>   name: String
>   director_user_id : long
> }
>
> movie.director_user_id is a foreign key to user.id
>
> So for this query,
>
> SELECT m.name as movieName, u.name as directorName
> FROM Movie as m
> JOIN User as u
> ON m.director_user_id = u.id
> WHERE m.id = 1;
>
> Ideally, I'd like the table scan of Movie to go first, retrieve the
> director_user_id and then inject the value(s) as a filter into the scan for
> User (instead of doing a full table scan on User).
>
> My question is
> - Does Calcite provide an inbuilt way to do this?
> - If not, could you help me by pointing to any implementations that do this
> so that I can learn from them and apply it to my use case?
>
> I've tried to search Stack Overflow and the dev archives but haven't been
> able to find something that can help. Any guidance is much appreciated !
>
> Cheers,
> Gopal
>


Re: Calcite example code

2018-12-19 Thread Walaa Eldin Moustafa
Michael,

I would definitely encourage you and anyone else to participate. That is an
awesome idea, especially since there are often multiple ways to do the same
thing. For example, instantiating a RelNode converter can be done by the
planner or by constructing everything by hand. Similarly, adding schemas can
be done programmatically or via model files. Pointing out when to use each
solution would be a great addition to this.
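
For example, the programmatic route is only a few lines. A minimal sketch
(the Hr and Employee classes are made up for illustration; ReflectiveSchema
and CalciteConnection are the actual APIs):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.calcite.adapter.java.ReflectiveSchema;
import org.apache.calcite.jdbc.CalciteConnection;
import org.apache.calcite.schema.SchemaPlus;

public class ProgrammaticSchemaExample {
  // A toy schema object; public fields of the array element type become columns.
  public static class Hr {
    public final Employee[] emps = {new Employee(100, "Bill")};
  }
  public static class Employee {
    public final int empid;
    public final String name;
    public Employee(int empid, String name) { this.empid = empid; this.name = name; }
  }

  public static void main(String[] args) throws Exception {
    Connection connection = DriverManager.getConnection("jdbc:calcite:");
    CalciteConnection calciteConnection = connection.unwrap(CalciteConnection.class);
    SchemaPlus rootSchema = calciteConnection.getRootSchema();
    // Register the schema in code rather than via a JSON model file.
    rootSchema.add("hr", new ReflectiveSchema(new Hr()));
    try (Statement statement = calciteConnection.createStatement();
         ResultSet rs = statement.executeQuery("select * from \"hr\".\"emps\"")) {
      while (rs.next()) {
        System.out.println(rs.getInt(1) + " " + rs.getString(2));
      }
    }
  }
}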

Thanks,
Walaa.

On Wed, Dec 19, 2018 at 10:53 AM Michael Mior  wrote:

> After seeing so many people ask for example code to do certain basic things
> in Calcite, I've been trying to find a good literate programming solution
> for Java as I like this approach for demoing. I recently came across the
> IJava (https://github.com/SpencerPark/IJava) kernel for Jupyter notebooks.
>
> This is basically just a proof of concept at this point, but here's a
> simple example
>
>
> https://github.com/michaelmior/calcite-notebooks/blob/master/Query%20parsing.ipynb
>
> I'm curious what others think of this approach. If others think it would be
> useful, I'd be happy to take suggestions on what should be included.
> Eventually, I'd like to get CI set up for this repository so I can re-run
> the notebooks at will. I would then aim to check this on every release so
> we can have a repository of code samples which we know run correctly.
>
> --
> Michael Mior
> mm...@apache.org
>


Re: WITH RECURSIVE implementation

2018-11-17 Thread Walaa Eldin Moustafa
There was a relevant discussion some months ago in the Calcite dev mailing
list. Check out the "Recursive query, graph query, Datalog" discussion.

I guess representationally, extension should be easy. A plan becomes a
graph instead of a tree. Cycles in the graph mean recursion. The tricky
part is that recursive queries have an execution aspect to them, where
evaluation ends when no new facts (data) can be inferred. That has to come
from the execution engine.

Also, interesting questions arise when you start incorporating recursion
such as semi-naive evaluation, and negation and aggregation, whose
semantics have to be monotonic in order for the evaluation to reach a fixed
point. See [1, 2] for more details.
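
To make the fixed-point idea concrete, here is a small standalone sketch of
semi-naive evaluation for transitive closure. It is plain Java, not tied to
any Calcite API, and only illustrates the "stop when no new facts" loop:

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SemiNaive {
  /** reach(x, y) :- edge(x, y).
   *  reach(x, y) :- reach(x, z), edge(z, y). */
  static Set<List<Integer>> transitiveClosure(Set<List<Integer>> edges) {
    Set<List<Integer>> reach = new HashSet<>(edges); // all facts inferred so far
    Set<List<Integer>> delta = new HashSet<>(edges); // facts new in the last round
    while (!delta.isEmpty()) {                       // fixed point: no new facts
      Set<List<Integer>> next = new HashSet<>();
      for (List<Integer> r : delta) {                // join only the delta (semi-naive)
        for (List<Integer> e : edges) {
          if (r.get(1).equals(e.get(0))) {
            List<Integer> fact = Arrays.asList(r.get(0), e.get(1));
            if (!reach.contains(fact)) {             // keep only genuinely new facts
              next.add(fact);
            }
          }
        }
      }
      reach.addAll(next);
      delta = next;
    }
    return reach;
  }
}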

I have started prototyping a Datalog parser and RelConverter for Calcite,
but it does not support recursion yet. I will be happy to brainstorm or
collaborate if someone is interested.

[1] Walaa Eldin Moustafa, Vicky Papavasileiou, Ken Yocum and Alin
Deutsch. Datalography: Scaling Datalog Graph Analytics on Graph Processing
Systems. ICDE Big Data 2016.
http://www.sysnet.ucsd.edu/sysnet/miscpapers/datalog-icbd16.pdf

[2] Jiwon Seo, Stephen Guo, Monica S. Lam. SociaLite: Datalog Extensions
for Efficient Social Network Analysis.
https://mobisocial.stanford.edu/papers/icde13.pdf

Thanks,
Walaa.


On Sat, Nov 17, 2018 at 4:43 PM Michael Mior  wrote:

> I'm curious if anyone has any thoughts on how to go about implementing
> recursive queries. Postgres seems to only allow recursive queries with the
> form "non-recursive-term UNION [ALL] recursive-term" and uses a separate
> "recursive union" operator.
>
> Costing is also certainly a challenge since there's no obvious way to know
> how many times the recursive query will need to run. Specific suggestions
> would probably be best placed on the corresponding issue. Thanks!
>
> https://issues.apache.org/jira/browse/CALCITE-129
>
> --
> Michael Mior
> mm...@apache.org
>


Re: Gandiva

2018-06-30 Thread Walaa Eldin Moustafa
Hi Julian and Masayuki,

This indeed sounds quite important. Masayuki, thanks for taking the
initiative. I would like to do what I can to help. I can help with
writing some of the operators, UDFs/UDF APIs, and integration with Calcite.

Thanks,
Walaa.


On Fri, Jun 29, 2018 at 11:40 AM Julian Hyde  wrote:

> We already have two JIRA cases for Arrow integration:
> https://issues.apache.org/jira/browse/CALCITE-2040 and
> https://issues.apache.org/jira/browse/CALCITE-2173.
>
> I think this is an extremely important area of work for the Calcite
> project, because it helps us realize the vision of a deconstructed
> database[1]. There is a lot of work to do, much of it very interesting
> (e.g. writing a thread scheduler, IPC mechanisms, and algorithms for
> sort, join and aggregation that work effectively on Arrow data
> structures).
>
> If you want to help Masayuki, please step up!
>
> Julian
>
> [1]
> https://www.slideshare.net/julienledem/from-flat-files-to-deconstructed-database
>
> On Thu, Jun 28, 2018 at 2:24 PM, Michael Mior  wrote:
> > That's great! If you could create a JIRA case to track your progress,
> that
> > would be helpful for others who might want to follow along or contribute.
> > Thanks!
> >
> > --
> > Michael Mior
> > mm...@apache.org
> >
> >
> >
> > On Tue, Jun 26, 2018 at 10:36, Masayuki Takahashi 
> > wrote:
> >
> >> Hi Julian,
> >>
> >> > Masayuki Takahashi has started to develop an Arrow adapter for
> >> Calcite[2], but a lot of work remains to implement all SQL built-in
> >> functions and basic relational operators. Building on top of Gandiva we
> >> could save a lot of this effort.
> >>
> >> I will start to build Gandiva development environment and try to
> >> consider a way to incorporate.
> >>
> >> thanks.
> >>
> >>
> >>
> >> On Sat, Jun 23, 2018, 3:54 Julian Hyde :
> >> >
> >> > Suppose a company wishes to build a graph database using their own
> >> innovative graph index data structure. They nevertheless need to
> implement
> >> core relational algebra, core data types, and core built-in functions
> (+,
> >> CASE, SUM, SUBSTRING). And they want to implement these on a
> >> memory-efficient data structure (tens of thousands of rows, stored
> >> column-oriented, per memory block). This is a massive effort.
> >> >
> >> > With Calcite+Gandiva+Arrow they just need to create a sequence of
> >> relational operators (using RelBuilder, say) and efficient machine code
> is
> >> generated. They can then start adding their own data types, built-in
> >> functions, and relational operators, using the same architecture.
> >> >
> >> > Julian
> >> >
> >> >
> >> > > On Jun 22, 2018, at 11:33 AM, Xiening Dai 
> wrote:
> >> > >
> >> > > I was in a talk regarding Gandiva yesterday. Impressive work!
> >> > >
> >> > > But I am not sure why Calcite would like to integrate with it. To me
> >> Gandiva is on execution side, in which scenarios a query planner would
> need
> >> a arrow engine? I read the original Jira about implementing file
> >> enumerator, but the intent is still not clear to me. Would appreciate if
> >> you can elaborate. Thanks.
> >> > >
> >> > >
> >> > >> On Jun 22, 2018, at 11:20 AM, Julian Hyde 
> wrote:
> >> > >>
> >> > >> There is a discussion on dev@arrow about Gandiva, a kernel for
> >> Arrow[1].
> >> > >>
> >> > >> I think it would be an interesting library on which to build our
> >> Arrow engine. (Without a kernel, Arrow is just a data format, but with
> >> Gandiva it becomes an engine upon which we can implement all relational
> >> operations, albeit on a multi-threaded single node. Potentially this
> >> approach can process each row in a few machine cycles, i.e. billions of
> >> records per second. Therefore single-node would be sufficient for many
> >> queries.)
> >> > >>
> >> > >> Masayuki Takahashi has started to develop an Arrow adapter for
> >> Calcite[2], but a lot of work remains to implement all SQL built-in
> >> functions and basic relational operators. Building on top of Gandiva we
> >> could save a lot of this effort.
> >> > >>
> >> > >> Julian
> >> > >>
> >> > >> [1]
> >>
> https://lists.apache.org/thread.html/f099b3d1e2aaf9803c5c756f872a594baf17e9f25974e3496c9706d9@%3Cdev.arrow.apache.org%3E
> >> <
> >>
> https://lists.apache.org/thread.html/f099b3d1e2aaf9803c5c756f872a594baf17e9f25974e3496c9706d9@%3Cdev.arrow.apache.org%3E
> >> >
> >> > >>
> >> > >> [2] https://issues.apache.org/jira/browse/CALCITE-2173 <
> >> https://issues.apache.org/jira/browse/CALCITE-2173>
> >> > >
> >> >
> >>
> >>
> >> --
> >> 高橋 真之
> >>
>


Re: Linq4j-to-Rel converter?

2018-03-14 Thread Walaa Eldin Moustafa
Thanks Shuyi! Yeah, I thought you meant Calcite-based projects. Also, there
is Quill: http://getquill.io/.

Thanks,
Walaa.


On Wed, Mar 14, 2018 at 4:32 PM, Shuyi Chen <suez1...@gmail.com> wrote:

> @walaa, AFAIK, both Flink and Spark offer LINQ style API, other than SQL.
>
> @julian, QueryableRelBuilder is currently mostly unimplemented; I think a
> good starting point would be to add unittests. But I think LINQ interface
> would be a good addition to current SQL interface, and should receive some
> love.
>
> Also, here is a good read <https://www.linqpad.net/WhyLINQBeatsSQL.aspx>
> on
> LINQ vs SQL.
>
> On Wed, Mar 14, 2018 at 4:09 PM, Julian Hyde <jh...@apache.org> wrote:
>
> > QueryableRelBuilder can handle translating simple Linq4j expressions
> > to RelNodes. E.g. QueryableRelBuilder.where creates a LogicalFilter.
> >
> > In theory you ought to be able to invoke it via
> > CalciteConnectionImpl.executeQuery(Queryable). I thought there were
> > tests, but I can't find any.
> >
> >
> > On Wed, Mar 14, 2018 at 3:10 PM, Michael Mior <mm...@uwaterloo.ca>
> wrote:
> > > Others may correct me, but I don't believe this is currently supported.
> > We
> > > may not be able to support all of Linq4j, but I'd imagine a usable
> subset
> > > would be possible. If you're interested in implementing, that would be
> > > great! Either way, I'd suggest you file a JIRA case for this. Thanks!
> > >
> > > --
> > > Michael Mior
> > > mm...@apache.org
> > >
> > > 2018-03-14 2:27 GMT-04:00 Walaa Eldin Moustafa <wa.moust...@gmail.com
> >:
> > >
> > >> Does Calcite currently support converting a Linq4j expression to
> > relational
> > >> algebra? After a quick look at the code, it seems that it does not, so
> > if
> > >> not, do you think it is feasible? On one hand, the Linq4j expressions
> > are
> > >> very similar to the RelBuilder structure, (e.g., .groupBy(),
> .orderBy(),
> > >> etc). On the other hand, the extensive use of functions (e.g.,
> > selectors)
> > >> and in-memory data-types may be difficult to implement as a
> RelBuilder.
> > The
> > >> use case I have in mind is to have Linq4j as another top-level
> interface
> > >> (that is interpretable to relational algebra) to Calcite in addition
> to
> > SQL
> > >> and Piglet.
> > >>
> > >> Thanks,
> > >> Walaa.
> > >>
> >
>
>
>
> --
> "So you have to trust that the dots will somehow connect in your future."
>


Re: Linq4j-to-Rel converter?

2018-03-14 Thread Walaa Eldin Moustafa
Awesome! I particularly like the fact that there is a Rex translator from
FunctionExpressions! The QueryableRelBuilder class has lots of
UnsupportedOperationExceptions though. Lots of opportunity :)

Thanks,
Walaa.


On Wed, Mar 14, 2018 at 4:09 PM, Julian Hyde <jh...@apache.org> wrote:

> QueryableRelBuilder can handle translating simple Linq4j expressions
> to RelNodes. E.g. QueryableRelBuilder.where creates a LogicalFilter.
>
> In theory you ought to be able to invoke it via
> CalciteConnectionImpl.executeQuery(Queryable). I thought there were
> tests, but I can't find any.
>
>
> On Wed, Mar 14, 2018 at 3:10 PM, Michael Mior <mm...@uwaterloo.ca> wrote:
> > Others may correct me, but I don't believe this is currently supported.
> We
> > may not be able to support all of Linq4j, but I'd imagine a usable subset
> > would be possible. If you're interested in implementing, that would be
> > great! Either way, I'd suggest you file a JIRA case for this. Thanks!
> >
> > --
> > Michael Mior
> > mm...@apache.org
> >
> > 2018-03-14 2:27 GMT-04:00 Walaa Eldin Moustafa <wa.moust...@gmail.com>:
> >
> >> Does Calcite currently support converting a Linq4j expression to
> relational
> >> algebra? After a quick look at the code, it seems that it does not, so
> if
> >> not, do you think it is feasible? On one hand, the Linq4j expressions
> are
> >> very similar to the RelBuilder structure, (e.g., .groupBy(), .orderBy(),
> >> etc). On the other hand, the extensive use of functions (e.g.,
> selectors)
> >> and in-memory data-types may be difficult to implement as a RelBuilder.
> The
> >> use case I have in mind is to have Linq4j as another top-level interface
> >> (that is interpretable to relational algebra) to Calcite in addition to
> SQL
> >> and Piglet.
> >>
> >> Thanks,
> >> Walaa.
> >>
>


Re: Linq4j-to-Rel converter?

2018-03-14 Thread Walaa Eldin Moustafa
Interesting. Could you share some pointers?

On Wed, Mar 14, 2018 at 1:24 AM, Shuyi Chen <suez1...@gmail.com> wrote:

> I think that's a good idea. There are other open source projects that offer
> both Calcite SQL and LINQ style API, and convert them to Calcite relational
> algebra for query planning and optimization.
>
> On Tue, Mar 13, 2018 at 11:27 PM, Walaa Eldin Moustafa <
> wa.moust...@gmail.com> wrote:
>
> > Does Calcite currently support converting a Linq4j expression to
> relational
> > algebra? After a quick look at the code, it seems that it does not, so if
> > not, do you think it is feasible? On one hand, the Linq4j expressions are
> > very similar to the RelBuilder structure, (e.g., .groupBy(), .orderBy(),
> > etc). On the other hand, the extensive use of functions (e.g., selectors)
> > and in-memory data-types may be difficult to implement as a RelBuilder.
> The
> > use case I have in mind is to have Linq4j as another top-level interface
> > (that is interpretable to relational algebra) to Calcite in addition to
> SQL
> > and Piglet.
> >
> > Thanks,
> > Walaa.
> >
>
>
>
> --
> "So you have to trust that the dots will somehow connect in your future."
>


Linq4j-to-Rel converter?

2018-03-14 Thread Walaa Eldin Moustafa
Does Calcite currently support converting a Linq4j expression to relational
algebra? After a quick look at the code, it seems that it does not, so if
not, do you think it is feasible? On one hand, the Linq4j expressions are
very similar to the RelBuilder structure (e.g., .groupBy(), .orderBy(),
etc.). On the other hand, the extensive use of functions (e.g., selectors)
and in-memory data-types may be difficult to implement as a RelBuilder. The
use case I have in mind is to have Linq4j as another top-level interface
(that is interpretable to relational algebra) to Calcite in addition to SQL
and Piglet.
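
To make the comparison concrete, here is roughly what I have in mind. This
is my own sketch, not existing converter output; the Employee class is made
up, and the RelBuilder calls assume a schema that contains an "emps" table:

import java.util.List;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.tools.RelBuilder;

public class Linq4jVsRelBuilder {
  public static class Employee {
    public final int deptno;
    public final String name;
    public Employee(int deptno, String name) { this.deptno = deptno; this.name = name; }
  }

  /** The query written against in-memory data with Linq4j lambdas. */
  static Enumerable<String> linq4jVersion(List<Employee> emps) {
    return Linq4j.asEnumerable(emps)
        .where(e -> e.deptno == 10)
        .select(e -> e.name);
  }

  /** The same logical query built as relational algebra:
   *  LogicalProject(LogicalFilter(LogicalTableScan(emps))). */
  static RelNode relBuilderVersion(RelBuilder b) {
    return b.scan("emps")
        .filter(b.call(SqlStdOperatorTable.EQUALS,
            b.field("deptno"), b.literal(10)))
        .project(b.field("name"))
        .build();
  }
}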

Thanks,
Walaa.


Re: Recursive query, graph query, Datalog

2017-12-21 Thread Walaa Eldin Moustafa
Agreed on the high level description of the iterate operator and table
function. The table function is basically a "combiner" that will combine
the delta with the result of past iteration somehow.

I would say we need a UNION (versus UNION ALL) operator since we do not
want to re-add facts that were already inferred in a past iteration (in case
they are re-inferred).

Are you aware of anyone working on the parser/AST? I am giving them a try
in case someone wants to collaborate.

Thanks,
Walaa.


On Mon, Dec 18, 2017 at 12:02 AM, Julian Hyde <jh...@apache.org> wrote:

> Yes, I agree.
>
> My first instinct is to add an Iterate operator whose arguments are (1)
> the input, (2) a table function that applies to the delta at each
> iteration. When the table function returns the empty set, iteration stops.
> The “table function” is not a function per se, but a RelNode tree that
> references a pseudo-table called “Delta”. Think of it as a relational
> lambda expression, and the “Delta" is the argument.
>
> Intermediate results are combined using UNION ALL. Is this too
> restrictive? I think maybe not, because you can always add a “finalizer”
> such as an Aggregate after the Iterate operator.
>
> Julian
>
>
> > On Dec 15, 2017, at 3:11 PM, Walaa Eldin Moustafa <wa.moust...@gmail.com>
> wrote:
> >
> > Yes, Magic sets is a very important and popular optimization as well. I
> > guess once we can get a basic notion of recursion as a construct in
> > Calcite, and get it to work correctly, we can start cracking
> optimizations.
> > One thing to note is that the convergence/fixed point depend on the data,
> > and there is no way to know beforehand what the (complete) plan will look
> > like (i.e., how many joins). It seems that there must be some sort of
> > awareness in the host engine of the fact that the query is recursive, and
> > it should keep iterating till fixed point, or at least tell Calcite if it
> > converged or not, and if not, Calcite will ask it to keep trying, so
> every
> > iteration Calcite sends a traditional (non-recursive) RA plan, or ask the
> > engine to stop. Do you agree?
> >
> >
> > On Fri, Dec 15, 2017 at 12:06 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> (Moving Carl, Shrikanth, Vasanth to bcc.)
> >>
> >> Regarding optimizations. One one hand, it is daunting that there so
> >> many optimizations are required to make graph queries run efficiently,
> >> but on the other hand, it is good news for the project if those can be
> >> expressed in relational algebra.
> >>
> >> Looking at the previous research, some of the optimizations applied
> >> are genuinely only possible at run-time, but others should be thought
> >> of as logical rewrites. Semi-naive evaluation, which Walaa mentions,
> >> can be expressed as a logical operation (very similar to incremental
> >> view maintenance and streaming, by the way).
> >>
> >> (Untangling the capabilities of a particular engine from algebraic
> >> rewrites is Calcite's gift to the world!)
> >>
> >> Another very important logical rewrite is "magic sets"[1], which can
> >> be modeled as semi-join push-down and done entirely at planning
> >> time[2] or (if the runtime supports it) as side-ways information
> >> passing of bloom filters or similar. Magic sets are important for
> >> graph queries but also very useful for star-schema queries with a
> >> fixed number of joins.
> >>
> >> Julian
> >>
> >> [1] http://db.cs.berkeley.edu/papers/sigmod96-magic.pdf
> >>
> >> [2] https://issues.apache.org/jira/browse/CALCITE-468
> >>
> >>
> >> On Fri, Dec 15, 2017 at 11:21 AM, Edmon Begoli <ebeg...@gmail.com>
> wrote:
> >>> Great initiative.
> >>>
> >>> I will also share some comparative performance studies we did at ORNL
> on
> >>> different graph processing engines. Could be useful.
> >>>
> >>> On Fri, Dec 15, 2017 at 14:11 Walaa Eldin Moustafa <
> >> wa.moust...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Julian,
> >>>>
> >>>> Thanks for referencing our Datalog query processing paper [5]. I have been
> >>>> thinking about the same idea for a while now too :) I think Calcite is very
> >>>> well positioned as a generic query optimizer to add Datalog/recursive query
> >>>> support. Also, it makes a lot of sense since it opens a completely new

Re: Recursive query, graph query, Datalog

2017-12-15 Thread Walaa Eldin Moustafa
Hi Julian,

Thanks for referencing our Datalog query processing paper [5]. I have been
thinking about the same idea for a while now too :) I think Calcite is very
well positioned as a generic query optimizer to add Datalog/recursive query
support. Also, it makes a lot of sense since it opens a completely new
dimension for the kind of logic and queries that Calcite can handle,
including but not limited to graph queries, and that can be immediately
available to engines talking to Calcite.

To answer your questions, we probably need to add a transitive closure
operator. This 1988 paper by
Rakesh Agrawal proposes the notion of alpha relations, and defines an alpha
operator on top of them which computes the transitive closure of alpha
relations. The operator fits well with the rest of Codd's relational algebra
operators.

For query optimizations, one of the most commonly used Datalog optimizations is
semi-naive evaluation, where instead of re-evaluating the recursive program
using all existing facts, only the new facts inferred since the last iteration are
used. Datalog optimizations become much more interesting when introducing
aggregation and negation, and that is still an open research question, but
there is already some tangible progress. Otherwise, as you mentioned,
transitive closure is repeated joins, so many of the join
optimizations apply, such as predicate pushdown and aggregation
pushdown/pull-up.

Regarding the effort, we can always start from basic features and expand
from there. I have already started working on the parser, AST and logical
plan builder for basic Datalog without recursion. I am happy to create a
JIRA ticket to track this effort there.
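
For reference, the rule representation I have in mind is roughly along these
lines. This is purely illustrative and not the actual prototype code:

import java.util.List;

class Atom {
  final String predicate;       // e.g. "reach"
  final List<String> terms;     // variables or constants, e.g. ["X", "Y"]
  Atom(String predicate, List<String> terms) {
    this.predicate = predicate;
    this.terms = terms;
  }
}

class Rule {
  final Atom head;              // reach(X, Y)
  final List<Atom> body;        // [reach(X, Z), edge(Z, Y)]
  Rule(Atom head, List<Atom> body) {
    this.head = head;
    this.body = body;
  }
  /** A rule is directly recursive if its head predicate appears in its body. */
  boolean isRecursive() {
    return body.stream().anyMatch(a -> a.predicate.equals(head.predicate));
  }
}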

Thanks,
Walaa.


On Fri, Dec 15, 2017 at 10:26 AM, Julian Hyde  wrote:

> I've been thinking about Datalog front end to Calcite. How much effort
> would it be? Would there be an audience who would find it useful?
>
> The genesis of the idea is talks by Frank McSherry[1] and Vasia
> Kalavri[2] about graph queries and in particular Timely
> Dataflow[3][4], and a paper about using Datalog for graph processing
> [5].
>
> A few observations:
> * Graph queries require repeated (unbounded) joins, and so are beyond
> SQL:92.
> * Datalog allows recursion, and can therefore compute things like
> transitive closure, and is therefore powerful enough for graph
> queries.
> * SQL:99 added 'WITH RECURSIVE' so can handle a pretty useful class of
> queries. However, for a variety of reasons, people tend not to use SQL
> for querying graph databases.
> * Datalog is more than just recursive queries. For instance, it is
> popular with academics analyzing the power/complexity of languages.
> And it can do deductive queries.
> * There are different "strengths" of Datalog. Some variants support
> only linearizable recursion; most variants only support queries whose
> results are stratified (i.e. can be defined using well-founded
> induction, necessary when negations are involved).
> * The "big data" languages (Hadoop, Spark, Flink, even Pig) have all
> discussed how they can handle graph queries and iterative computation.
> However they have mainly focused on improvements to their engine and
> physical algebra, not looked at logical algebra or query language.
> * If Calcite's algebra could express deductive query / recursive query
> / iteration, then not only would Datalog be possible, but also SQL's
> WITH RECURSIVE, and also SPARQL.
>
> So, questions on my mind:
> * What new operator(s) would we add to Calcite's algebra to enable
> recursive query?
> * What optimization rules are possible/necessary for graph queries?
> * How much effort would it be to add a Datalog parser to Calcite and
> translate to Calcite's algebra?
>
> Julian
>
> [1] http://www.dataengconf.com/scalability-but-at-what-cost
>
> [2] https://qconsf.com/sf2017/speakers/vasia-kalavri
>
> [3] https://github.com/frankmcsherry/timely-dataflow
>
> [4] http://sigops.org/sosp/sosp13/papers/p439-murray.pdf
>
> [5] http://www.sysnet.ucsd.edu/sysnet/miscpapers/datalog-icbd16.pdf
>