Re: Long-term thoughts about big-data queries in SIS

Marc Le Bihan Wed, 11 Nov 2015 20:10:01 -0800

Calcite seems impressive, yes!
It's worth studying.

-----Original Message-----
From: Martin Desruisseaux
Sent: Wednesday, November 11, 2015 2:22 PM
To: [email protected]
Subject: Re: Long-term thoughts about big-data queries in SIS


On 10/11/15 19:44, Marc Le Bihan wrote:

   3) Parsing of statements is now the main difficulty I have, and
this subject started a debate a few months ago: if I continue clause by
clause (attempting to detect a GROUP BY, a HAVING, a LIKE ...)
"manually", it will be long and difficult.
   If I use a parser like ANTLR, it will be powerful and complete, but
this API is known to be really hard to handle and to get working
perfectly. I have used it four times, but I still dread it each time.
But I think that it's the only solution.
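
To make the "clause by clause" difficulty concrete, here is a minimal
sketch in Java of what manual detection could look like. The class
NaiveClauseSplitter and its keyword list are hypothetical names chosen
for illustration; this is not SIS code, and it assumes plain ASCII SQL.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of "manual" clause detection; not SIS code. */
final class NaiveClauseSplitter {
    /** Keywords searched in this fixed order; a real SQL grammar has many more. */
    private static final String[] KEYWORDS = {
        "SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY"
    };

    /** Maps each keyword found to the text between it and the next keyword found. */
    static Map<String, String> split(String sql) {
        // Assumes ASCII SQL, so that upper-casing does not shift character offsets.
        String upper = sql.toUpperCase(Locale.ROOT);
        Map<String, String> clauses = new LinkedHashMap<>();
        String current = null;   // keyword whose clause text is being collected
        int start = 0;           // offset where the current clause text begins
        int from = 0;            // offset from which the next keyword is searched
        for (String keyword : KEYWORDS) {
            int index = upper.indexOf(keyword, from);
            if (index < 0) {
                continue;        // this clause is absent from the statement
            }
            if (current != null) {
                clauses.put(current, sql.substring(start, index).trim());
            }
            current = keyword;
            start = from = index + keyword.length();
        }
        if (current != null) {
            clauses.put(current, sql.substring(start).trim());
        }
        return clauses;
    }
}
```

This handles a simple statement such as
SELECT name FROM cities WHERE pop > 1000 GROUP BY name, but it already
goes wrong on SELECT 'from' FROM t, where the keyword inside the string
literal is taken for the FROM clause. Sub-queries, quoted identifiers
and nested expressions would each need further special cases, which is
why a real parser looks unavoidable.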


Could http://calcite.apache.org/ free us from this task? If I understand
correctly, we would just need to implement some methods that are
automatically invoked by Calcite. So we would have no SQL parser to
write at all and no JDBC interface to implement ourselves. According to their
documentation, Calcite already implements SELECT, FROM (including JOIN),
WHERE, GROUP BY (including GROUPING SETS), COUNT(DISTINCT …), FILTER,
HAVING, ORDER BY (including NULLS FIRST/LAST), UNION, INTERSECT, MINUS,
sub-queries and more.

However one open question is whether it is easy or hard to add our own
SQL instructions to Calcite, since we will need to provide geometry
functions. I do not know the answer to that question at this time.

Calcite provides an example using a CSV file as a database. We would copy
this example and replace the code reading from the CSV file with code
reading from a Shapefile.
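
The idea of swapping the reader can be sketched in plain Java as
follows. The RowSource interface below is a hypothetical stand-in for
whatever table abstraction the query engine calls; it is not Calcite's
actual API. ShapefileRowSource is also an assumption: it does no real
Shapefile parsing and only takes features already held in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the table abstraction invoked by a query engine. */
interface RowSource {
    List<String> columnNames();
    List<Object[]> scan();
}

/** Reads rows from CSV text held in memory; the first line is the header. */
final class CsvRowSource implements RowSource {
    private final String[] lines;

    CsvRowSource(String csv) {
        lines = csv.trim().split("\\R");   // split on any line break
    }

    @Override public List<String> columnNames() {
        return Arrays.asList(lines[0].split(","));
    }

    @Override public List<Object[]> scan() {
        List<Object[]> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            rows.add(lines[i].split(","));   // naive split, no quoting support
        }
        return rows;
    }
}

/** Placeholder for a Shapefile-backed source; features are supplied directly. */
final class ShapefileRowSource implements RowSource {
    private final List<String> columns;
    private final List<Object[]> features;

    ShapefileRowSource(List<String> columns, List<Object[]> features) {
        this.columns = columns;
        this.features = features;
    }

    @Override public List<String> columnNames() { return columns; }
    @Override public List<Object[]> scan()      { return features; }
}

final class RowSourceDemo {
    /** Engine-side code depends only on RowSource, never on the file format. */
    static int countRows(RowSource source) {
        return source.scan().size();
    }

    public static void main(String[] args) {
        RowSource csv = new CsvRowSource("id,name\n1,Paris\n2,Brest\n");
        List<Object[]> features = new ArrayList<>();
        features.add(new Object[] {"POINT(2.35 48.85)", "Paris"});
        RowSource shp = new ShapefileRowSource(Arrays.asList("geometry", "name"), features);
        System.out.println(countRows(csv) + " CSV rows, " + countRows(shp) + " Shapefile rows");
    }
}
```

The point of the sketch is only that the code consuming rows does not
change when the CSV reader is replaced; how closely the real Calcite
example follows this shape would need to be checked against its source.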

We could also go one step further and try to use
http://drill.apache.org/ instead of Calcite, in anticipation of
big-data. However since Drill uses Calcite under the hood, it is
probably fine to start with Calcite for now since it would not introduce
any additional dependency compared to Drill.

   Martin
