Re: Long-term thoughts about big-data queries in SIS

Martin Desruisseaux Wed, 11 Nov 2015 05:23:59 -0800

On 10/11/15 19:44, Marc Le Bihan wrote:
>    3) Parsing of statements is now the main difficulty I have, and
> this subject started a debate a few months ago: if I continue clause by
> clause (attempting to detect a GROUP BY, a HAVING, a LIKE ...
> "manually"), it will be long and difficult.
>    If I use a parser like ANTLR, it will be powerful and complete, but
> this API is known to be really hard to handle and to get working
> perfectly. I have used it four times, but I still dread it each time I
> use it. But I think that it's the only solution.


Could http://calcite.apache.org/ free us from this task? If I understand
correctly, we would just need to implement some methods that are
automatically invoked by Calcite. So we would have no SQL parser to
write at all and no JDBC interface to implement ourselves. According to their
documentation, Calcite already implements SELECT, FROM (including JOIN),
WHERE, GROUP BY (including GROUPING SETS), COUNT(DISTINCT …), FILTER,
HAVING, ORDER BY (including NULLS FIRST/LAST), UNION, INTERSECT, MINUS,
sub-queries and more.

However one open question is whether it is easy or hard to add our own
SQL instructions to Calcite, since we will need to provide geometry
functions. I do not know the answer to that question at this time.

Calcite provides an example using a CSV file as a database. We would copy
this example and replace the code reading from the CSV file with code
reading from a Shapefile.
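
To make the idea concrete, here is a rough sketch in plain Java of what "swap the CSV reader for a Shapefile reader" could look like. It does not use Calcite: ScannableSource is a hypothetical stand-in for whatever callback interface the engine would invoke, and CsvSource, FeatureSource and selectColumn are made-up names for illustration only. The Shapefile side is faked with in-memory rows, and the sample values are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the callback a SQL engine would invoke; NOT Calcite's actual API. */
interface ScannableSource {
    List<String> columnNames();
    Iterator<Object[]> scan();
}

/** Rows parsed from comma-separated lines, playing the role of the CSV example. */
final class CsvSource implements ScannableSource {
    private final List<String> columns;
    private final List<String> lines;

    CsvSource(List<String> columns, List<String> lines) {
        this.columns = columns;
        this.lines = lines;
    }

    @Override public List<String> columnNames() { return columns; }

    @Override public Iterator<Object[]> scan() {
        List<Object[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(",", -1));   // one String per column
        }
        return rows.iterator();
    }
}

/** Placeholder for a Shapefile-backed source; features are supplied in memory here. */
final class FeatureSource implements ScannableSource {
    private final List<Object[]> features;

    FeatureSource(List<Object[]> features) { this.features = features; }

    @Override public List<String> columnNames() { return Arrays.asList("name", "geometry"); }

    @Override public Iterator<Object[]> scan() { return features.iterator(); }
}

final class FeatureTableSketch {
    /** Mimics what an engine does for "SELECT column FROM source": a full scan plus a projection. */
    static List<Object> selectColumn(ScannableSource source, String column) {
        int index = source.columnNames().indexOf(column);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        List<Object> values = new ArrayList<>();
        for (Iterator<Object[]> it = source.scan(); it.hasNext();) {
            values.add(it.next()[index]);
        }
        return values;
    }

    public static void main(String[] args) {
        ScannableSource csv = new CsvSource(
                Arrays.asList("name", "value"),
                Arrays.asList("alpha,1", "beta,2"));
        ScannableSource features = new FeatureSource(Arrays.asList(
                new Object[] {"alpha", "POINT(1 2)"},
                new Object[] {"beta", "POINT(3 4)"}));
        // The same "engine" code runs unchanged against both sources.
        System.out.println(selectColumn(csv, "name"));
        System.out.println(selectColumn(features, "geometry"));
    }
}
```

The point of the sketch is only that the query side never needs to know which file format sits behind the source. Whether Calcite's real interfaces are this simple is something we would have to check against their CSV example.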

We could also go one step further and try to use
http://drill.apache.org/ instead of Calcite, in anticipation of
big data. However since Drill uses Calcite under the hood, it is
probably fine to start with Calcite for now since it would not introduce
any additional dependency compared to Drill.

    Martin
