Re: Using Calcite with Python

Gavin Ray Mon, 31 Jan 2022 17:36:55 -0800

I have nothing of value to add, but:

> [5] https://github.com/oap-project/gazelle-jni/tree/velox_dev


Hot damn this is neat

On Mon, Jan 31, 2022 at 7:58 PM Jacques Nadeau <jacq...@apache.org> wrote:

> A couple of related (possibly useful?) pointers here:
>
>    - Dask-sql [1] uses Calcite in a Python context. Might be some good
>    stuff to leverage there.
>    - I'm working on compiling Calcite as a GraalVM shared native library
>    [2] as part of Substrait [3], with the goal of ultimately having a
>    friendly C binding [4] for use in non-JVM worlds. This connects to
>    work being done by others to support tools like Arrow and Velox [5]
>    as Substrait targets (thus completing the path from C interface to
>    native execution via Calcite).
>
>
> [1] https://github.com/dask-contrib/dask-sql
> [2] https://issues.apache.org/jira/browse/CALCITE-4786
> [3] https://github.com/substrait-io/substrait/pull/120
> [4] https://github.com/jacques-n/substrait/pull/3
> [5] https://github.com/oap-project/gazelle-jni/tree/velox_dev
>
> On Mon, Jan 31, 2022 at 3:32 PM Nicola Vitucci <nicola.vitu...@gmail.com>
> wrote:
>
> > Hi Eugen, Michael, Gavin,
> >
> > Thank you very much for your input. In response to your suggestions:
> >
> > - Phoenix client: I saw it but decided not to use it because it does
> > not seem very active or up to date (its Avatica version is 1.10,
> > while the latest is 1.20). I may still give it a try though.
> > - Arrow Flight: I think it could be very useful, especially if, as
> > Michael mentioned, it were integrated with Avatica as a transport;
> > at the moment, though, it is not.
> >
> > I am basically looking for a solution that is (relatively) easy to
> > implement, easy to keep up to date, and reasonably performant.
> > Although it incurs some overhead, a solution based on Python + Java
> > seems to me the most reasonable for the time being. Do you have any
> > other suggestions or recommendations?
> >
> > Thanks again,
> >
> > Nicola
> >
> >
> >
> > On Mon, Jan 31, 2022 at 17:04, Michael Mior <mm...@apache.org> wrote:
> >
> > > Flight is definitely another consideration for the future.
> > > Personally, I think it would be most interesting to integrate
> > > Flight with Avatica as an alternative transport. But it would
> > > certainly also be useful to allow the Arrow adapter to connect to
> > > any Flight endpoint.
> > >
> > > --
> > > Michael Mior
> > > mm...@apache.org
> > >
> > >
> > > On Mon, Jan 31, 2022 at 10:00, Gavin Ray <ray.gavi...@gmail.com> wrote:
> > >
> > > > This is really interesting stuff you've done in the example
> > > > notebooks.
> > > >
> > > > Nicola & Michael, I wonder if you could benefit from the
> > > > recently released Arrow Flight SQL?
> > > >
> > > > https://www.dremio.com/subsurface/arrow-flight-and-arrow-flight-sql-accelerating-data-movement/
> > > >
> > > > I have asked Jacques about this a bit -- it's meant to be a
> > > > standard for communicating SQL queries and metadata with Arrow.
> > > > I'm not intimately familiar with it, but it seems like it could
> > > > be a good base from which to build a Calcite backend for Arrow?
> > > >
> > > > They have a pretty thorough Java example in the repository:
> > > >
> > > > https://github.com/apache/arrow/blob/968e6ea488c939c0e1f2bfe339a5a9ed1aed603e/java/flight/flight-sql/src/test/java/org/apache/arrow/flight/sql/example/FlightSqlExample.java#L169-L180
> > > >
> > > > On Mon, Jan 31, 2022 at 8:47 AM Michael Mior <mm...@apache.org> wrote:
> > > >
> > > > > You may want to keep an eye on CALCITE-2040
> > > > > (https://issues.apache.org/jira/browse/CALCITE-2040). I have a
> > > > > student who is working on a Calcite adapter for Apache Arrow.
> > > > > We're basically hung up waiting on the Arrow team to release a
> > > > > compatible JAR. This still won't fully solve your problem,
> > > > > though, as the first version of the adapter is only capable of
> > > > > reading from Arrow files. However, the goal is eventually to
> > > > > allow passing a memory reference into the adapter so that it
> > > > > would be possible to make use of Arrow data that is constructed
> > > > > in memory elsewhere.
> > > > > --
> > > > > Michael Mior
> > > > > mm...@apache.org
> > > > >
> > > > >
> > > > > On Sun, Jan 30, 2022 at 17:36, Nicola Vitucci
> > > > > <nicola.vitu...@gmail.com> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > What would be the best way to use Calcite with Python? I've
> > > > > > come up with two potential solutions:
> > > > > >
> > > > > > - using the jaydebeapi package to connect via the JDBC driver
> > > > > > directly from a JVM created via jpype;
> > > > > > - using Apache Arrow via the pyarrow package to connect in
> > > > > > basically the same way, but creating Arrow objects with
> > > > > > JdbcToArrowUtils (and optionally converting them to Pandas).
> > > > > >
> > > > > > Although the former is more straightforward, the latter can
> > > > > > achieve better performance (see [1] for instance), since that
> > > > > > is exactly what Arrow is for. I've created two Jupyter
> > > > > > notebooks [2] showing each solution. What would you recommend?
> > > > > > Is there an even better approach?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Nicola
> > > > > >
> > > > > > [1] https://uwekorn.com/2020/12/30/fast-jdbc-revisited.html
> > > > > > [2] https://github.com/nvitucci/calcite-sparql/tree/v0.0.2/examples/python
> > > > > >
> > > > >
> > > >
> > >
> >
>
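
For readers following the thread, here is a minimal sketch of the first option from Nicola's original message (jaydebeapi over the Calcite JDBC driver, with the JVM started through jpype). It is not taken from the linked notebooks. The helper names calcite_jdbc_url and run_query are hypothetical, and the sketch assumes a Calcite model file plus the Calcite JARs (and their dependencies) are available locally.

```python
from typing import Any, List, Sequence, Tuple

# Calcite's JDBC driver class; it must be on the classpath given via `jars`.
CALCITE_DRIVER = "org.apache.calcite.jdbc.Driver"


def calcite_jdbc_url(model_path: str) -> str:
    """Build a Calcite JDBC URL pointing at a model file (hypothetical helper)."""
    return f"jdbc:calcite:model={model_path}"


def run_query(model_path: str, jars: Sequence[str], sql: str) -> List[Tuple[Any, ...]]:
    """Run one SQL statement through Calcite and return all rows (hypothetical helper)."""
    # Imported lazily so this module loads even where no JVM is installed;
    # jaydebeapi starts the JVM through jpype on the first connect.
    import jaydebeapi

    conn = jaydebeapi.connect(
        CALCITE_DRIVER,
        calcite_jdbc_url(model_path),
        jars=list(jars),
    )
    try:
        cursor = conn.cursor()
        try:
            cursor.execute(sql)
            return cursor.fetchall()
        finally:
            cursor.close()
    finally:
        conn.close()


# Example call (paths are placeholders, not real files):
# rows = run_query("model.json", ["calcite-core.jar"], "SELECT 1")
```

Rows come back as Python tuples converted one value at a time, which is the per-row overhead that the Arrow-based second option is meant to avoid.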
