It may be only tangentially related, but as work on the Rust implementation
of Arrow Flight proceeds (e.g. [1]), we are also trying to make the API
easier to work with. In the Rust case, however, we are currently working to
hide some of the lower-level gRPC details, which may not be as cumbersome in
other language implementations.

Andrew

[1] https://github.com/apache/arrow-rs/pull/1386

On Sun, Mar 13, 2022 at 7:15 PM Gavin Ray <ray.gavi...@gmail.com> wrote:

> FWIW, I filed an RFC issue here, along with a prototype implementation and
> sample usage + console output code:
>
> https://github.com/apache/arrow/issues/12618
>
> On Sun, Mar 13, 2022 at 10:43 AM Gavin Ray <ray.gavi...@gmail.com> wrote:
>
> >> Generally, the preferred pattern is one VectorSchemaRoot that
> >> gets reloaded each time.  So an API like "df.loadVectorSchemaRoot(root)"
> >> probably makes more sense but we can iterate on this.
> >>
> >
> > Could you expand on what exactly you mean by this?
> >
> > I'm still a bit blurry on the best practices for sending
> > the Arrow response in Flight, and it seems like an important point.
> >
> >
> >> ... creating a new contrib module that maps
> >> from Java objects (just like there are JDBC and Avro ones) seems
> >> worthwhile.  If you are interested in contributing something like this I
> >> think a short design doc would be worthwhile.
> >>
> >
> > Where would be the best place to post this?
> >
> > I was thinking of GitHub issues, but I am GitHub-centric and
> > not sure whether JIRA or the mailing list would be better.
> >
> > Thanks, Micah!
> >
> >
> > On Sun, Mar 13, 2022 at 12:46 AM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> >> Hi Gavin,
> >>
> >> > Just curious whether there is any interest/intention of possibly
> >> > making a higher level API around the basic FlightSQL one?
> >>
> >>
> >> IIUC, I don't think this is an issue with Flight but one with generic
> >> conversion of data into Arrow.  I don't think anyone is actively
> >> working on something like this, but creating a new contrib module that
> >> maps from Java objects (just like there are JDBC and Avro ones) seems
> >> worthwhile.  If you are interested in contributing something like this I
> >> think a short design doc would be worthwhile.
> >>
> >> > VectorSchemaRoot root = df.toVectorSchemaRoot();
> >> > listener.setVectorSchemaRoot(root);
> >> > listener.sendVectorSchemaRootContents();
> >>
> >>
> >> A small nit.  Generally, the preferred pattern is one VectorSchemaRoot
> >> that gets reloaded each time.  So an API like
> >> "df.loadVectorSchemaRoot(root)" probably makes more sense but we can
> >> iterate on this.  This wasn't commonly understood when some of the
> >> other contrib modules were developed.
> >>
> >> Cheers,
> >> Micah
> >>
> >>
> >> On Sat, Mar 12, 2022 at 12:15 PM Gavin Ray <ray.gavi...@gmail.com>
> >> wrote:
> >>
> >> > While trying to implement and introduce the idea of adopting
> >> > FlightSQL, the largest challenge was the API itself.
> >> >
> >> > I know it's meant to be low-level. But I found that most of the
> >> > development time went into code converting between row-based data
> >> > (i.e. Map<String, Object>) with Java types and columnar data with
> >> > Arrow types.
> >> >
> >> > I'm likely in the minority position here -- I know that Arrow and
> >> > FlightSQL users are largely looking at transferring large volumes of
> >> > data and servicing OLAP-type workloads.
> >> > But the thing that excites me most about FlightSQL isn't its
> >> > performance (always nice to have), but that it's a language-agnostic
> >> > standard for data access.
> >> >
> >> > That has broad implications -- for all kinds of data-access workloads
> >> > and business use cases.
> >> >
> >> > The challenge is that in trying to advocate for it, when presenting a
> >> > proof-of-concept,
> >> > rather than what a developer might expect to see, something like:
> >> >
> >> > // FlightSQL handler code
> >> > List<Map<String, Object>> results = ....;
> >> > results.add(Map.of("id", 1, "name", "Person 1"));
> >> > return results;
> >> >
> >> > A significant portion of the code is in Arrow-specific implementation
> >> > details: creating a VectorSchemaRoot, FieldVector, de-serializing the
> >> > results on the client, etc.
> >> >
> >> > Just curious whether there is any interest/intention of possibly
> >> > making a higher level API around the basic FlightSQL one?
> >> > Maybe something closer to the traditional notion of a row-based
> >> > "DataFrame" or "Table", like:
> >> >
> >> > DataFrame df = new DataFrame();
> >> > df.addColumn("id", ArrowTypes.Int);
> >> > df.addColumn("name", ArrowTypes.VarChar);
> >> > df.addRow(Map.of("id", 1, "name", "Person 1"));
> >> > VectorSchemaRoot root = df.toVectorSchemaRoot();
> >> > listener.setVectorSchemaRoot(root);
> >> > listener.sendVectorSchemaRootContents();
> >> >
> >>
> >
>
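
For reference, the two ideas discussed in the thread (pivoting row-based
Map<String, Object> data into columns, and reusing one container that gets
reloaded for each batch instead of building a new one) can be sketched in
plain Java with no Arrow dependency. Everything here is an assumption made
for illustration: ColumnBatch, load, rowCount and column are hypothetical
names, not Arrow classes or methods, and plain lists stand in for Arrow
vectors.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a reusable columnar container; not an Arrow class. */
final class ColumnBatch {
    // One list per column, kept in declaration order.
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();
    private int rowCount = 0;

    ColumnBatch(List<String> columnNames) {
        for (String name : columnNames) {
            columns.put(name, new ArrayList<>());
        }
    }

    /** Clears the existing columns and refills them from the given rows. */
    void load(List<Map<String, Object>> rows) {
        for (List<Object> column : columns.values()) {
            column.clear(); // reuse the same list objects across batches
        }
        for (Map<String, Object> row : rows) {
            for (Map.Entry<String, List<Object>> entry : columns.entrySet()) {
                // A row that lacks this column contributes a null cell.
                entry.getValue().add(row.get(entry.getKey()));
            }
        }
        rowCount = rows.size();
    }

    int rowCount() {
        return rowCount;
    }

    List<Object> column(String name) {
        return columns.get(name);
    }
}

final class ColumnBatchDemo {
    public static void main(String[] args) {
        ColumnBatch batch = new ColumnBatch(List.of("id", "name"));

        // First batch: two rows pivoted into two columns.
        batch.load(List.of(
                Map.of("id", 1, "name", "Person 1"),
                Map.of("id", 2, "name", "Person 2")));
        System.out.println(batch.rowCount() + " rows, id column: " + batch.column("id"));

        // Second batch: the same container is reloaded, not recreated.
        batch.load(List.of(Map.of("id", 3, "name", "Person 3")));
        System.out.println(batch.rowCount() + " rows, id column: " + batch.column("id"));
    }
}
```

The point of load(...) over a toX()-style factory is that the caller owns
one long-lived container and only its contents change per batch, which is
the shape Micah suggests for "df.loadVectorSchemaRoot(root)".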
