Hey Brendan,

As Jacques promised, here are a few pointers for your work on Flight:

- Our early-release Flight connector[1], which fully supports single Flight
  streams and partially supports parallel streams.
- A Spark DataSourceV2 client[2], which may also be of interest to you.

Both projects use the 'doAction' part of the Flight API spec[3] to
negotiate parallel vs. single stream, among other things. However, this is
done in an ad-hoc manner. Finding a way to standardise this for the exchange
of metadata, catalog info, connection parameters, etc. is, for me, an
important next step towards a Flight-based protocol that is equivalent to
ODBC/JDBC. I would be happy to discuss further if you have any thoughts on
the topic.
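
To make the negotiation idea concrete, here is a rough sketch of what a
structured doAction exchange could look like. It only mirrors the shape of
the Action (type + body) and Result (body) messages from Flight.proto[3]
using plain Python, so it runs without any Flight server. The action name
"negotiate-stream-mode", the JSON payload and its field names are all
assumptions made up for illustration; they are not part of the Flight spec
and not what either project above actually sends.

```python
import json
from dataclasses import dataclass


# Stand-ins for the two Flight.proto messages used by DoAction:
#   message Action { string type = 1; bytes body = 2; }
#   message Result { bytes body = 1; }
@dataclass(frozen=True)
class Action:
    type: str
    body: bytes


@dataclass(frozen=True)
class Result:
    body: bytes


# Hypothetical action name; not defined by the Flight spec.
NEGOTIATE = "negotiate-stream-mode"


def make_negotiate_action(want_parallel: bool, max_streams: int) -> Action:
    """Client side: encode the request as a JSON body (assumed format)."""
    payload = {"parallel": want_parallel, "max_streams": max_streams}
    return Action(NEGOTIATE, json.dumps(payload, sort_keys=True).encode("utf-8"))


def handle_action(action: Action, server_max_streams: int) -> Result:
    """Server side: agree on a stream count both ends can support."""
    if action.type != NEGOTIATE:
        raise ValueError(f"unknown action type: {action.type}")
    request = json.loads(action.body.decode("utf-8"))
    if request["parallel"]:
        streams = max(1, min(request["max_streams"], server_max_streams))
    else:
        streams = 1
    reply = {"parallel": streams > 1, "streams": streams}
    return Result(json.dumps(reply, sort_keys=True).encode("utf-8"))


def negotiate(want_parallel: bool, max_streams: int, server_max_streams: int) -> dict:
    """Round trip: build the action, let the server answer, decode the reply."""
    action = make_negotiate_action(want_parallel, max_streams)
    result = handle_action(action, server_max_streams)
    return json.loads(result.body.decode("utf-8"))


if __name__ == "__main__":
    print(negotiate(True, 8, 4))
    print(negotiate(False, 8, 4))
```

With a real client the same type/body pair would be sent through the
library's doAction call instead of the local handle_action function. The
point of the sketch is only that an agreed action name and payload schema
is the piece that would need standardising.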

Best,
Ryan

[1] https://github.com/dremio-hub/dremio-flight-connector
[2] https://github.com/rymurr/flight-spark-source
[3] https://github.com/apache/arrow/blob/master/format/Flight.proto

On Thu, May 21, 2020 at 3:08 PM Uwe L. Korn <uw...@xhochy.com> wrote:

> Hello Brendan,
>
> welcome to the community. In addition to the folks at Dremio, I wanted to
> make you aware of the Python ODBC client library
> https://github.com/blue-yonder/turbodbc which provides a high-performance
> ODBC<->Arrow adapter. It is especially popular with MS SQL Server users as
> the fastest known way to retrieve query results as DataFrames in Python
> from SQL Server, considerably faster than pandas.read_sql or using pyodbc
> directly.
>
> Even though it is the fastest known option, I can tell that a lot of CPU
> time is still spent in the ODBC driver "transforming" results so that they
> match the ODBC interface. Here, at least, one could possibly get much better
> performance when retrieving large columnar results from SQL Server by
> going through Arrow Flight as an interface instead of being constrained to
> the less efficient ODBC for this use case. Currently there is a performance
> difference of 50x between reading the data from a Parquet file and reading
> the same data from a table in SQL Server (simple SELECT, no filtering or
> so). As the client CPU is at 100% for nearly the full retrieval time, using
> a more efficient protocol for data transfer could roughly translate into
> a 10x speedup.
>
> Best,
> Uwe
>
> On Wed, May 20, 2020, at 12:16 AM, Brendan Niebruegge wrote:
> > Hi everyone,
> >
> > I wanted to informally introduce myself. My name is Brendan Niebruegge,
> > I'm a Software Engineer in our SQL Server extensibility team here at
> > Microsoft. I am leading an effort to explore how we could integrate
> > Arrow Flight with SQL Server. We think this could be a very interesting
> > integration that would benefit both SQL Server and the Arrow community.
> > We are very early in our thoughts so I thought it best to reach out
> > here and see if you had any thoughts or suggestions for me. What would
> > be the best way to socialize my thoughts to date? I am keen to learn
> > and deepen my knowledge of Arrow as well so please let me know how I
> > can be of help to the community.
> >
> > Please feel free to reach out anytime (email:brn...@microsoft.com)
> >
> > Thanks,
> > Brendan Niebruegge
> >
> >
>
