Re: [DISCUSS] Avatica - how efficient is our protocol?

2019-10-15 Thread Michael Mior
I thought I would resurrect this thread given the announcement of
Flight (link below). I'm not too familiar with Avatica, but it seems
like Flight (essentially a client-server framework for transporting
Arrow data) could be a good fit.

https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/
--
Michael Mior
mm...@apache.org

On Thu, Aug 23, 2018 at 15:45, Julian Hyde wrote:
>
> This is a paper in VLDB 2018, "Don’t Hold My Data Hostage – A Case For Client
> Protocol Redesign" by Mark Raasveldt and Hannes Mühleisen[1]. It claims that
> database client protocols (inside ODBC and JDBC drivers) are very
> inefficient, and has a compelling example where commercial drivers are 10x to
> 68x slower than netcat.
>
> One of the goals of Avatica is to do better. How are we doing? Are there any 
> ideas in the paper we could adopt? Would a closer partnership with Apache 
> Arrow help us achieve those goals?
>
> Julian
>
> [1] https://hannes.muehleisen.org/p852-muehleisen.pdf 
> 


Re: [DISCUSS] Avatica - how efficient is our protocol?

2018-08-23 Thread Josh Elser
I remember that when I was running some old benchmarks, I had to build some
custom logic to get writes happening at a decent speed (some basic
batching, and avoiding deserialization where I could). I'm sure there
is much more we can do here, especially around our HTTP transport.

My impression is that Arrow can support the same kind of protocol
evolution that protobuf allows (that was the main reason we chose
protobuf in the first place).
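
As a rough illustration of the batching idea mentioned above, the sketch
below compares sending rows one per request with sending them in groups.
CountingTransport, batches, write_unbatched and write_batched are
hypothetical names invented for this example; they are not part of Avatica,
and the transport only counts calls rather than doing any real HTTP.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]


class CountingTransport:
    """Hypothetical stand-in for an HTTP transport; it only counts requests."""

    def __init__(self) -> None:
        self.requests = 0
        self.rows_sent = 0

    def send(self, rows: List[Row]) -> None:
        # One call models one request/response round trip.
        self.requests += 1
        self.rows_sent += len(rows)


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Group rows into lists of at most `size` rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly partial, batch
        yield batch


def write_unbatched(transport: CountingTransport, rows: Iterable[Row]) -> None:
    # One round trip per row.
    for row in rows:
        transport.send([row])


def write_batched(transport: CountingTransport, rows: Iterable[Row], size: int) -> None:
    # One round trip per group of up to `size` rows.
    for batch in batches(rows, size):
        transport.send(batch)


rows = [(i, f"name-{i}") for i in range(1000)]

one_by_one = CountingTransport()
write_unbatched(one_by_one, rows)

batched = CountingTransport()
write_batched(batched, rows, size=100)

print(one_by_one.requests, batched.requests)
```

Both runs deliver the same rows; only the number of round trips differs,
which is where a request-per-row write path tends to lose time.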




Re: [DISCUSS] Avatica - how efficient is our protocol?

2018-08-23 Thread Lim, Seung-Hwan
Sounds like a very interesting issue.

I’m evaluating Calcite's JDBC adapter over PostgreSQL with TPC-DS queries,
and the Calcite queries run 2 to 10 times slower than native PostgreSQL
queries through psql. So an overall improvement to Avatica, including JDBC
latency, would benefit Calcite. Query processing itself may also be a factor
in this case, according to the following note on the JDBC adapter from
Calcite’s tutorial page (https://calcite.apache.org/docs/tutorial.html):

Current limitations: The JDBC adapter currently only pushes down table scan 
operations; all other processing (filtering, joins, aggregations and so forth) 
occurs within Calcite. Our goal is to push down as much processing as possible 
to the source system, translating syntax, data types and built-in functions as 
we go. If a Calcite query is based on tables from a single JDBC database, in 
principle the whole query should go to that database. If tables are from 
multiple JDBC sources, or a mixture of JDBC and non-JDBC, Calcite will use the 
most efficient distributed query approach that it can.

Thank you,
Seung-Hwan
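
To make the pushdown limitation quoted above concrete, here is a small
sketch that uses Python's built-in sqlite3 module as a stand-in for the
source database. It is not Calcite's planner or JDBC adapter; the table
name, column names and row counts are made up for illustration. It
compares fetching every row and filtering on the client side with letting
the database apply the filter.

```python
import sqlite3

# In-memory database standing in for the remote source system (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(i, i % 100) for i in range(10_000)],
)

# Scan-only pushdown: every row crosses the connection, and the filter
# runs on the client side.
scanned = conn.execute("SELECT id, amount FROM sales").fetchall()
client_filtered = [row for row in scanned if row[1] >= 99]

# Filter pushdown: the source database evaluates the predicate and only
# matching rows are transferred.
pushed = conn.execute(
    "SELECT id, amount FROM sales WHERE amount >= 99"
).fetchall()

print(len(scanned), len(pushed))
conn.close()
```

The two approaches return the same rows, but the scan-only variant
transfers the whole table first, so part of a slowdown like the one
described above may come from where the work happens rather than from the
wire protocol alone.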




[DISCUSS] Avatica - how efficient is our protocol?

2018-08-23 Thread Julian Hyde
This is a paper in VLDB 2018, "Don’t Hold My Data Hostage – A Case For Client
Protocol Redesign" by Mark Raasveldt and Hannes Mühleisen[1]. It claims that
database client protocols (inside ODBC and JDBC drivers) are very inefficient,
and has a compelling example where commercial drivers are 10x to 68x slower
than netcat.

One of the goals of Avatica is to do better. How are we doing? Are there any 
ideas in the paper we could adopt? Would a closer partnership with Apache Arrow 
help us achieve those goals?

Julian

[1] https://hannes.muehleisen.org/p852-muehleisen.pdf
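
One way to see how a client protocol can add overhead is to compare a
row-at-a-time encoding with a columnar chunk for the same result set. The
sketch below does that for two 32-bit integer columns. Both framings are
invented for illustration only; they are not Avatica's wire format, not
Arrow's, and not taken from the paper, and the function names are
hypothetical.

```python
import struct

# Hypothetical result set: 1000 rows, two 32-bit integer columns.
rows = [(i, i * 2) for i in range(1000)]


def encode_row_wise(rows):
    """Invented row-oriented framing: a 5-byte header per row
    (1-byte tag + 4-byte body length) and a 4-byte length prefix per field."""
    out = bytearray()
    for row in rows:
        body = bytearray()
        for value in row:
            body += struct.pack("<i", 4)      # field length in bytes
            body += struct.pack("<i", value)  # field value
        out += struct.pack("<bi", ord("D"), len(body))
        out += body
    return bytes(out)


def encode_columnar(rows):
    """Invented columnar framing: one 8-byte header (row count, column
    count), then each column as a packed array of 32-bit integers."""
    columns = list(zip(*rows))
    out = bytearray(struct.pack("<ii", len(rows), len(columns)))
    for column in columns:
        out += struct.pack(f"<{len(column)}i", *column)
    return bytes(out)


row_wise = encode_row_wise(rows)
columnar = encode_columnar(rows)
payload = len(rows) * 2 * 4  # bytes of actual integer data

print(len(row_wise), len(columnar), payload)
```

In this toy framing the row-wise message carries 13 bytes of headers and
length prefixes for every 8 bytes of data, while the columnar chunk pays a
single 8-byte header for the whole batch. Real protocols differ in the
details, so the numbers only illustrate the shape of the problem.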