Sounds like a very interesting issue. I’m currently evaluating Calcite’s JDBC adapter over PostgreSQL with TPC-DS queries, and queries through Calcite run 2 to 10 times slower than the same queries run natively in PostgreSQL through psql. So an overall enhancement of Avatica, including the JDBC latency issues, would be beneficial to Calcite. That said, query processing itself may be the issue in my case, according to the following comment on the JDBC adapter from Calcite’s tutorial page (https://calcite.apache.org/docs/tutorial.html):
Current limitations: The JDBC adapter currently only pushes down table scan operations; all other processing (filtering, joins, aggregations and so forth) occurs within Calcite. Our goal is to push down as much processing as possible to the source system, translating syntax, data types and built-in functions as we go. If a Calcite query is based on tables from a single JDBC database, in principle the whole query should go to that database. If tables are from multiple JDBC sources, or a mixture of JDBC and non-JDBC, Calcite will use the most efficient distributed query approach that it can.

Thank you,
Seung-Hwan

On Aug 23, 2018, at 3:45 PM, Julian Hyde <jh...@apache.org> wrote:

This is a paper in VLDB 2018, "Don’t Hold My Data Hostage – A Case For Client Protocol Redesign" by Mark Raasveldt and Hannes Mühleisen [1]. It claims that database client protocols (inside ODBC and JDBC drivers) are very inefficient, and has a compelling example where commercial drivers are 10x to 68x slower than netcat.

One of the goals of Avatica is to do better. How are we doing? Are there any ideas in the paper we could adopt? Would a closer partnership with Apache Arrow help us achieve those goals?

Julian

[1] https://hannes.muehleisen.org/p852-muehleisen.pdf