Hi Julian,

Thanks! This is the kind of suggestion I was looking for.

I did, in fact, take a look at Avatica: Drill uses it for the existing JDBC driver. To be honest, I was a bit concerned about the overhead of converting rows to and from JSON. Have you looked at fitting a binary protocol under Avatica? It would sure be great to reuse the work already done to handle the many JDBC complexities.

- Paul

> On Jul 20, 2016, at 1:39 PM, Julian Hyde <jh...@apache.org> wrote:
>
> Did you consider Avatica? Identical goals, it works already, and there
> are clients in several languages.
>
> Julian
>
>
> On Wed, Jul 20, 2016 at 10:35 AM, Chunhui Shi <c...@maprtech.com> wrote:
>> Cool. And we know that many 'lightweight' APIs soon become the
>> mainstream APIs.
>>
>> On Tue, Jul 19, 2016 at 10:56 PM, Paul Rogers <prog...@maprtech.com> wrote:
>>
>>> Hi All,
>>>
>>> As I’ve been playing with and learning about Drill, it struck me that
>>> Drill is a wonderful “industrial strength” query engine, but that the
>>> client API is a bit complex if all an app wants to do is execute a few
>>> queries. I wondered if we need an adapter between the full-blown Drill
>>> columnar, asynchronous RPC that Drill uses internally, and the row-based,
>>> synchronous API that most apps know and love.
>>>
>>> In thinking about a simpler client API, a few items came to mind:
>>>
>>> - We have the JDBC API for Java apps, but the current JDBC driver is
>>> built on the Drill client, so the JDBC jar is quite big (20MB).
>>>
>>> - The current client API is not versioned, requiring clients to be
>>> upgraded in lock-step with servers. Many admins, however, find it necessary
>>> to upgrade clients on a schedule different from that of the server.
>>> (Imagine upgrading dozens of desktop users at the same time as the Drill
>>> cluster.) Many of the traditional DB products version their interfaces to
>>> simplify this task.
>>>
>>> - A cool feature of Drill is schema-on-read, which means Drill may
>>> encounter different schemas as data is read.
>>> At present, it is a bit hard for clients to consume different schemas.
>>> It turns out, however, that stored procedures provide something similar
>>> (multiple result sets), and we could leverage that idea to make schema
>>> changes a first-class feature of the API.
>>>
>>> Playing around a bit in my spare time, I found that we can grab lots of
>>> ideas from “traditional” DB APIs to solve the above problems (and more):
>>>
>>> - A simplified client API provides a row-based view of results, with
>>> schema changes as a first-class API concept.
>>> - A “direct” version of the client can sit directly on top of the Drill
>>> Client, much like the current JDBC driver.
>>> - Because the client API is simple, it is easy to create a new wire
>>> protocol to carry the required row-based client messages.
>>> - That wire protocol enables a very lightweight remote version of the
>>> client API.
>>> - A new server implements the server side of the new wire protocol. The
>>> server is an adapter: it converts the “retail” row-based API into the
>>> “wholesale” columnar API of Drill.
>>> - A new JDBC implementation uses the remote API instead of directly using
>>> the Drill Client API.
>>>
>>> Because the remote client has no dependencies on Drill (or, indeed,
>>> anything other than the JDK), it is very small. The revised JDBC jar is
>>> about 1% of the size of the existing JDBC driver (200KB instead of 20MB).
>>>
>>> The result is a little prototype project called “Jig”. I’d like to toss it
>>> out to the community to see if this is something of interest to others. The
>>> code works just well enough to prove the concept, though I’ve left off the
>>> more “advanced” data types, multiple cursors per connection, and other
>>> details.
>>>
>>> The advantage for Java users is a simpler API, a smaller JDBC driver, fewer
>>> dependencies, and cross-version compatibility.
>>>
>>> If we add clients in other languages, then just about any language can
>>> easily query Drill without a Java or ODBC bridge. This would be handy for
>>> that Caravel integration project discussed here a month or so back, and
>>> for data scientists who prefer Python or R.
>>>
>>> In case there is interest in this idea, a more detailed proposal is
>>> available:
>>> https://docs.google.com/document/d/1TpJOEUO-DBDGIidOML2_InpJ-fK4yHmsbV5ncqXT6pM
>>>
>>> The code is in a GitHub repo: https://github.com/paul-rogers/drill-jig
>>>
>>> The JIRA for this enhancement is DRILL-4791:
>>> https://issues.apache.org/jira/browse/DRILL-4791
>>>
>>> This has been a great little learning exercise. Is this something we
>>> might want to take further? Thoughts on the approach taken?
>>>
>>> Thanks,
>>>
>>> - Paul
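For illustration, here is a minimal Java sketch of the "retail" row-based idea described in the proposal: a cursor that presents columnar batches one row at a time and treats a schema change as the start of a new result set, much as a stored procedure returns multiple result sets. All names here (RowAdapterSketch, ColumnBatch, RowCursor, nextResultSet) are hypothetical and are not Drill's, Jig's, or JDBC's actual API. The sketch assumes each batch carries its column names as its schema and that two batches share a schema exactly when those name lists are equal.

```java
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical sketch (not Drill's, Jig's, or JDBC's API): a row-based cursor
 * over columnar batches. A schema change ends the current result set and
 * starts a new one, similar to multiple result sets from a stored procedure.
 */
public class RowAdapterSketch {

    /** One columnar batch: column names plus one value array per column. */
    public static final class ColumnBatch {
        private final List<String> schema;
        private final Object[][] columns; // columns[c][r] = value of column c in row r

        public ColumnBatch(List<String> schema, Object[]... columns) {
            this.schema = schema;
            this.columns = columns;
        }

        int rowCount() {
            return columns.length == 0 ? 0 : columns[0].length;
        }
    }

    /** Presents batches one row at a time; call nextResultSet() before next(). */
    public static final class RowCursor {
        private final Iterator<ColumnBatch> batches;
        private ColumnBatch current;  // batch holding the current row
        private ColumnBatch pending;  // first batch of the next result set, if any
        private int row = -1;         // index of the current row within `current`

        public RowCursor(Iterator<ColumnBatch> batches) {
            this.batches = batches;
            this.pending = batches.hasNext() ? batches.next() : null;
        }

        /** Skips any unread rows, then moves to the next result set (if any). */
        public boolean nextResultSet() {
            if (current != null) {
                while (next()) {
                    // discard rows the caller did not read
                }
            }
            if (pending == null) {
                return false;
            }
            current = pending;
            pending = null;
            row = -1;
            return true;
        }

        /** Column names of the current result set. */
        public List<String> schema() {
            return current.schema;
        }

        /** Advances to the next row; returns false at the end of the result set. */
        public boolean next() {
            while (pending == null) {
                if (row + 1 < current.rowCount()) {
                    row++;
                    return true;
                }
                if (!batches.hasNext()) {
                    return false;
                }
                ColumnBatch batch = batches.next();
                if (!batch.schema.equals(current.schema)) {
                    pending = batch; // schema changed: close this result set
                    return false;
                }
                current = batch;     // same schema: continue with the next batch
                row = -1;
            }
            return false;
        }

        /** Value of the given column (0-based) in the current row. */
        public Object get(int column) {
            return current.columns[column][row];
        }
    }

    public static void main(String[] args) {
        List<ColumnBatch> batches = List.of(
            new ColumnBatch(List.of("id", "name"), new Object[]{1, 2}, new Object[]{"a", "b"}),
            new ColumnBatch(List.of("id", "name"), new Object[]{3}, new Object[]{"c"}),
            new ColumnBatch(List.of("id", "name", "zip"),
                new Object[]{4}, new Object[]{"d"}, new Object[]{"94105"}));

        RowCursor cursor = new RowCursor(batches.iterator());
        while (cursor.nextResultSet()) {
            System.out.println("schema: " + cursor.schema());
            while (cursor.next()) {
                System.out.println("  id=" + cursor.get(0) + " name=" + cursor.get(1));
            }
        }
    }
}
```

Running main prints rows 1, 2, and 3 under the [id, name] schema, then row 4 under [id, name, zip]. Ending a result set at each schema change keeps the schema fixed within a result set, so a row-oriented client never sees columns appear or disappear mid-iteration. Standard JDBC exposes a similar pattern through Statement.getMoreResults().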