Re: A light-weight, versioned client API for Drill

Julian Hyde Wed, 20 Jul 2016 13:40:36 -0700

Did you consider Avatica? Identical goals, it works already, and there
are clients in several languages.


Julian


On Wed, Jul 20, 2016 at 10:35 AM, Chunhui Shi <[email protected]> wrote:
> Cool. And we know that there are already many 'light weight' APIs that
> soon become the mainstream APIs.
>
> On Tue, Jul 19, 2016 at 10:56 PM, Paul Rogers <[email protected]> wrote:
>
>> Hi All,
>>
>> As I’ve been playing with and learning about Drill, it struck me that
>> Drill is a wonderful “industrial strength” query engine, but that the
>> client API is a bit complex if all an app wants to do is execute a few
>> queries. I wondered if we need an adapter between the full-blown
>> columnar, asynchronous RPC that Drill uses internally, and the row-based,
>> synchronous API that most apps know and love.
>>
>> In thinking about a simpler client API, a few items came to mind:
>>
>> - We have the JDBC API for Java apps, but the internals of the current
>> JDBC use the Drill client and so the JDBC jar is quite big (20MB).
>>
>> - The current client API is not versioned, requiring clients to be
>> upgraded in lock-step with servers. Many admins, however, find it necessary
>> to upgrade clients on a schedule different from that of the server.
>> (Imagine upgrading dozens of desktop users at the same time as the Drill
>> cluster.) Many of the traditional DB products version their interfaces to
>> simplify this task.
>>
>> - A cool feature of Drill is schema-on-read, which means Drill may
>> encounter different schemas as data is read. At present, it is a bit hard
>> for clients to consume different schemas. It turns out, however, that
>> stored procedures provide something similar (multiple result sets), and we
>> could leverage that idea to make schema changes a first-class feature
>> of the API.
>>
>> Playing around a bit in my spare time, I found that we can grab lots of
>> ideas from “traditional” DB APIs to solve the above problems (and more):
>>
>> - A simplified client API provides a row-based view of results, with
>> schema changes as a first-class API concept.
>> - A “direct” version of the client can sit directly on top of the Drill
>> Client, much like the current JDBC driver.
>> - Because the client API is simple, it is easy to create a new wire
>> protocol to carry the required row-based client messages.
>> - That wire protocol enables a very light-weight remote version of the
>> client API.
>> - A new server implements the server-side of the new wire protocol. The
>> server is an adapter: it converts the “retail” row-based API into the
>> “wholesale” columnar API of Drill.
>> - A new JDBC implementation uses the remote API instead of directly using
>> the Drill Client API.
>>
>> Because the remote client has no dependencies on Drill (or, indeed,
>> anything other than the JDK), it is very small.  Indeed, the revised JDBC
>> jar is about 1% of the size of the existing JDBC driver. (200KB instead of
>> 20MB.)
>>
>> The result is a little prototype project called “Jig”. I’d like to toss it
>> out to the community to see if this is something of interest to others. The
>> code works just well enough to prove the concept, though I’ve left off the
>> more “advanced” data types, multiple cursors per connection, and other
>> details.
>>
>> The advantage for Java users is a simpler API, smaller JDBC driver, fewer
>> dependencies and cross-version compatibility.
>>
>> If we add clients in other languages, then just about any language can
>> easily query Drill without a Java or ODBC bridge. This would be handy for
>> that Caravel integration project discussed here a month or so back. Also
>> for data scientists who prefer Python or R.
>>
>> In case there is interest in this idea, a more detailed proposal is
>> available:
>> https://docs.google.com/document/d/1TpJOEUO-DBDGIidOML2_InpJ-fK4yHmsbV5ncqXT6pM
>>
>> The code is in a GitHub repo: https://github.com/paul-rogers/drill-jig
>>
>> The JIRA for this enhancement: DRILL-4791:
>> https://issues.apache.org/jira/browse/DRILL-4791
>>
>> This has been a great little learning exercise. Is this something that
>> we might want to take further? Thoughts on the approach taken?
>>
>> Thanks,
>>
>> - Paul
>>
>>
>>
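
To make the adapter idea in the quoted proposal concrete, here is a minimal sketch of turning columnar batches into row-based result sets, with a schema change starting a new result set (the "multiple result sets" analogy). It is not taken from Jig or from the Drill client: `ColumnBatch` and `result_sets` are hypothetical names, and a batch is assumed to be a plain mapping from column name to a list of values.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[Any, ...]
Schema = Tuple[str, ...]


@dataclass
class ColumnBatch:
    """Hypothetical stand-in for one columnar batch: column name -> values."""
    columns: Dict[str, List[Any]]

    @property
    def schema(self) -> Schema:
        # Column names in insertion order serve as the schema here.
        return tuple(self.columns)

    def rows(self) -> Iterator[Row]:
        # zip(*column_vectors) pivots the column vectors into row tuples.
        return zip(*self.columns.values())


def result_sets(batches: Iterable[ColumnBatch]) -> Iterator[Tuple[Schema, List[Row]]]:
    """Group consecutive batches that share a schema into one result set.

    A change of schema closes the current result set and opens a new one,
    so the caller sees schema changes as a sequence of result sets.
    """
    current_schema = None
    current_rows: List[Row] = []
    for batch in batches:
        if current_schema is not None and batch.schema != current_schema:
            yield current_schema, current_rows
            current_rows = []
        current_schema = batch.schema
        current_rows.extend(batch.rows())
    if current_schema is not None:
        yield current_schema, current_rows


if __name__ == "__main__":
    demo = [
        ColumnBatch({"a": [1, 2], "b": ["x", "y"]}),
        ColumnBatch({"a": [3], "b": ["z"]}),
        ColumnBatch({"a": [4], "c": [True]}),  # schema change: "b" becomes "c"
    ]
    for schema, rows in result_sets(demo):
        print(schema, rows)
```

A real adapter would stream rows rather than buffer each result set in a list; the buffering here only keeps the sketch short.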
