Cool. And we know that there are already many 'lightweight' APIs that soon
became mainstream APIs.

On Tue, Jul 19, 2016 at 10:56 PM, Paul Rogers <prog...@maprtech.com> wrote:

> Hi All,
>
> As I’ve been playing with and learning about Drill, it struck me that
> Drill is a wonderful “industrial strength” query engine, but that the
> client API is a bit complex if all an app wants to do is execute a few
> queries. I wondered if we need an adapter between the full-blown columnar,
> asynchronous RPC that Drill uses internally and the row-based,
> synchronous API that most apps know and love.
>
> In thinking about a simpler client API, a few items came to mind:
>
> - We have the JDBC API for Java apps, but the internals of the current
> JDBC use the Drill client and so the JDBC jar is quite big (20MB).
>
> - The current client API is not versioned, requiring clients to be
> upgraded in lock-step with servers. Many admins, however, find it necessary
> to upgrade clients on a schedule different from that of the server.
> (Imagine upgrading dozens of desktop users at the same time as the Drill
> cluster.) Many of the traditional DB products version their interfaces to
> simplify this task.
>
> - A cool feature of Drill is schema-on-read, which means Drill may
> encounter different schemas as data is read. At present, it is a bit hard
> for clients to consume different schemas. It turns out, however, that
> stored procedures provide something similar (multiple result sets), and we
> could leverage that idea to make schema changes a first-class feature
> of the API.
>
> Playing around a bit in my spare time, I found that we can grab lots of
> ideas from “traditional” DB APIs to solve the above problems (and more):
>
> - A simplified client API provides a row-based view of results, with
> schema changes as a first-class API concept.
> - A “direct” version of the client can sit directly on top of the Drill
> Client, much like the current JDBC driver.
> - Because the client API is simple, it is easy to create a new wire
> protocol to carry the required row-based client messages.
> - That wire protocol enables a very light-weight remote version of the
> client API.
> - A new server implements the server-side of the new wire protocol. The
> server is an adapter: it converts the “retail” row-based API into the
> “wholesale” columnar API of Drill.
> - A new JDBC implementation uses the remote API instead of directly using
> the Drill Client API.
>
> Because the remote client has no dependencies on Drill (or, indeed,
> anything other than the JDK), it is very small: the revised JDBC jar is
> about 1% of the size of the existing JDBC driver (200KB instead of
> 20MB).
>
> The result is a little prototype project called “Jig”. I’d like to toss it
> out to the community to see if this is something of interest to others. The
> code works just well enough to prove the concept, though I’ve left off the
> more “advanced” data types, multiple cursors per connection, and other
> details.
>
> The advantage for Java users is a simpler API, smaller JDBC driver, fewer
> dependencies and cross-version compatibility.
>
> If we add clients in other languages, then just about any language can
> easily query Drill without a Java or ODBC bridge. This would be handy for
> that Caravel integration project discussed here a month or so back. Also
> for data scientists who prefer Python or R.
>
> In case there is interest in this idea, a more detailed proposal is
> available:
> https://docs.google.com/document/d/1TpJOEUO-DBDGIidOML2_InpJ-fK4yHmsbV5ncqXT6pM
>
> The code is in a GitHub repo: https://github.com/paul-rogers/drill-jig
>
> The JIRA for this enhancement: DRILL-4791:
> https://issues.apache.org/jira/browse/DRILL-4791
>
> This has been a great little learning exercise. Is this something that
> we might want to take further? Thoughts on the approach taken?
>
> Thanks,
>
> - Paul
>
>
>
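
To make the "row-based view with schema changes as a first-class concept" idea from the quoted proposal a bit more concrete, here is a rough sketch of what such an adapter could look like. It is only an illustration: `Batch`, `RowCursor`, `Event` and the demo data are hypothetical names made up for this example, not the actual Jig or Drill client API, and the columnar batches are simulated with plain arrays rather than Drill value vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch; none of these types come from Drill or Jig. */
class RowCursorSketch {

    /** A simulated columnar batch: one schema plus one value array per column. */
    static final class Batch {
        final List<String> schema;  // column names
        final Object[][] columns;   // columns[c][r] = value of column c in row r

        Batch(List<String> schema, Object[][] columns) {
            this.schema = schema;
            this.columns = columns;
        }

        int rowCount() {
            return columns.length == 0 ? 0 : columns[0].length;
        }
    }

    /** What the cursor reports each time the caller advances it. */
    enum Event { SCHEMA, ROW, END }

    /** Row-based, synchronous view over a sequence of columnar batches. */
    static final class RowCursor {
        private final Iterator<Batch> batches;
        private Batch current;
        private int rowIndex = -1;

        RowCursor(List<Batch> batches) {
            this.batches = batches.iterator();
        }

        /** Advances to the next event: a new schema, the next row, or end of data. */
        Event next() {
            if (current != null && rowIndex + 1 < current.rowCount()) {
                rowIndex++;
                return Event.ROW;
            }
            // Current batch is exhausted (or there is none yet): fetch the next one.
            List<String> previous = current == null ? null : current.schema;
            while (batches.hasNext()) {
                current = batches.next();
                rowIndex = -1;
                if (!current.schema.equals(previous)) {
                    return Event.SCHEMA;  // schema change surfaces as its own event
                }
                if (current.rowCount() > 0) {
                    rowIndex = 0;
                    return Event.ROW;
                }
                // Empty batch with an unchanged schema: skip it.
            }
            return Event.END;
        }

        List<String> schema() {
            return current.schema;
        }

        /** Copies the current row out of the column arrays. */
        Object[] row() {
            Object[] values = new Object[current.columns.length];
            for (int c = 0; c < values.length; c++) {
                values[c] = current.columns[c][rowIndex];
            }
            return values;
        }
    }

    /** Two made-up batches; the second one adds a "city" column. */
    static List<Batch> demoBatches() {
        return Arrays.asList(
            new Batch(Arrays.asList("id", "name"),
                      new Object[][] { {1, 2}, {"a", "b"} }),
            new Batch(Arrays.asList("id", "name", "city"),
                      new Object[][] { {3}, {"c"}, {"SF"} }));
    }

    /** Walks the cursor to the end and records each event as a line of text. */
    static List<String> drain(List<Batch> batches) {
        List<String> out = new ArrayList<>();
        RowCursor cursor = new RowCursor(batches);
        for (Event e = cursor.next(); e != Event.END; e = cursor.next()) {
            out.add(e == Event.SCHEMA
                    ? "schema " + cursor.schema()
                    : "row " + Arrays.toString(cursor.row()));
        }
        return out;
    }

    public static void main(String[] args) {
        drain(demoBatches()).forEach(System.out::println);
    }
}
```

The point of the sketch is that the application only ever sees a simple loop over events, much like stepping through multiple result sets, while the conversion from column arrays to rows stays inside the adapter.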
