Cool. And we know that many 'lightweight' APIs have already gone on to become mainstream APIs.
On Tue, Jul 19, 2016 at 10:56 PM, Paul Rogers <prog...@maprtech.com> wrote:

> Hi All,
>
> As I’ve been playing with and learning about Drill, it struck me that
> Drill is a wonderful “industrial strength” query engine, but that the
> client API is a bit complex if all an app wants to do is execute a few
> queries. I wondered if we need an adapter between the full-blown
> columnar, asynchronous RPC that Drill uses internally, and the row-based,
> synchronous API that most apps know and love.
>
> In thinking about a simpler client API, a few items came to mind:
>
> - We have the JDBC API for Java apps, but the internals of the current
>   JDBC use the Drill client and so the JDBC jar is quite big (20MB).
>
> - The current client API is not versioned, requiring clients to be
>   upgraded in lock-step with servers. Many admins, however, find it
>   necessary to upgrade clients on a schedule different from that of the
>   server. (Imagine upgrading dozens of desktop users at the same time as
>   the Drill cluster.) Many of the traditional DB products version their
>   interfaces to simplify this task.
>
> - A cool feature of Drill is schema-on-read, which means Drill may
>   encounter different schemas as data is read. At present, it is a bit
>   hard for clients to consume different schemas. It turns out, however,
>   that stored procedures provide something similar (multiple result
>   sets), and we could leverage that idea to make schema changes into a
>   first-class feature of the API.
>
> Playing around a bit in my spare time, I found that we can grab lots of
> ideas from “traditional” DB APIs to solve the above problems (and more):
>
> - A simplified client API provides a row-based view of results, with
>   schema changes as a first-class API concept.
> - A “direct” version of the client can sit directly on top of the Drill
>   Client, much like the current JDBC driver.
> - Because the client API is simple, it is easy to create a new wire
>   protocol to carry the required row-based client messages.
> - That wire protocol enables a very light-weight remote version of the
>   client API.
> - A new server implements the server-side of the new wire protocol. The
>   server is an adapter: it converts the “retail” row-based API into the
>   “wholesale” columnar API of Drill.
> - A new JDBC implementation uses the remote API instead of directly using
>   the Drill Client API.
>
> Because the remote client has no dependencies on Drill (or, indeed,
> anything other than the JDK), it is very small. Indeed, the revised JDBC
> jar is about 1% of the size of the existing JDBC driver. (200KB instead
> of 20MB.)
>
> The result is a little prototype project called “Jig”. I’d like to toss
> it out to the community to see if this is something of interest to
> others. The code works just well enough to prove the concept, though I’ve
> left off the more “advanced” data types, multiple cursors per connection,
> and other details.
>
> The advantage for Java users is a simpler API, smaller JDBC driver, fewer
> dependencies and cross-version compatibility.
>
> If we add clients in other languages, then just about any language can
> easily query Drill without a Java or ODBC bridge. This would be handy for
> that Caravel integration project discussed here a month or so back. Also
> for data scientists who prefer Python or R.
>
> In case there is interest in this idea, a more detailed proposal is
> available:
> https://docs.google.com/document/d/1TpJOEUO-DBDGIidOML2_InpJ-fK4yHmsbV5ncqXT6pM
>
> The code is in a GitHub repo: https://github.com/paul-rogers/drill-jig
>
> The JIRA for this enhancement: DRILL-4791:
> https://issues.apache.org/jira/browse/DRILL-4791
>
> This has been a great little learning exercise. Is this something that
> we might want to take further? Thoughts on the approach taken?
>
> Thanks,
>
> - Paul
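
To make the adapter idea in the quoted proposal a bit more concrete, here is a minimal Python sketch of turning columnar batches into a row stream in which a schema change is an explicit event, much like the start of a new result set. This is only an illustration under stated assumptions: the `Batch` and `Event` shapes and the `rows_from_batches` function are hypothetical names made up for this example, and none of it is the Drill client API or the Jig code.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Hypothetical shapes for illustration only; not Drill or Jig types.
Batch = Dict[str, List[Any]]          # column name -> values for that column
Event = Tuple[str, Tuple[Any, ...]]   # ("schema", column names) or ("row", values)


def rows_from_batches(batches: Iterable[Batch]) -> Iterator[Event]:
    """Pivot columnar batches into rows, announcing every schema change."""
    current_schema = None
    for batch in batches:
        schema = tuple(batch.keys())
        if schema != current_schema:
            # The column set changed: tell the client before sending rows,
            # in the spirit of starting a new result set.
            current_schema = schema
            yield ("schema", schema)
        # zip(*columns) walks the columns in parallel, one row at a time.
        for values in zip(*batch.values()):
            yield ("row", values)


# Example input: the third batch gains a "city" column.
batches = [
    {"id": [1, 2], "name": ["a", "b"]},
    {"id": [3], "name": ["c"]},
    {"id": [4], "name": ["d"], "city": ["x"]},
]
events = list(rows_from_batches(batches))
```

A client consuming `events` only ever sees rows plus the occasional schema announcement, so it never has to deal with the columnar layout directly. A real adapter would also have to handle data types, nulls, and the wire protocol, which this sketch leaves out.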