Re: Drill 2.0 (design) hackathon

Muhammad Gelbana Wed, 06 Sep 2017 20:31:18 -0700

Understood. But if it's possible to stream the event, maybe we can also
stream it through YouTube, which can archive the stream afterwards. The
archive is limited to 8 hours, however.

I'm not a YouTube expert, though.

https://support.google.com/youtube/answer/6247592

I'm just afraid I may not be able to attend, and I'm very interested in
what you guys are going to discuss.

On Sep 7, 2017 1:07 AM, "Pritesh Maker" <[email protected]> wrote:

> Hi
>
> We don't plan on recording the event (it's a day-long event!), but we are
> looking at options to have a WebEx or Hangout link if folks want to join
> remotely.
>
> Pritesh
> _____________________________
> From: Muhammad Gelbana <[email protected]>
> Sent: Wednesday, September 6, 2017 1:08 AM
> Subject: Re: Drill 2.0 (design) hackathon
> To: <[email protected]>
>
>
> Would anyone kindly volunteer to record the event?
>
> On Sep 6, 2017 7:47 AM, "Aman Sinha" <[email protected]> wrote:
>
> > Here is the Eventbrite event for registration:
> >
> > https://www.eventbrite.com/e/drill-developer-day-sept-2017-registration-7478463285
> >
> > Please register so we can plan for food and drinks appropriately.
> >
> > The page also links to a Google Doc with the preliminary agenda and a
> > 'Topics' tab with a volunteer sign-up column. Please add your name to the
> > area(s) of interest.
> >
> > Thanks, and I look forward to seeing you all!
> >
> > -Aman
> >
> > On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers <[email protected]> wrote:
> >
> > > A partial list of Drill’s public APIs:
> > >
> > > IMHO, highest priority for Drill 2.0:
> > >
> > >
> > > * JDBC/ODBC drivers
> > > * Client (for JDBC/ODBC) + ODBC & JDBC
> > > * Client (for full Drill async, columnar)
> > > * Storage plugin
> > > * Format plugin
> > > * System/session options
> > > * Queueing (e.g. ZK-based queues)
> > > * REST API
> > > * Resource Planning (e.g. max query memory per node)
> > > * Metadata access, storage (e.g. file system locations vs. a
> > >   metastore)
> > > * Metadata file formats (Parquet, views, etc.)
> > >
> > > Lower priority for future releases:
> > >
> > >
> > > * Query Planning (e.g. Calcite rules)
> > > * Config options
> > > * SQL syntax, especially Drill extensions
> > > * UDF
> > > * Management (e.g. JMX, REST API calls, etc.)
> > > * Drill File System (HDFS)
> > > * Web UI
> > > * Shell scripts
> > >
> > > There are certainly more. Please suggest those that are missing. I’ve
> > > taken a rough cut at which APIs need forward/backward compatibility
> > > first, in part based on those that are the “most public” and most likely
> > > to change. Others are important, but we can’t do them all at once.
> > >
> > > Thanks,
> > >
> > > - Paul
> > >
> > > On Aug 29, 2017, at 6:00 PM, Aman Sinha <[email protected]> wrote:
> > >
> > > Hi Paul,
> > > It certainly makes sense to have the API compatibility discussions during
> > > this hackathon. The 2.0 release may be a good checkpoint to introduce
> > > breaking changes that necessitate changes to the ODBC/JDBC drivers and
> > > other external applications. As part of this exercise (not during the
> > > hackathon but as a follow-up action), we should also clearly identify
> > > the "public" interfaces.
> > >
> > >
> > > I will add this to the agenda.
> > >
> > > thanks,
> > > -Aman
> > >
> > > On Tue, Aug 29, 2017 at 2:08 PM, Paul Rogers <[email protected]> wrote:
> > >
> > > Thanks Aman for organizing the Hackathon!
> > >
> > > The list included many good ideas for Drill 2.0. Some of those require
> > > changes to Drill’s “public” interfaces (file format, client protocol,
> > > SQL behavior, etc.).
> > >
> > > At present, Drill has no good mechanism to handle backward/forward
> > > compatibility at the API level. Protobuf versioning certainly helps, but
> > > it can’t completely solve semantic changes (where a field changes meaning,
> > > or a non-Protobuf data chunk changes format). As just one concrete
> > > example, changing to Arrow will break pre-Arrow ODBC/JDBC drivers because
> > > class names and data formats will change.
> > >
> > > Perhaps we can prioritize, for the proposed 2.0 release, a one-time set
> > > of breaking changes that introduce a versioning mechanism into our public
> > > APIs. Once these are in place, we can evolve the APIs in the future by
> > > following the newly created versioning protocol.
> > >
> > > Without such a mechanism, we cannot support old and new clients in the
> > > same cluster, nor can we support rolling upgrades. Of course, another
> > > solution is to get it right the second time, then freeze all APIs and
> > > agree never to change them again. Not sure we have sufficient access to a
> > > crystal ball to predict everything we’d ever need in our APIs, however...
> > >
> > > Thanks,
> > >
> > > - Paul
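A minimal sketch of the kind of versioning handshake discussed above, assuming (hypothetically) that each client and server advertises an inclusive range of integer API versions and the connection uses the highest version both sides support. The names VersionRange, negotiate and encode_batch, and the "arrow"/"value-vector" format labels, are illustrative only and are not part of Drill's actual client protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionRange:
    """Inclusive range of API versions one endpoint can speak (hypothetical)."""
    min_version: int
    max_version: int


def negotiate(client: VersionRange, server: VersionRange) -> Optional[int]:
    """Return the highest version both sides support, or None if the ranges don't overlap."""
    low = max(client.min_version, server.min_version)
    high = min(client.max_version, server.max_version)
    return high if low <= high else None


def encode_batch(rows: list, version: int) -> dict:
    """Choose the data layout from the negotiated version (illustrative labels only)."""
    if version >= 2:
        return {"format": "arrow", "rows": rows}      # hypothetical post-Arrow layout
    return {"format": "value-vector", "rows": rows}   # hypothetical pre-Arrow layout


if __name__ == "__main__":
    server = VersionRange(1, 2)          # upgraded cluster that still speaks v1
    old_driver = VersionRange(1, 1)      # pre-Arrow ODBC/JDBC driver
    new_driver = VersionRange(1, 2)
    future_driver = VersionRange(3, 3)   # too new for this cluster
    print(negotiate(old_driver, server))     # 1
    print(negotiate(new_driver, server))     # 2
    print(negotiate(future_driver, server))  # None
```

Under these assumptions, a cluster that still speaks version 1 can keep serving pre-Arrow drivers while newer drivers get the new layout, and a client whose range does not overlap the server's fails fast at connect time instead of misreading data.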
> > >
> > > On Aug 24, 2017, at 8:39 AM, Aman Sinha <[email protected]> wrote:
> > >
> > > Drill Developers,
> > >
> > > In order to kick-start the Drill 2.0 release discussions, I would like to
> > > propose a Drill 2.0 (design) hackathon (a.k.a. Drill Developer Day™ ☺).
> > >
> > > As I mentioned in the hangout on Tuesday, MapR has offered to host it on
> > > Sept 18th at their offices at 350 Holger Way, San Jose. Hope that works
> > > for most of you!
> > >
> > > The goal is to get the community together for a day-long technical
> > > discussion on key topics in preparation for a Drill 2.0 release, as well
> > > as potential improvements in upcoming 1.xx releases. Depending on the
> > > interest areas, we could form groups and have a volunteer lead each
> > > group.
> > >
> > > Based on prior discussions on the dev list, hangouts, and existing JIRAs,
> > > there is already a substantial set of topics, and I have summarized a few
> > > of them below. What other topics do folks want to talk about? Feel free
> > > to respond to this thread and I will create a Google Doc to consolidate
> > > them. Understandably, the list will be long, but we will use the
> > > hackathon to get a sense of a reasonable feature set for the 1.xx and 2.0
> > > releases.
> > >
> > >
> > > 1. Metadata management.
> > >
> > > 1a: Defining an abstraction layer for various types of metadata: views,
> > > schema, statistics, security
> > >
> > > 1b: Underlying storage for metadata: what are the options and their
> > > trade-offs?
> > >
> > > - Hive metastore
> > >
> > > - Parquet metadata cache (parquet specific)
> > >
> > > - An embedded DBMS
> > >
> > > - A distributed key-value store
> > >
> > > - Others
> > >
> > >
> > >
> > > 2. Drill integration with Apache Arrow
> > >
> > > 2a: Evaluate the choices and trade-offs
> > >
> > >
> > >
> > > 3. Resource management
> > >
> > > 3a: Memory limits per query
> > >
> > > 3b: Spilling
> > >
> > > 3c: Resource management with Drill on Yarn/Mesos/Kubernetes
> > >
> > > 3d: Local vs. global resource management
> > >
> > > 3e: Aligning with admission control/queueing
> > >
> > >
> > >
> > > 4. TPC-DS coverage and related planner/operator enhancements
> > >
> > > 4a: Additional set operations: INTERSECT, EXCEPT
> > >
> > > 4b: GROUPING SETS, ROLLUP, CUBE support
> > >
> > > 4c: Handling inequality joins and Cartesian joins of non-scalar inputs
> > > (via Nested Loop Join)
> > >
> > > 4d: Remaining gaps in correlated subquery support
> > >
> > > 4e: Statistics: Number of Distinct Values, Histograms
> > >
> > >
> > >
> > > 5. Schema handling
> > >
> > > 5a: Creation, management of schema
> > >
> > > 5b: Handling schema changes in certain common cases
> > >
> > > 5c: Schema-awareness
> > >
> > > 5d: Others TBD
> > >
> > >
> > >
> > > 6. Concurrency
> > >
> > > 6a: What are the bottlenecks to achieving higher concurrency?
> > >
> > > 6b: Ideas to address these, e.g. async execution?
> > >
> > >
> > >
> > > 7. Storage plugin and REST API enhancements
> > >
> > > <Topics TBD>
> > >
> > >
> > >
> > > 8. Performance improvements
> > >
> > > 8a: Filter pushdown
> > >
> > > 8b: Vectorized Parquet reader
> > >
> > > 8c: Code-gen improvements
> > >
> > > 8d: Others TBD
> > >
> > >