Re: Drill 2.0 (design) hackathon

Pritesh Maker Fri, 15 Sep 2017 10:24:17 -0700

Hi All

We are looking forward to hosting the hackathon on Monday. Just a few updates 
on the logistics and agenda


• We are expecting over 25 people attending the event – you can see the 
attendee list at the Eventbrite site -  
https://www.eventbrite.com/e/drill-developer-day-sept-2017-registration-7478463285
 

• Breakfast will be served starting at 8:30AM – we would like to begin promptly 
at 9AM 

• The agenda has been updated to reflect the speakers (see the update in the 
sheet - 
https://docs.google.com/spreadsheets/d/1PEpgmBNAaPcu9UhWmZ8yPYtXbUGqOAYwH87alWkpCic/edit#gid=0
 )
o Key Note & Introduction – Ted Dunning, Parth Chandra and Aman Sinha 
o Community Contributions – Anil Kumar, John Omernik, Charles Givre and Ted 
Dunning 
o Two tracks for technical design discussions – some topics already have initial 
proposals, and others will be open brainstorming discussions
o Once the discussions are concluded, we will have summaries presented and 
notes shared with the community

• We will have a WebEx for the first two sessions. For the two tracks, we will 
either continue the WebEx or use Hangout links (we will publish them in the 
Google sheet)
"JOIN WEBEX MEETING
https://mapr.webex.com/mapr/j.php?MTID=m9d39036e3953cce59ea81250c70c6c76
Meeting number (access code): 806 111 950
Meeting password: ApacheDrill"

• For the attendees in person, we have made bookings for a dinner in the 
evening - https://www.yelp.com/biz/chili-garden-restaurant-milpitas 

Looking forward to a fantastic day for the Apache Drill community!

Thanks,
Pritesh



On 9/5/17, 10:47 PM, "Aman Sinha" <amansi...@apache.org> wrote:

    Here is the Eventbrite event for registration:
    
    
https://www.eventbrite.com/e/drill-developer-day-sept-2017-registration-7478463285
    
    Please register so we can plan for food and drinks appropriately.
    
    The link also contains a google doc link for the preliminary agenda and a
    'Topics' tab with volunteer sign-up column.  Please add your name to the
    area(s) of interest.
    
    Thanks, and I look forward to seeing you all!
    
    -Aman
    
    On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers <prog...@mapr.com> wrote:
    
    > A partial list of Drill’s public APIs:
    >
    > IMHO, highest priority for Drill 2.0.
    >
    >
    >   *   JDBC/ODBC drivers
    >   *   Client (for JDBC/ODBC) + ODBC & JDBC
    >   *   Client (for full Drill async, columnar)
    >   *   Storage plugin
    >   *   Format plugin
    >   *   System/session options
    >   *   Queueing (e.g. ZK-based queues)
    >   *   REST API
    >   *   Resource Planning (e.g. max query memory per node)
    >   *   Metadata access, storage (e.g. file system locations vs. a metastore)
    >   *   Metadata file formats (Parquet, views, etc.)
    >
    > Lower priority for future releases:
    >
    >
    >   *   Query Planning (e.g. Calcite rules)
    >   *   Config options
    >   *   SQL syntax, especially Drill extensions
    >   *   UDF
    >   *   Management (e.g. JMX, REST API calls, etc.)
    >   *   Drill File System (HDFS)
    >   *   Web UI
    >   *   Shell scripts
    >
    > There are certainly more. Please suggest those that are missing. I’ve
    > taken a rough cut at which APIs need forward/backward compatibility first,
    > in part based on those that are the “most public” and most likely to
    > change. Others are important, but we can’t do them all at once.
    >
    > Thanks,
    >
    > - Paul
    >
    > On Aug 29, 2017, at 6:00 PM, Aman Sinha <amansi...@apache.org> wrote:
    >
    > Hi Paul,
    > certainly makes sense to have the API compatibility discussions during this
    > hackathon.  The 2.0 release may be a good checkpoint to introduce breaking
    > changes necessitating changes to the ODBC/JDBC drivers and other external
    > applications. As part of this exercise (not during the hackathon but as a
    > follow-up action), we also should clearly identify the "public" interfaces.
    >
    >
    > I will add this to the agenda.
    >
    > thanks,
    > -Aman
    >
    > On Tue, Aug 29, 2017 at 2:08 PM, Paul Rogers <prog...@mapr.com> wrote:
    >
    > Thanks Aman for organizing the Hackathon!
    >
    > The list included many good ideas for Drill 2.0. Some of those require
    > changes to Drill’s “public” interfaces (file format, client protocol, SQL
    > behavior, etc.)
    >
    > At present, Drill has no good mechanism to handle backward/forward
    > compatibility at the API level. Protobuf versioning certainly helps, but
    > can’t completely solve semantic changes (where a field changes meaning, or
    > a non-Protobuf data chunk changes format.) As just one concrete example,
    > changing to Arrow will break pre-Arrow ODBC/JDBC drivers because class
    > names and data formats will change.
    >
    > Perhaps we can prioritize, for the proposed 2.0 release, a one-time set of
    > breaking changes that introduce a versioning mechanism into our public
    > APIs. Once these are in place, we can evolve the APIs in the future by
    > following the newly-created versioning protocol.
    >
    > Without such a mechanism, we cannot support old & new clients in the same
    > cluster. Nor can we support rolling upgrades. Of course, another solution
    > is to get it right the second time, then freeze all APIs and agree to never
    > again change them. Not sure we have sufficient access to a crystal ball to
    > predict everything we’d ever need in our APIs, however...
    >
    > Thanks,
    >
    > - Paul
    >
    > On Aug 24, 2017, at 8:39 AM, Aman Sinha <amansi...@apache.org> wrote:
    >
    > Drill Developers,
    >
    > In order to kick-start the Drill 2.0 release discussions, I would like to
    > propose a Drill 2.0 (design) hackathon (a.k.a. Drill Developer Day ™ :) ).
    >
    > As I mentioned in the hangout on Tuesday,  MapR has offered to host it on
    > Sept 18th at their offices at 350 Holger Way, San Jose.   Hope that works
    > for most of you!
    >
    > The goal is to get the community together for a day-long technical
    > discussion on key topics in preparation for a Drill 2.0 release as well as
    > potential improvements in upcoming 1.xx releases.  Depending on the
    > interest areas, we could form groups and have a volunteer lead each group.
    >
    > Based on prior discussions on the dev list, hangouts and existing JIRAs,
    > there is already a substantial set of topics and I have summarized a few of
    > them below.   What other topics do folks want to talk about?   Feel free to
    > respond to this thread and I will create a google doc to consolidate.
    > Understandably, the list would be long but we will use the hackathon to get
    > a sense of a reasonable feature set for 1.xx and 2.0 releases.
    >
    >
    > 1. Metadata management.
    >
    > 1a: Defining an abstraction layer for various types of metadata: views,
    > schema, statistics, security
    >
    > 1b: Underlying storage for metadata: what are the options and their
    > trade-offs?
    >
    >     - Hive metastore
    >
    >     - Parquet metadata cache (parquet specific)
    >
    >     - An embedded DBMS
    >
    >     - A distributed key-value store
    >
    >     - Others..
    >
    >
    >
    > 2. Drill integration with Apache Arrow
    >
    > 2a: Evaluate the choices and tradeoffs
    >
    >
    >
    > 3. Resource management
    >
    > 3a: Memory limits per query
    >
    > 3b: Spilling
    >
    > 3c: Resource management with Drill on Yarn/Mesos/Kubernetes
    >
    > 3d: Local vs. global resource management
    >
    > 3e: Aligning with admission control/queueing
    >
    >
    >
    > 4. TPC-DS coverage and related planner/operator enhancements
    >
    > 4a: Additional set operations: INTERSECT, EXCEPT
    >
    > 4b: GROUPING SETS, ROLLUP, CUBE support
    >
    > 4c: Handling inequality joins and cartesian joins of non-scalar inputs
    > (via Nested Loop Join)
    >
    > 4d: Remaining gaps in correlated subquery
    >
    > 4e: Statistics: Number of Distinct Values, Histograms
    >
    >
    >
    > 5. Schema handling
    >
    > 5a: Creation, management of schema
    >
    > 5b: Handling schema changes in certain common cases
    >
    > 5c: Schema-awareness
    >
    > 5d: Others TBD
    >
    >
    >
    > 6. Concurrency
    >
    > 6a: What are the bottlenecks to achieving higher concurrency?
    >
    > 6b: Ideas to address these, e.g. async execution?
    >
    >
    >
    > 7. Storage plugins,  REST APIs related enhancements
    >
    >   <Topics TBD>
    >
    >
    >
    > 8. Performance improvements
    >
    > 8a: Filter pushdown
    >
    > 8b: Vectorized Parquet reader
    >
    > 8c: Code-gen improvements
    >
    > 8d: Others TBD
    >
    >
    >
    >
