I think that's a good idea. We could add this to a list (in the google doc) of items to discuss on the hangout. That way, if we have no pressing topics to discuss, we can pick something from the list.
-----Original Message-----
From: Aman Sinha [mailto:amansi...@apache.org]
Sent: Wednesday, September 20, 2017 8:13 AM
To: dev@drill.apache.org
Subject: Re: Drill 2.0 (design) hackathon

Thanks to all the folks who attended the hackathon - both local and remote. For the remote attendees, you missed out on a good dinner :)

We had a day of excellent discussion on several topics: Resource management, operator level performance improvements, TPC-DS coverage, metadata management, concurrency, usability and error handling, storage plugins + rest APIs. It will take a couple of days to compile all the notes and we will post them.

Since the focus was more in-depth discussion rather than breadth, and 1 day is clearly not adequate, some topics were left out. We can continue those discussions on the dev list / hangout or if it can wait, possibly do it in a future hackathon.

-Aman

On Fri, Sep 15, 2017 at 2:54 PM, Charles Givre <cgi...@gmail.com> wrote:

> Hi Pritesh,
> What time do you think you’d want me to present? Also, should I make some slides?
> Best,
> — C
>
> > On Sep 15, 2017, at 13:23, Pritesh Maker <pma...@mapr.com> wrote:
> >
> > Hi All
> >
> > We are looking forward to hosting the hackathon on Monday.
> > Just a few updates on the logistics and agenda
> >
> > • We are expecting over 25 people attending the event – you can see the attendee list at the Eventbrite site - https://www.eventbrite.com/e/drill-developer-day-sept-2017-registration-7478463285
> >
> > • Breakfast will be served starting at 8:30AM – we would like to begin promptly at 9AM
> >
> > • The agenda has been updated to reflect the speakers (see the update in the sheet - https://docs.google.com/spreadsheets/d/1PEpgmBNAaPcu9UhWmZ8yPYtXbUGqOAYwH87alWkpCic/edit#gid=0 )
> >   o Key Note & Introduction – Ted Dunning, Parth Chandra and Aman Sinha
> >   o Community Contributions – Anil Kumar, John Omernik, Charles Givre and Ted Dunning
> >   o Two tracks for technical design discussions – some topics have initial thoughts for the topics and some will have open brainstorming discussions
> >   o Once the discussions are concluded, we will have summaries presented and notes shared with the community
> >
> > • We will have a WebEx for the first two sessions. For the two tracks, we will either continue the WebEx or have Hangout links (will publish them to the google sheet)
> >
> > "JOIN WEBEX MEETING
> > https://mapr.webex.com/mapr/j.php?MTID=m9d39036e3953cce59ea81250c70c6c76
> > Meeting number (access code): 806 111 950
> > Meeting password: ApacheDrill"
> >
> > • For the attendees in person, we have made bookings for a dinner in the evening - https://www.yelp.com/biz/chili-garden-restaurant-milpitas
> >
> > Looking forward to a fantastic day for the Apache Drill! community!
> >
> > Thanks,
> > Pritesh
> >
> > On 9/5/17, 10:47 PM, "Aman Sinha" <amansi...@apache.org> wrote:
> >
> >     Here is the Eventbrite event for registration:
> >
> >     https://www.eventbrite.com/e/drill-developer-day-sept-2017-registration-7478463285
> >
> >     Please register so we can plan for food and drinks appropriately.
> >
> >     The link also contains a google doc link for the preliminary agenda and a 'Topics' tab with volunteer sign-up column. Please add your name to the area(s) of interest.
> >
> >     Thanks and look forward to seeing you all !
> >
> >     -Aman
> >
> >     On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers <prog...@mapr.com> wrote:
> >
> >> A partial list of Drill’s public APIs:
> >>
> >> IMHO, highest priority for Drill 2.0.
> >>
> >> * JDBC/ODBC drivers
> >> * Client (for JDBC/ODBC) + ODBC & JDBC
> >> * Client (for full Drill async, columnar)
> >> * Storage plugin
> >> * Format plugin
> >> * System/session options
> >> * Queueing (e.g. ZK-based queues)
> >> * Rest API
> >> * Resource Planning (e.g. max query memory per node)
> >> * Metadata access, storage (e.g. file system locations vs. a metastore)
> >> * Metadata files formats (Parquet, views, etc.)
> >>
> >> Lower priority for future releases:
> >>
> >> * Query Planning (e.g. Calcite rules)
> >> * Config options
> >> * SQL syntax, especially Drill extensions
> >> * UDF
> >> * Management (e.g. JMX, Rest API calls, etc.)
> >> * Drill File System (HDFS)
> >> * Web UI
> >> * Shell scripts
> >>
> >> There are certainly more. Please suggest those that are missing. I’ve taken a rough cut at which APIs need forward/backward compatibility first, in part based on those that are the “most public” and most likely to change. Others are important, but we can’t do them all at once.
> >>
> >> Thanks,
> >>
> >> - Paul
> >>
> >> On Aug 29, 2017, at 6:00 PM, Aman Sinha <amansi...@apache.org<mailto:amansi...@apache.org>> wrote:
> >>
> >> Hi Paul,
> >> certainly makes sense to have the API compatibility discussions during this hackathon. The 2.0 release may be a good checkpoint to introduce breaking changes necessitating changes to the ODBC/JDBC drivers and other external applications.
> >> As part of this exercise (not during the hackathon but as a follow-up action), we also should clearly identify the "public" interfaces.
> >>
> >> I will add this to the agenda.
> >>
> >> thanks,
> >> -Aman
> >>
> >> On Tue, Aug 29, 2017 at 2:08 PM, Paul Rogers <prog...@mapr.com<mailto:prog...@mapr.com>> wrote:
> >>
> >> Thanks Aman for organizing the Hackathon!
> >>
> >> The list included many good ideas for Drill 2.0. Some of those require changes to Drill’s “public” interfaces (file format, client protocol, SQL behavior, etc.)
> >>
> >> At present, Drill has no good mechanism to handle backward/forward compatibility at the API level. Protobuf versioning certainly helps, but can’t completely solve semantic changes (where a field changes meaning, or a non-Protobuf data chunk changes format.) As just one concrete example, changing to Arrow will break pre-Arrow ODBC/JDBC drivers because class names and data formats will change.
> >>
> >> Perhaps we can prioritize, for the proposed 2.0 release, a one-time set of breaking changes that introduce a versioning mechanism into our public APIs. Once these are in place, we can evolve the APIs in the future by following the newly-created versioning protocol.
> >>
> >> Without such a mechanism, we cannot support old & new clients in the same cluster. Nor can we support rolling upgrades. Of course, another solution is to get it right the second time, then freeze all APIs and agree to never again change them. Not sure we have sufficient access to a crystal ball to predict everything we’d ever need in our APIs, however...
> >>
> >> Thanks,
> >>
> >> - Paul
> >>
> >> On Aug 24, 2017, at 8:39 AM, Aman Sinha <amansi...@apache.org<mailto:amansi...@apache.org>> wrote:
> >>
> >> Drill Developers,
> >>
> >> In order to kick-start the Drill 2.0 release discussions, I would like to propose a Drill 2.0 (design) hackathon (a.k.a Drill Developer Day ™ :) ).
> >>
> >> As I mentioned in the hangout on Tuesday, MapR has offered to host it on Sept 18th at their offices at 350 Holger Way, San Jose. Hope that works for most of you!
> >>
> >> The goal is to get the community together for a day-long technical discussion on key topics in preparation for a Drill 2.0 release as well as potential improvements in upcoming 1.xx releases. Depending on the interest areas, we could form groups and have a volunteer lead each group.
> >>
> >> Based on prior discussions on the dev list, hangouts and existing JIRAs, there is already a substantial set of topics and I have summarized a few of them below. What other topics do folks want to talk about? Feel free to respond to this thread and I will create a google doc to consolidate. Understandably, the list would be long but we will use the hackathon to get a sense of a reasonable feature set for 1.xx and 2.0 releases.
> >>
> >> 1. Metadata management.
> >>    1a: Defining an abstraction layer for various types of metadata: views, schema, statistics, security
> >>    1b: Underlying storage for metadata: what are the options and their trade-offs?
> >>        - Hive metastore
> >>        - Parquet metadata cache (parquet specific)
> >>        - An embedded DBMS
> >>        - A distributed key-value store
> >>        - Others..
> >>
> >> 2. Drill integration with Apache Arrow
> >>    2a: Evaluate the choices and tradeoffs
> >>
> >> 3. Resource management
> >>    3a: Memory limits per query
> >>    3b: Spilling
> >>    3c: Resource management with Drill on Yarn/Mesos/Kubernetes
> >>    3d: Local vs. global resource management
> >>    3e: Aligning with admission control/queueing
> >>
> >> 4. TPC-DS coverage and related planner/operator enhancements
> >>    4a: Additional set operations: INTERSECT, EXCEPT
> >>    4b: GROUPING SETS, ROLLUP, CUBE support
> >>    4c: Handling inequality joins and cartesian joins of non-scalar inputs (via Nested Loop Join)
> >>    4d: Remaining gaps in correlated subquery
> >>    4e: Statistics: Number of Distinct Values, Histograms
> >>
> >> 5. Schema handling
> >>    5a: Creation, management of schema
> >>    5b: Handling schema changes in certain common cases
> >>    5c: Schema-awareness
> >>    5d: Others TBD
> >>
> >> 6. Concurrency
> >>    6a: What are the bottlenecks to achieving higher concurrency
> >>    6b: Ideas to address these..e.g async execution ?
> >>
> >> 7. Storage plugins, REST APIs related enhancements
> >>    <Topics TBD>
> >>
> >> 8. Performance improvements
> >>    8a: Filter pushdown
> >>    8b: Vectorized Parquet reader
> >>    8c: Code-gen improvements
> >>    8d: Others TBD