Thanks all, it is really helpful.

On Wed, Sep 20, 2017 at 8:13 AM Charles Givre <cgi...@gmail.com> wrote:
Thank you Aman for organizing and to MapR for hosting!

On Wed, Sep 20, 2017 at 11:12 AM, Aman Sinha <amansi...@apache.org> wrote:

Thanks to all the folks who attended the hackathon, both local and remote. For the remote attendees, you missed out on a good dinner :)

We had a day of excellent discussion on several topics: resource management, operator-level performance improvements, TPC-DS coverage, metadata management, concurrency, usability and error handling, and storage plugins + REST APIs. It will take a couple of days to compile all the notes, and then we will post them.

Since the focus was on in-depth discussion rather than breadth, and one day is clearly not enough, some topics were left out. We can continue those discussions on the dev list or in a hangout or, if they can wait, possibly take them up at a future hackathon.

-Aman

On Fri, Sep 15, 2017 at 2:54 PM, Charles Givre <cgi...@gmail.com> wrote:

Hi Pritesh,
What time do you think you’d want me to present? Also, should I make some slides?
Best,
-C

On Sep 15, 2017, at 13:23, Pritesh Maker <pma...@mapr.com> wrote:

Hi all,

We are looking forward to hosting the hackathon on Monday.
Just a few updates on the logistics and agenda:

• We are expecting over 25 people to attend the event. You can see the attendee list on the Eventbrite site: https://www.eventbrite.com/e/drill-developer-day-sept-2017-registration-7478463285

• Breakfast will be served starting at 8:30 AM; we would like to begin promptly at 9 AM.

• The agenda has been updated to reflect the speakers (see the update in the sheet: https://docs.google.com/spreadsheets/d/1PEpgmBNAaPcu9UhWmZ8yPYtXbUGqOAYwH87alWkpCic/edit#gid=0):
  o Keynote & Introduction: Ted Dunning, Parth Chandra and Aman Sinha
  o Community Contributions: Anil Kumar, John Omernik, Charles Givre and Ted Dunning
  o Two tracks for technical design discussions: some topics will start from initial design thoughts, and others will be open brainstorming discussions
  o Once the discussions are concluded, we will present summaries and share notes with the community

• We will have a WebEx for the first two sessions. For the two tracks, we will either continue the WebEx or use Hangout links (which we will publish to the Google sheet).
"JOIN WEBEX MEETING
https://mapr.webex.com/mapr/j.php?MTID=m9d39036e3953cce59ea81250c70c6c76
Meeting number (access code): 806 111 950
Meeting password: ApacheDrill"

• For the in-person attendees, we have made a dinner reservation for the evening: https://www.yelp.com/biz/chili-garden-restaurant-milpitas

Looking forward to a fantastic day for the Apache Drill community!
Thanks,
Pritesh

On 9/5/17, 10:47 PM, "Aman Sinha" <amansi...@apache.org> wrote:

Here is the Eventbrite event for registration:

https://www.eventbrite.com/e/drill-developer-day-sept-2017-registration-7478463285

Please register so we can plan for food and drinks appropriately.

The link also contains a Google Doc link for the preliminary agenda and a 'Topics' tab with a volunteer sign-up column. Please add your name to the area(s) of interest.

Thanks, and I look forward to seeing you all!

-Aman

On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers <prog...@mapr.com> wrote:

A partial list of Drill’s public APIs:

IMHO, highest priority for Drill 2.0:

* JDBC/ODBC drivers
* Client (for JDBC/ODBC) + ODBC & JDBC
* Client (for full Drill async, columnar)
* Storage plugin
* Format plugin
* System/session options
* Queueing (e.g. ZK-based queues)
* REST API
* Resource planning (e.g. max query memory per node)
* Metadata access, storage (e.g. file system locations vs. a metastore)
* Metadata file formats (Parquet, views, etc.)

Lower priority for future releases:

* Query planning (e.g. Calcite rules)
* Config options
* SQL syntax, especially Drill extensions
* UDF
* Management (e.g. JMX, REST API calls, etc.)
* Drill File System (HDFS)
* Web UI
* Shell scripts

There are certainly more. Please suggest those that are missing.
I’ve taken a rough cut at which APIs need forward/backward compatibility first, based in part on which are the “most public” and most likely to change. Others are important, but we can’t do them all at once.

Thanks,

- Paul

On Aug 29, 2017, at 6:00 PM, Aman Sinha <amansi...@apache.org> wrote:

Hi Paul,
It certainly makes sense to have the API compatibility discussions during this hackathon. The 2.0 release may be a good checkpoint to introduce breaking changes that necessitate changes to the ODBC/JDBC drivers and other external applications. As part of this exercise (not during the hackathon, but as a follow-up action), we should also clearly identify the "public" interfaces.

I will add this to the agenda.

thanks,
-Aman

On Tue, Aug 29, 2017 at 2:08 PM, Paul Rogers <prog...@mapr.com> wrote:

Thanks Aman for organizing the Hackathon!

The list included many good ideas for Drill 2.0. Some of those require changes to Drill’s “public” interfaces (file format, client protocol, SQL behavior, etc.).

At present, Drill has no good mechanism to handle backward/forward compatibility at the API level. Protobuf versioning certainly helps, but it can’t completely solve semantic changes (where a field changes meaning, or a non-Protobuf data chunk changes format). As just one concrete example, changing to Arrow will break pre-Arrow ODBC/JDBC drivers because class names and data formats will change.
Perhaps we can prioritize, for the proposed 2.0 release, a one-time set of breaking changes that introduce a versioning mechanism into our public APIs. Once these are in place, we can evolve the APIs in the future by following the newly created versioning protocol.

Without such a mechanism, we cannot support old and new clients in the same cluster, nor can we support rolling upgrades. Of course, another solution is to get it right the second time, then freeze all APIs and agree never to change them again. Not sure we have sufficient access to a crystal ball to predict everything we’d ever need in our APIs, however...

Thanks,

- Paul

On Aug 24, 2017, at 8:39 AM, Aman Sinha <amansi...@apache.org> wrote:

Drill Developers,

In order to kick-start the Drill 2.0 release discussions, I would like to propose a Drill 2.0 (design) hackathon (a.k.a. Drill Developer Day™ :) ).

As I mentioned in the hangout on Tuesday, MapR has offered to host it on Sept 18th at their offices at 350 Holger Way, San Jose. Hope that works for most of you!

The goal is to get the community together for a day-long technical discussion on key topics in preparation for a Drill 2.0 release, as well as potential improvements in upcoming 1.xx releases. Depending on the areas of interest, we could form groups and have a volunteer lead each group.

Based on prior discussions on the dev list, hangouts and existing JIRAs, there is already a substantial set of topics, and I have summarized a few of them below.
What other topics do folks want to talk about? Feel free to respond to this thread and I will create a Google doc to consolidate them. Understandably, the list will be long, but we will use the hackathon to get a sense of a reasonable feature set for the 1.xx and 2.0 releases.

1. Metadata management
1a: Defining an abstraction layer for various types of metadata: views, schema, statistics, security
1b: Underlying storage for metadata: what are the options and their trade-offs?
- Hive metastore
- Parquet metadata cache (Parquet-specific)
- An embedded DBMS
- A distributed key-value store
- Others

2. Drill integration with Apache Arrow
2a: Evaluate the choices and trade-offs

3. Resource management
3a: Memory limits per query
3b: Spilling
3c: Resource management with Drill on YARN/Mesos/Kubernetes
3d: Local vs. global resource management
3e: Aligning with admission control/queueing

4. TPC-DS coverage and related planner/operator enhancements
4a: Additional set operations: INTERSECT, EXCEPT
4b: GROUPING SETS, ROLLUP, CUBE support
4c: Handling inequality joins and cartesian joins of non-scalar inputs (via Nested Loop Join)
4d: Remaining gaps in correlated subquery support
4e: Statistics: Number of Distinct Values, histograms

5. Schema handling
5a: Creation and management of schema
5b: Handling schema changes in certain common cases
5c: Schema awareness
5d: Others TBD

6. Concurrency
6a: What are the bottlenecks to achieving higher concurrency?
6b: Ideas to address these, e.g. async execution?

7. Storage plugins and REST API related enhancements
<Topics TBD>

8. Performance improvements
8a: Filter pushdown
8b: Vectorized Parquet reader
8c: Code-gen improvements
8d: Others TBD

--
Thanks & Regards,
B Anil Kumar.