Re: Drill 2.0 (design) hackathon

Charles Givre Thu, 24 Aug 2017 08:42:39 -0700

Hi Aman, 
Would you consider doing some sort of livestream so that those of us who 
couldn’t be there in person can participate?
Thanks,
— C


> On Aug 24, 2017, at 11:39, Aman Sinha <[email protected]> wrote:
> 
> Drill Developers,
> 
> In order to kick-start the Drill 2.0 release discussions, I would like to
> propose a Drill 2.0 (design) hackathon (a.k.a. Drill Developer Day™ :-) ).
> 
> As I mentioned in the hangout on Tuesday,  MapR has offered to host it on
> Sept 18th at their offices at 350 Holger Way, San Jose.   Hope that works
> for most of you!
> 
> The goal is to get the community together for a day-long technical
> discussion on key topics in preparation for a Drill 2.0 release as well as
> potential improvements in upcoming 1.xx releases.  Depending on the
> interest areas, we could form groups and have a volunteer lead each group.
> 
> Based on prior discussions on the dev list, hangouts and existing JIRAs,
> there is already a substantial set of topics and I have summarized a few of
> them below.   What other topics do folks want to talk about?   Feel free to
> respond to this thread and I will create a google doc to consolidate.
> Understandably, the list would be long but we will use the hackathon to get
> a sense of a reasonable feature set for 1.xx and 2.0 releases.
> 
> 
> 1. Metadata management.
> 
>  1a: Defining an abstraction layer for various types of metadata: views,
> schema, statistics, security
> 
>  1b: Underlying storage for metadata: what are the options and their
> trade-offs?
> 
>      - Hive metastore
> 
>      - Parquet metadata cache (parquet specific)
> 
>      - An embedded DBMS
> 
>      - A distributed key-value store
> 
>      - Others..
> 
> 
> 
> 2. Drill integration with Apache Arrow
> 
>  2a: Evaluate the choices and tradeoffs
> 
> 
> 
> 3. Resource management
> 
>  3a: Memory limits per query
> 
>  3b: Spilling
> 
>  3c: Resource management with Drill on Yarn/Mesos/Kubernetes
> 
>  3d: Local vs. global resource management
> 
>  3e: Aligning with admission control/queueing
> 
> 
> 
> 4. TPC-DS coverage and related planner/operator enhancements
> 
>  4a: Additional set operations: INTERSECT, EXCEPT
> 
>  4b: GROUPING SETS, ROLLUP, CUBE support
> 
>  4c: Handling inequality joins and cartesian joins of non-scalar inputs
> (via Nested Loop Join)
> 
>  4d: Remaining gaps in correlated subquery
> 
>  4e: Statistics: Number of Distinct Values, Histograms
> 
> 
> 
> 5. Schema handling
> 
>  5a: Creation, management of schema
> 
>  5b: Handling schema changes in certain common cases
> 
>  5c: Schema-awareness
> 
>  5d: Others TBD
> 
> 
> 
> 6. Concurrency
> 
>  6a: What are the bottlenecks to achieving higher concurrency?
> 
>  6b: Ideas to address these, e.g. async execution?
> 
> 
> 
> 7. Storage plugin and REST API related enhancements
> 
>    <Topics TBD>
> 
> 
> 
> 8. Performance improvements
> 
>  8a: Filter pushdown
> 
>  8b: Vectorized Parquet reader
> 
>  8c: Code-gen improvements
> 
>  8d: Others TBD
