Re: Drill 2.0 (design) hackathon

Aman Sinha Tue, 29 Aug 2017 13:17:19 -0700

Hi Anil,
yes, certainly talking about your work with the Kafka+Drill integration
would be very welcome.
Regarding the registration, there isn't anything formal yet. We would like
to keep it lightweight, but to keep track of the topics and attendees I will
send out a Google doc or some similar way to sign up.
This should happen within the next week or so.


Thanks,
-Aman

On Tue, Aug 29, 2017 at 11:56 AM, AnilKumar B <[email protected]> wrote:

> Hi Aman,
>
> To attend Drill Developer's Day event, is there any registration process?
>
> Kamesh and I would like to present our Kafka integration with Drill (
> https://github.com/akumarb2010/incubator-drill/tree/master/contrib/storage-kafka
> & https://issues.apache.org/jira/browse/DRILL-4779 )
>
> Would it be possible to have 15-20 minutes for this?
>
>
>
> Thanks & Regards,
> B Anil Kumar.
>
> On Thu, Aug 24, 2017 at 8:59 AM, Aman Sinha <[email protected]> wrote:
>
> > Hi Charles,
> > yes, it would be great if remote folks could participate. I will look
> > into the options for livestreaming.
> >
> >
> > On Thu, Aug 24, 2017 at 8:42 AM, Charles Givre <[email protected]> wrote:
> >
> > > Hi Aman,
> > > Would you consider doing some sort of livestream so that those of us
> > > who couldn’t be there in person can participate?
> > > Thanks,
> > > — C
> > >
> > > > On Aug 24, 2017, at 11:39, Aman Sinha <[email protected]> wrote:
> > > >
> > > > Drill Developers,
> > > >
> > > > In order to kick-start the Drill 2.0 release discussions, I would like
> > > > to propose a Drill 2.0 (design) hackathon (a.k.a. Drill Developer
> > > > Day ™ :) ).
> > > >
> > > > As I mentioned in the hangout on Tuesday, MapR has offered to host it
> > > > on Sept 18th at their offices at 350 Holger Way, San Jose. Hope that
> > > > works for most of you!
> > > >
> > > > The goal is to get the community together for a day-long technical
> > > > discussion on key topics in preparation for a Drill 2.0 release as
> > > > well as potential improvements in upcoming 1.xx releases. Depending on
> > > > the interest areas, we could form groups and have a volunteer lead
> > > > each group.
> > > >
> > > > Based on prior discussions on the dev list, hangouts and existing
> > > > JIRAs, there is already a substantial set of topics and I have
> > > > summarized a few of them below. What other topics do folks want to
> > > > talk about? Feel free to respond to this thread and I will create a
> > > > Google doc to consolidate. Understandably, the list would be long but
> > > > we will use the hackathon to get a sense of a reasonable feature set
> > > > for 1.xx and 2.0 releases.
> > > >
> > > >
> > > > 1. Metadata management.
> > > >
> > > >  1a: Defining an abstraction layer for various types of metadata:
> > > > views, schema, statistics, security
> > > >
> > > >  1b: Underlying storage for metadata: what are the options and their
> > > > trade-offs?
> > > >
> > > >      - Hive metastore
> > > >
> > > >      - Parquet metadata cache (parquet specific)
> > > >
> > > >      - An embedded DBMS
> > > >
> > > >      - A distributed key-value store
> > > >
> > > >      - Others..
> > > >
> > > >
> > > >
> > > > 2. Drill integration with Apache Arrow
> > > >
> > > >  2a: Evaluate the choices and tradeoffs
> > > >
> > > >
> > > >
> > > > 3. Resource management
> > > >
> > > >  3a: Memory limits per query
> > > >
> > > >  3b: Spilling
> > > >
> > > >  3c: Resource management with Drill on Yarn/Mesos/Kubernetes
> > > >
> > > >  3d: Local vs. global resource management
> > > >
> > > >  3e: Aligning with admission control/queueing
> > > >
> > > >
> > > >
> > > > 4. TPC-DS coverage and related planner/operator enhancements
> > > >
> > > >  4a: Additional set operations: INTERSECT, EXCEPT
> > > >
> > > >  4b: GROUPING SETS, ROLLUP, CUBE support
> > > >
> > > >  4c: Handling inequality joins and cartesian joins of non-scalar
> > > > inputs (via Nested Loop Join)
> > > >
> > > >  4d: Remaining gaps in correlated subquery
> > > >
> > > >  4e: Statistics: Number of Distinct Values, Histograms
> > > >
> > > >
> > > >
> > > > 5. Schema handling
> > > >
> > > >  5a: Creation, management of schema
> > > >
> > > >  5b: Handling schema changes in certain common cases
> > > >
> > > >  5c: Schema-awareness
> > > >
> > > >  5d: Others TBD
> > > >
> > > >
> > > >
> > > > 6. Concurrency
> > > >
> > > >  6a: What are the bottlenecks to achieving higher concurrency?
> > > >
> > > >  6b: Ideas to address these, e.g. async execution?
> > > >
> > > >
> > > >
> > > > 7. Storage plugin and REST API enhancements
> > > >
> > > >    <Topics TBD>
> > > >
> > > >
> > > >
> > > > 8. Performance improvements
> > > >
> > > >  8a: Filter pushdown
> > > >
> > > >  8b: Vectorized Parquet reader
> > > >
> > > >  8c: Code-gen improvements
> > > >
> > > >  8d: Others TBD
> > >
> > >
> >
>
