Re: [Agenda] Drill developer meetup 2019

Charles Givre Thu, 01 Nov 2018 06:30:43 -0700

Hi Hanumath, 
This looks great!!  Will you be streaming the event for those of us not in the 
Bay Area?
Thx,
— C
> On Nov 1, 2018, at 00:10, Hanumath Rao Maduri <[email protected]> wrote:
> 
> Drill Developers,
>
> I am quite excited to announce the details of the Drill developers day
> 2019. I have consolidated the topics from our earlier discussions and
> prioritized them according to the votes.
>
> MapR has offered to host it on Nov 14th in the training room downstairs.
>
> Here is the exact location:
>
> Training Room, 4555 Great America Pkwy, Suite 201, Santa Clara, CA 95054.
>
> Please find the agenda for the meetup below.
>
> *Lunch starts at 12:00 PM.*
>
> *[12:25 - 12:40] Welcome*
> 
>   - Recap on last year's activities
>   - Preview of this year's focus
> 
> *[12:40 - 1:00] Storage plugins*
>
>   - Adding new storage plugins for the following:
>      - Netflix Iceberg, Kudu (some code already exists), Cassandra,
>      Elasticsearch, CarbonData, ORC/XML file formats, Spark
>      RDD/DataFrames/Datasets, graph databases & more
>   - Improving documentation related to storage plugins
>
> *[1:00 - 1:45] Schema discovery & Evolution*
>
>   - Creation and management of schemas
>   - Handling schema changes in certain common cases
>   - Handling NULL values elegantly
>   - Schema learning (similar to the MSGpack plugin)
>   - Query hints
>
> *[1:45 - 2:30] Metadata Management*
>
>   - Defining an abstraction layer for various types of metadata: views,
>   schema, statistics, security
>   - Underlying storage for metadata: what are the options and their
>   trade-offs?
>   - Hive metastore
>   - Parquet metadata cache (Parquet-specific, for row group metadata)
>   - Ease of using Parquet files generated by other engines (like Spark)
>
> *[2:30 - 2:45] Break*
>
> *[2:45 - 4:00] Resource management*
>
>   - Resource limits per query
>   - Optimal memory assignment for blocking operators based on stats
>   - Enhancing the blocking and exchange operators to live within memory
>   limits
>   - Aligning with admission control/queueing (YARN concepts)
>   - Query scheduling based on queues using tagging and costing
>   - Drill on Kubernetes
>
> *[4:00 - 4:20] Apache Arrow*
> 
>   - Benefits of integrating Apache Drill with Apache Arrow
>   - Possible trade-offs & implementation hurdles
> 
> *[4:20 - 4:40] Performance Improvements*
>
>   - Efficient handling of broadcast, semi, and anti-semi joins
>   - Drill statistics handling
>   - Optimizing the complex Parquet reader
> 
> Thanks,
> -Hanu
