Hi Hanumath,

This looks great!! Will you be streaming the event for those of us not in the Bay Area?

Thx,
- C

> On Nov 1, 2018, at 00:10, Hanumath Rao Maduri <hanu....@gmail.com> wrote:
>
> Drill Developers,
>
> I am quite excited to announce the details of the Drill developers day
> 2019. I have consolidated the topics from our earlier discussions and
> prioritized them according to the votes.
>
> MapR has offered to host it on Nov 14th in the Training Room downstairs.
>
> Here is the exact location:
>
> Training Room at
> 4555 Great America Pkwy, Suite 201, Santa Clara, CA, 95054.
>
> Please find the agenda for the meetup below.
>
> *Lunch starts at 12:00 PM.*
>
> *[12:25 - 12:40] Welcome*
>
> - Recap on last year's activities
> - Preview of this year's focus
>
> *[12:40 - 1:00] Storage plugins*
>
> - Adding new storage plugins for the following:
>   - Netflix Iceberg, Kudu (some code already exists), Cassandra,
>     Elasticsearch, Carbondata, ORC/XML file formats, Spark
>     RDD/DataFrames/Datasets, Graph databases & more
> - Improving documentation related to Storage plugins
>
> *[1:00 - 1:45] Schema discovery & Evolution*
>
> - Creation, management of schema
> - Handling schema changes in certain common cases
> - Handling NULL values elegantly
> - Schema learning (similar to MSGpack plugin)
> - Query hints
>
> *[1:45 - 2:30] Metadata Management*
>
> - Defining an abstraction layer for various types of metadata: views,
>   schema, statistics, security
> - Underlying storage for metadata: what are the options and their
>   trade-offs?
>   - Hive metastore
>   - Parquet metadata cache (Parquet specific for row group metadata)
> - Ease of using the Parquet files generated by other engines (like Spark)
>
> *[2:30 - 2:45] Break*
>
> *[2:45 - 4:00] Resource management*
>
> - Resource limits per query
> - Optimal memory assignment for blocking operators based on stats
> - Enhancing the blocking and exchange operators to live within memory
>   limits
> - Aligning with admission control/queueing (YARN concepts)
> - Query scheduling based on queues using tagging and costing
> - Drill on Kubernetes
>
> *[4:00 - 4:20] Apache Arrow*
>
> - Benefits of integrating Apache Drill with Apache Arrow
> - Possible trade-offs & implementation hurdles
>
> *[4:20 - 4:40] Performance Improvements*
>
> - Efficient handling of Broadcast/Semi/Anti Semi join
> - Drill Statistics handling
> - Optimizing complex Parquet reader
>
> Thanks,
> -Hanu