Drill Developers,

I am excited to announce the details of Drill Developers Day
2019. I have consolidated the topics from our earlier discussions and
prioritized them according to the votes.


MapR has offered to host it on Nov 14th in the Training Room downstairs.


Here is the exact location:


Training Room at

4555 Great America Pkwy, Suite 201, Santa Clara, CA, 95054.


Please find the agenda for the meetup below.



*Lunch starts at 12:00PM.*


*[12:25 - 12:40] Welcome*

   - Recap on last year's activities
   - Preview of this year's focus

*[12:40 - 1:00] Storage plugins*



   - Adding new storage plugins for the following:
      - Netflix Iceberg, Kudu (some code already exists), Cassandra,
      Elasticsearch, CarbonData, ORC/XML file formats, Spark
      RDDs/DataFrames/Datasets, graph databases & more
   - Improving documentation related to Storage plugins


*[1:00 - 1:45] Schema discovery & Evolution*



   - Creation and management of schemas
   - Handling schema changes in certain common cases
   - Handling NULL values elegantly
   - Schema learning (similar to MSGpack plugin)
   - Query hints

*[1:45 - 2:30] Metadata Management*



   - Defining an abstraction layer for various types of metadata: views,
   schema, statistics, security
   - Underlying storage for metadata: what are the options and their
   trade-offs?
   - Hive metastore
   - Parquet metadata cache (Parquet-specific, for row group metadata)
   - Ease of using Parquet files generated by other engines (like Spark)


*[2:30 - 2:45] Break*


*[2:45 - 4:00] Resource management*



   - Resource limits per query
   - Optimal memory assignment for blocking operators based on stats
   - Enhancing the blocking and exchange operators to live within memory
   limits
   - Aligning with admission control/queueing (YARN concepts)
   - Query scheduling based on queues using tagging and costing
   - Drill on Kubernetes


*[4:00 - 4:20] Apache Arrow*

   - Benefits of integrating Apache Drill with Apache Arrow
   - Possible trade-offs & implementation hurdles

*[4:20 - 4:40] Performance Improvements*

   - Efficient handling of broadcast, semi, and anti-semi joins
   - Drill statistics handling
   - Optimizing the complex Parquet reader

Thanks,
-Hanu
