Hello Drillers,

Here is the Webex link for remote attendees:
https://mapr.webex.com/mapr/j.php?MTID=ma05d8b5406acdb6292d5b81c79240a38

Thanks

> On Nov 2, 2018, at 11:25 AM, Abhishek Girish <[email protected]> wrote:
>
> Charles, I'm sure we'll have a link for remote folks to join - will share
> it closer to the day.
>
>> On Thu, Nov 1, 2018 at 1:58 PM hanu mapr <[email protected]> wrote:
>>
>> Hello All,
>>
>> There was a typo for the year in the mail. It should be 2018 instead of 2019.
>> Thanks Aman for correcting it.
>>
>> Regards,
>> -Hanu
>>
>>> On Thu, Nov 1, 2018 at 6:30 AM Charles Givre <[email protected]> wrote:
>>>
>>> Hi Hanumath,
>>> This looks great!! Will you be streaming the event for those of us not
>>> in the Bay Area?
>>> Thx,
>>> — C
>>>
>>>> On Nov 1, 2018, at 00:10, Hanumath Rao Maduri <[email protected]> wrote:
>>>>
>>>> Drill Developers,
>>>>
>>>> I am quite excited to announce the details of the Drill Developers Day
>>>> 2018. I have consolidated the topics from our earlier discussions and
>>>> prioritized them according to the votes.
>>>>
>>>> MapR has offered to host it on Nov 14th in the Training Room downstairs.
>>>>
>>>> Here is the exact location:
>>>>
>>>> Training Room at
>>>> 4555 Great America Pkwy, Suite 201, Santa Clara, CA, 95054.
>>>>
>>>> Please find the agenda for the meetup below.
>>>>
>>>> *Lunch starts at 12:00PM.*
>>>>
>>>> *[12:25 - 12:40] Welcome*
>>>>
>>>> - Recap on last year's activities
>>>> - Preview of this year's focus
>>>>
>>>> *[12:40 - 1:00] Storage plugins*
>>>>
>>>> - Adding new storage plugins for the following:
>>>>   - Netflix Iceberg, Kudu (some code already exists), Cassandra,
>>>>     Elasticsearch, Carbondata, ORC/XML file formats, Spark
>>>>     RDD/DataFrames/Datasets, Graph databases & more
>>>> - Improving documentation related to storage plugins
>>>>
>>>> *[1:00 - 1:45] Schema discovery & Evolution*
>>>>
>>>> - Creation, management of schema
>>>> - Handling schema changes in certain common cases
>>>> - Handling NULL values elegantly
>>>> - Schema learning (similar to MSGpack plugin)
>>>> - Query hints
>>>>
>>>> *[1:45 - 2:30] Metadata Management*
>>>>
>>>> - Defining an abstraction layer for various types of metadata: views,
>>>>   schema, statistics, security
>>>> - Underlying storage for metadata: what are the options and their
>>>>   trade-offs?
>>>>   - Hive metastore
>>>>   - Parquet metadata cache (Parquet specific for row group metadata)
>>>> - Ease of using the Parquet files generated by other engines (like
>>>>   Spark)
>>>>
>>>> *[2:30 - 2:45] Break*
>>>>
>>>> *[2:45 - 4:00] Resource management*
>>>>
>>>> - Resource limits per query
>>>> - Optimal memory assignment for blocking operators based on stats
>>>> - Enhancing the blocking and exchange operators to live within memory
>>>>   limits
>>>> - Aligning with admission control/queueing (YARN concepts)
>>>> - Query scheduling based on queues using tagging and costing
>>>> - Drill on Kubernetes
>>>>
>>>> *[4:00 - 4:20] Apache Arrow*
>>>>
>>>> - Benefits of integrating Apache Drill with Apache Arrow
>>>> - Possible trade-offs & implementation hurdles
>>>>
>>>> *[4:20 - 4:40] Performance Improvements*
>>>>
>>>> - Efficient handling of Broadcast/Semi/Anti Semi join
>>>> - Drill Statistics handling
>>>> - Optimizing complex Parquet reader
>>>>
>>>> Thanks,
>>>> -Hanu
