Hi Paul, 
Thank you for your work on this.  This is really excellent and I'm looking 
forward to porting my plugins over!
-- C

> On Jun 14, 2019, at 12:03 AM, Paul Rogers <par0...@yahoo.com.INVALID> wrote:
> 
> Hi All,
> 
> A previous note explained how Drill has added the "Enhanced Vector Framework" 
> (also called the "Row Set Framework") to improve the user's experience with 
> Drill. One of Drill's key contributions is "schema-on-read": Drill can make 
> sense of many kinds of data files without the hassle of setting up the Hive 
> Meta Store (HMS). While Drill can use HMS, it is often more convenient to 
> just query a table (directory of files) without first defining a schema in 
> HMS.
> 
> The EVF helps to solve two problems that crop up with the schema-on-read 
> approach:
> 
> * Drill does not know the size of the data to be read, yet each reader must 
> limit record batch sizes to a configured maximum.
> 
> * File schemas can be ambiguous, resulting in two scan fragments picking 
> different column types, which can lead to query failures when Drill tries to 
> combine the results.
> 
> For the user, EVF means predictable batch sizes and fewer schema conflicts 
> between fragments, especially if they use CREATE SCHEMA to tell Drill how to 
> resolve schema ambiguities.
> 
> To get these benefits, existing storage and format plugins must be converted 
> to EVF, and new plugins should be built on it. This is where you come in if 
> you create or maintain plugins.
> 
> We've prepared multiple ways for you to learn how to use the EVF:
> 
> * The documentation of the CREATE SCHEMA statement. [1]
> 
> * The text format plugin now uses EVF. This is, however, not the best example 
> because the plugin itself is rather complex.
> 
> * Chapter 12 of the Learning Apache Drill book explains how to create a 
> format plugin. It uses the log format plugin as an example. We've converted 
> the log format plugin to use EVF (pull request pending at the moment).
> 
> * We've created an EVF tutorial that shows how to convert the log plugin to 
> use EVF. It ties Chapter 12 of the Drill book to the recent EVF work. [2]
> 
> 
> Please use this mailing list to share questions, comments and suggestions as 
> you tackle your own plugins. Each plugin has its own unique quirks and issues 
> which we can discuss here.
> 
> 
> Thanks,
> - Paul
> 
> [1] https://drill.apache.org/docs/create-or-replace-schema/
> 
> 
> [2] https://github.com/paul-rogers/drill/wiki/Developer%27s-Guide-to-the-Enhanced-Vector-Framework
> 
