Hi Paul,

Thank you for your work on this. This is really excellent and I'm looking forward to porting my plugins over!

-- C
> On Jun 14, 2019, at 12:03 AM, Paul Rogers <par0...@yahoo.com.INVALID> wrote:
>
> Hi All,
>
> A previous note explained how Drill has added the "Enhanced Vector Framework"
> (also called the "Row Set Framework") to improve the user's experience with
> Drill. One of Drill's key contributions is "schema-on-read": Drill can make
> sense of many kinds of data files without the hassle of setting up the Hive
> Meta Store (HMS). While Drill can use HMS, it is often more convenient to
> just query a table (directory of files) without first defining a schema in
> HMS.
>
> The EVF helps to solve two problems that crop up with the schema-on-read
> approach:
>
> * Drill does not know the size of the data to be read, yet each reader must
> limit record batch sizes to a configured maximum.
>
> * File schemas can be ambiguous, resulting in two scan fragments picking
> different column types, which can lead to query failures when Drill tries to
> combine the results.
>
> For the user, EVF simply makes Drill work better, especially if they use
> CREATE SCHEMA to tell Drill how to resolve schema ambiguities.
>
> To achieve our goals, storage and format plugins must change (or be created)
> to use EVF. This is where you come in if you create or maintain plugins.
>
> We've prepared multiple ways for you to learn how to use the EVF:
>
> * The documentation of the CREATE SCHEMA statement. [1]
>
> * The text format plugin now uses EVF. This is, however, not the best example
> because the plugin itself is rather complex.
>
> * Chapter 12 of the Learning Apache Drill book explains how to create a
> format plugin. It uses the log format plugin as an example. We've converted
> the log format plugin to use EVF (pull request pending at the moment).
>
> * We've created an EVF tutorial that shows how to convert the log plugin to
> use EVF. This connects up Chapter 12 of the Drill book with the recent EVF
> work. [2]
>
> Please use this mailing list to share questions, comments and suggestions as
> you tackle your own plugins. Each plugin has its own unique quirks and issues
> which we can discuss here.
>
> Thanks,
> - Paul
>
> [1] https://drill.apache.org/docs/create-or-replace-schema/
>
> [2] https://github.com/paul-rogers/drill/wiki/Developer%27s-Guide-to-the-Enhanced-Vector-Framework