Hi All,

A previous note explained how Drill has added the "Enhanced Vector Framework" 
(also called the "Row Set Framework") to improve the user's experience with 
Drill. One of Drill's key contributions is "schema-on-read": Drill can make 
sense of many kinds of data files without the hassle of setting up the Hive 
Meta Store (HMS). While Drill can use HMS, it is often more convenient to 
just query a table (directory of files) without first defining a schema in HMS.

The EVF helps to solve two problems that crop up with the schema-on-read 
approach:

* Drill does not know the size of the data to be read, yet each reader must 
limit record batch sizes to a configured maximum.

* File schemas can be ambiguous, resulting in two scan fragments picking 
different column types, which can lead to query failures when Drill tries to 
combine the results.
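
To make the first problem concrete, here is a minimal sketch of a reader that 
does not know row sizes up front and so closes each batch just before it would 
exceed a byte limit. This is a toy model only: the class, the method and the 
byte-count limit are all hypothetical and are not part of the EVF API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not Drill or EVF classes.
class BatchSizeSketch {

  /** Splits rows into batches whose total byte size stays within maxBatchBytes. */
  static List<List<String>> toBatches(List<String> rows, int maxBatchBytes) {
    List<List<String>> batches = new ArrayList<>();
    List<String> current = new ArrayList<>();
    int currentBytes = 0;
    for (String row : rows) {
      int rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
      // Close the batch before adding the row that would overflow it.
      // A single oversized row still gets a batch of its own.
      if (!current.isEmpty() && currentBytes + rowBytes > maxBatchBytes) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(row);
      currentBytes += rowBytes;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> rows = List.of("aaaa", "bb", "cccccc", "d");
    // Prints [[aaaa, bb], [cccccc], [d]] for a 6-byte limit.
    System.out.println(toBatches(rows, 6));
  }
}
```

The point of the sketch is that the size check happens per row as data arrives, 
since the total size of the input is not known in advance.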

For the user, EVF simply makes Drill work better, especially if they use CREATE 
SCHEMA to tell Drill how to resolve schema ambiguities.
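
The schema ambiguity problem, and the way a declared schema removes it, can be 
sketched as follows. Again this is only an illustration under stated 
assumptions: the type-guessing rules, the "price" column and the "provided" 
map (standing in for a schema supplied by the user) are hypothetical and are 
not how Drill itself is implemented.

```java
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not Drill or EVF classes.
class SchemaAmbiguitySketch {

  enum ColType { INT, DOUBLE, VARCHAR }

  /** Guesses a column type from the values one fragment happens to read. */
  static ColType inferType(List<String> values) {
    ColType guess = ColType.INT;
    for (String v : values) {
      if (v.matches("-?\\d+")) {
        continue; // an integer is consistent with either numeric guess
      }
      if (v.matches("-?\\d+\\.\\d+")) {
        guess = ColType.DOUBLE; // widen once a decimal value is seen
      } else {
        return ColType.VARCHAR; // anything else is treated as text
      }
    }
    return guess;
  }

  /** Uses the declared type when one is provided, else falls back to inference. */
  static ColType resolveType(String column, List<String> values,
                             Map<String, ColType> provided) {
    ColType declared = provided.get(column);
    return declared != null ? declared : inferType(values);
  }

  public static void main(String[] args) {
    List<String> fileA = List.of("10", "20");   // read by one scan fragment
    List<String> fileB = List.of("10.5", "3");  // read by another
    Map<String, ColType> none = Map.of();
    Map<String, ColType> schema = Map.of("price", ColType.DOUBLE);

    // Without a schema the fragments disagree: INT vs DOUBLE
    System.out.println(resolveType("price", fileA, none)
        + " vs " + resolveType("price", fileB, none));
    // With a schema both agree: DOUBLE vs DOUBLE
    System.out.println(resolveType("price", fileA, schema)
        + " vs " + resolveType("price", fileB, schema));
  }
}
```

Each fragment sees only its own files, so inference alone cannot guarantee 
agreement; a shared declaration gives every fragment the same answer.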

To achieve our goals, storage and format plugins must change (or be created) to 
use EVF. This is where you come in if you create or maintain plugins.

We've prepared multiple ways for you to learn how to use the EVF:

* The documentation of the CREATE SCHEMA statement. [1]

* The text format plugin now uses EVF. This is, however, not the best example 
because the plugin itself is rather complex.

* Chapter 12 of the Learning Apache Drill book explains how to create a format 
plugin. It uses the log format plugin as an example. We've converted the log 
format plugin to use EVF (pull request pending at the moment).

* We've created an EVF tutorial that shows how to convert the log plugin to use 
EVF. This connects Chapter 12 of the Drill book with the recent EVF work. [2]


Please use this mailing list to share questions, comments and suggestions as 
you tackle your own plugins. Each plugin has its own quirks and issues, which 
we can discuss here.


Thanks,
- Paul

[1] https://drill.apache.org/docs/create-or-replace-schema/


[2] https://github.com/paul-rogers/drill/wiki/Developer%27s-Guide-to-the-Enhanced-Vector-Framework


