[GitHub] paul-rogers opened a new pull request #1618: DRILL-6950: Row set-based scan framework

GitBox Mon, 21 Jan 2019 18:15:51 -0800

paul-rogers opened a new pull request #1618: DRILL-6950: Row set-based scan 
framework
URL: https://github.com/apache/drill/pull/1618
 
 
   Adds the "plumbing" that connects the scan operator to the result set loader 
and the scan projection framework. See the various package-info.java files for 
the technical details.
   
   The broad idea is that a (file) reader does three things:
   
   * Decides if it can provide a schema up-front (early schema), or if it must 
discover the schema as the read progresses (late schema).
   * If a schema is available up-front, the reader provides that schema.
   * The reader then uses a result set loader to read rows into columns, 
optionally creating new columns (late schema) as the read progresses.
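
   The three steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the API added by this PR: `SketchReader`, `SketchLoader`, `ListReader` and `scan` are hypothetical names invented here, and the "loader" is just an in-memory list standing in for the real result set loader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReaderSketch {

  /** Hypothetical stand-in for a result set loader: buffers rows, grows its schema on demand. */
  static class SketchLoader {
    final List<String> schema = new ArrayList<>();
    final List<Map<String, Object>> rows = new ArrayList<>();

    void writeRow(Map<String, Object> row) {
      for (String col : row.keySet()) {
        if (!schema.contains(col)) {
          schema.add(col); // late schema: column discovered mid-read
        }
      }
      rows.add(row);
    }
  }

  /** Hypothetical reader contract covering the three steps listed above. */
  interface SketchReader {
    /** Returns the up-front schema, or null if it must be discovered (late schema). */
    List<String> earlySchema();

    /** Writes one row to the loader; returns false at end of data. */
    boolean next(SketchLoader loader);
  }

  /** A reader over in-memory rows; it declares a schema only if one is passed in. */
  static class ListReader implements SketchReader {
    private final List<String> declared;
    private final Iterator<Map<String, Object>> it;

    ListReader(List<String> declared, List<Map<String, Object>> data) {
      this.declared = declared;
      this.it = data.iterator();
    }

    @Override public List<String> earlySchema() { return declared; }

    @Override public boolean next(SketchLoader loader) {
      if (!it.hasNext()) return false;
      loader.writeRow(it.next());
      return true;
    }
  }

  /** Drives a reader: apply the early schema if present, then read all rows. */
  static SketchLoader scan(SketchReader reader) {
    SketchLoader loader = new SketchLoader();
    List<String> early = reader.earlySchema();
    if (early != null) loader.schema.addAll(early);
    while (reader.next(loader)) { }
    return loader;
  }

  /** Builds an ordered row from alternating column-name/value arguments. */
  static Map<String, Object> row(Object... kv) {
    Map<String, Object> m = new LinkedHashMap<>();
    for (int i = 0; i < kv.length; i += 2) m.put((String) kv[i], kv[i + 1]);
    return m;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> data = Arrays.asList(row("a", 1), row("a", 2, "b", "x"));
    // Early schema: columns are declared before the first row is read.
    System.out.println(scan(new ListReader(Arrays.asList("a", "b"), data)).schema);
    // Late schema: column "b" only appears when the second row is read.
    System.out.println(scan(new ListReader(null, data)).schema);
  }
}
```

   In this sketch the only difference between the two modes is whether `earlySchema()` returns a column list; the row-writing path is the same in both.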
   
   The scan framework handles all the details that were formerly handled by 
the reader:
   
   * Decide how to project the columns found by the reader into the set 
required by the query.
   * Decide when to stop reading a batch (because of a memory limit or a row 
limit).
   * Fill in "implicit" file metadata columns.
   * Fill in null columns for projected columns that the reader does not provide.
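
   As a rough illustration of these responsibilities, the sketch below projects reader rows onto the query's column list, fills in one implicit column, leaves missing columns null, and cuts batches at a row limit. It is not the framework code in this PR: `ScanSketch`, `project` and the `filename` column used in the example are assumptions made for illustration, and the memory limit is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ScanSketch {

  /**
   * Hypothetical projection step: maps each reader row onto the projected
   * column list and groups the results into batches of at most rowLimit rows.
   */
  static List<List<Map<String, Object>>> project(
      List<Map<String, Object>> readerRows,
      List<String> projected,
      Map<String, Object> implicitCols,
      int rowLimit) {
    List<List<Map<String, Object>>> batches = new ArrayList<>();
    List<Map<String, Object>> batch = new ArrayList<>();
    for (Map<String, Object> in : readerRows) {
      Map<String, Object> out = new LinkedHashMap<>();
      for (String col : projected) {
        if (implicitCols.containsKey(col)) {
          out.put(col, implicitCols.get(col)); // implicit file metadata column
        } else {
          out.put(col, in.get(col)); // stays null if the reader has no such column
        }
      }
      batch.add(out); // reader columns not in the projection list are dropped
      if (batch.size() == rowLimit) { // row limit reached: close this batch
        batches.add(batch);
        batch = new ArrayList<>();
      }
    }
    if (!batch.isEmpty()) batches.add(batch);
    return batches;
  }

  /** Builds an ordered row from alternating column-name/value arguments. */
  static Map<String, Object> row(Object... kv) {
    Map<String, Object> m = new LinkedHashMap<>();
    for (int i = 0; i < kv.length; i += 2) m.put((String) kv[i], kv[i + 1]);
    return m;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> rows =
        Arrays.asList(row("a", 1, "b", 10), row("a", 2, "b", 20), row("a", 3, "b", 30));
    Map<String, Object> implicit = row("filename", "data.csv");
    List<String> projected = Arrays.asList("a", "missing", "filename");
    for (List<Map<String, Object>> batch : project(rows, projected, implicit, 2)) {
      System.out.println(batch);
    }
  }
}
```

   The point of the sketch is that none of this logic lives in the reader: the reader only supplies rows, and everything else is decided from the projection list and the batch limit.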
   
   Previous PRs provided the underlying mechanisms. This PR provides the "glue" 
and "plumbing" that connects the reader, the scan operator and the framework 
mechanisms. A key goal was to minimize "collateral damage" changes to other 
operators. Although this patch introduces a new structure for the scan operator 
and readers, the design ensures that this new mechanism can work alongside the 
"legacy" scan operator and record readers. A later patch will include the 
final glue that retrofits the "Easy" scan framework to support the new 
mechanisms.
   
   This PR does not introduce any actual readers: it is already large enough 
as it stands. Readers will come later. One unfortunate side-effect is that the current 
PR can seem a bit abstract without the ability to connect it to an actual 
reader. Please refer to my private "RowSetRev4" branch if you want a preview of 
how the readers work.
   
   Finally, this PR includes a large number of unit tests that validate all of 
the new mechanisms.


