[GitHub] paul-rogers commented on issue #1618: DRILL-6950: Row set-based scan framework

GitBox Sat, 23 Feb 2019 16:11:42 -0800

URL: https://github.com/apache/drill/pull/1618#issuecomment-466716112
 
 
   It seems we have a number of ambiguous issues to resolve around implicit 
columns:
   
   * Their type.
   * Their scope.
   
   Type: The CSV reader (and any reader that uses ScanBatch) defines file 
metadata (AKA implicit) columns as Nullable VARCHAR. The new framework defines 
them as required VARCHAR. One can argue that the required mode is a) more 
correct, since every scanned file has these values, and b) more efficient, 
since a required vector needs no per-value null flags.
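   To make the type difference concrete, here is a minimal Java sketch that 
models the two schemas side by side. The `Mode` enum, `Column` class and 
`nullFlagBytes` helper are hypothetical stand-ins, not Drill's `TypeProtos` or 
value-vector classes. The column names are the four file metadata columns 
Drill exposed by default (`fqn`, `filepath`, `filename`, `suffix`). The cost 
model assumes a nullable vector stores one extra "is-set" byte per value; it 
is a rough approximation, not a measured figure.

```java
import java.util.ArrayList;
import java.util.List;

public class ImplicitColumnTypes {
    // Hypothetical stand-in for Drill's data modes (not the real TypeProtos.DataMode).
    enum Mode { REQUIRED, OPTIONAL }

    // Hypothetical column description: name, SQL type and mode.
    static final class Column {
        final String name;
        final String type;
        final Mode mode;

        Column(String name, String type, Mode mode) {
            this.name = name;
            this.type = type;
            this.mode = mode;
        }

        @Override
        public String toString() {
            return name + " " + (mode == Mode.OPTIONAL ? "Nullable " : "") + type;
        }
    }

    // File metadata (implicit) columns, using Drill's default labels.
    static final String[] IMPLICIT_NAMES = {"fqn", "filepath", "filename", "suffix"};

    // ScanBatch-style readers: Nullable VARCHAR; new framework: required VARCHAR.
    static List<Column> implicitColumns(boolean newFramework) {
        Mode mode = newFramework ? Mode.REQUIRED : Mode.OPTIONAL;
        List<Column> columns = new ArrayList<>();
        for (String name : IMPLICIT_NAMES) {
            columns.add(new Column(name, "VARCHAR", mode));
        }
        return columns;
    }

    // Assumed cost model: a nullable vector spends one flag byte per value.
    static long nullFlagBytes(Mode mode, long rowCount) {
        return mode == Mode.OPTIONAL ? rowCount : 0L;
    }

    public static void main(String[] args) {
        System.out.println(implicitColumns(false)); // ScanBatch-style schema
        System.out.println(implicitColumns(true));  // new framework schema
        // Under the assumed model, a 4,096-row batch pays 4,096 flag bytes per nullable column.
        System.out.println(nullFlagBytes(Mode.OPTIONAL, 4096));
    }
}
```

   Since every file has a path and name, the required mode never needs those 
flags; the sketch only shows where the efficiency argument comes from.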
   
   Scope: The CSV reader today does not define the partition columns for a 
class-path reader. If used, their type is Nullable INT (because they are 
undefined). The revised framework always makes the partition columns available 
whenever the file metadata columns are available. The partition columns are of 
type Nullable VARCHAR (since a partition that does not exist is set to null).
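   The partition-column behavior can be sketched the same way. The helper 
below is hypothetical: it assumes Drill's default `dirN` labeling, computes 
`dir0`, `dir1`, ... from a file's path relative to the scan's selection root, 
and returns null for levels the file does not have, which is why those columns 
must be Nullable VARCHAR. It handles only `/`-separated paths.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionColumns {
    /**
     * Hypothetical helper: values for dir0..dir{count-1} of a file, relative to
     * the selection root. Missing directory levels are null.
     */
    static List<String> partitionValues(String selectionRoot, String filePath, int count) {
        String root = selectionRoot.endsWith("/") ? selectionRoot : selectionRoot + "/";
        if (!filePath.startsWith(root)) {
            throw new IllegalArgumentException("File is not under the selection root: " + filePath);
        }
        // Components after the root; the last one is the file name itself.
        String[] parts = filePath.substring(root.length()).split("/");
        int dirDepth = parts.length - 1;
        List<String> values = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // A level the file does not have becomes null, hence a nullable column.
            values.add(i < dirDepth ? parts[i] : null);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(partitionValues("/data/sales", "/data/sales/2018/Q1/a.csv", 3));
        System.out.println(partitionValues("/data/sales", "/data/sales/b.csv", 3));
    }
}
```

   For a file two directories below the root, the first call yields 
`[2018, Q1, null]`; a file directly under the root yields all nulls.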
   
   My recommendation is to a) ensure we have plenty of tests of current 
behavior, and b) go with the new behavior as we switch to the new readers. We 
should, however, debate this suggestion to ensure everyone agrees.


