paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-571437864
 
 
   @ihuzenko, I refactored a few more steps to adopt the solution you 
suggested. This version still uses the interface, but implements it in a 
class other than the project record batch. That turns out to be handy because, 
oddly, the project operator generates code for two separate incoming batches: 
the "real" one and a "fake" empty one. The `ProjectRecordBatchBuilder` holds 
onto the input batch, so we don't have to pass it into the materializer and back 
out.
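   A minimal, self-contained Java sketch of that shape follows. Every name in it (`InputBatch`, `ProjectionBuilder`, `BatchBoundBuilder`, `Materializer`) is a hypothetical stand-in, not Drill's actual API. The builder captures its batch at construction, so the materializer only ever sees the interface, and the operator can create one builder for the real batch and another for the empty one.

```java
import java.util.List;

// Hypothetical stand-in for an incoming record batch: a label plus column names.
// (Drill's real RecordBatch exposes value vectors; this sketch only needs names.)
record InputBatch(String label, List<String> columns) {}

// Hypothetical callback interface that the materializer calls into.
interface ProjectionBuilder {
  void addColumn(String name);
}

// Hypothetical builder: captures the input batch at construction, so the
// materializer never receives the batch as a parameter or hands it back.
final class BatchBoundBuilder implements ProjectionBuilder {
  private final InputBatch input;
  private final StringBuilder generated = new StringBuilder();

  BatchBoundBuilder(InputBatch input) {
    this.input = input;
  }

  @Override
  public void addColumn(String name) {
    // Resolve the column against whichever batch this builder is bound to.
    int index = input.columns().indexOf(name);
    generated.append(input.label()).append('[').append(name)
        .append('@').append(index).append("] ");
  }

  String code() {
    return generated.toString().trim();
  }
}

// Hypothetical materializer: depends only on the interface, never on a batch.
final class Materializer {
  static void materialize(List<String> projected, ProjectionBuilder builder) {
    for (String column : projected) {
      builder.addColumn(column);
    }
  }
}

final class BuilderSketchDemo {
  public static void main(String[] args) {
    List<String> schema = List.of("a", "b");
    // One builder per batch: the "real" input and the "fake" empty one.
    BatchBoundBuilder real = new BatchBoundBuilder(new InputBatch("real", schema));
    BatchBoundBuilder empty = new BatchBoundBuilder(new InputBatch("empty", schema));
    Materializer.materialize(List.of("b", "a"), real);
    Materializer.materialize(List.of("b", "a"), empty);
    System.out.println(real.code());  // real[b@1] real[a@0]
    System.out.println(empty.code()); // empty[b@1] empty[a@0]
  }
}
```

   The same materializer call runs twice, and only the builder differs.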
   
   This version tries to eliminate all references to the incoming batch in the 
materializer and to work with the batch schema instead. Annoyingly, the 
`ExpressionTreeMaterializer` needs the input batch so that it can iterate over 
the vectors to get their schemas. If all we need is the schema, we don't need 
the actual vectors. So, if we can pass in a schema, we can completely separate 
code gen from physical vectors.
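   To make the goal concrete, here is a hedged sketch of a materializer step that resolves a column reference from schema information alone. `ColumnType`, `SchemaView`, `MapSchemaView` and `SchemaOnlyMaterializer` are hypothetical names invented for illustration; Drill's real type system and materializer are considerably richer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, heavily simplified column type (Drill's real types are richer).
enum ColumnType { INT, BIGINT, VARCHAR }

// Hypothetical schema-only view: everything the materializer needs, no vectors.
interface SchemaView {
  Optional<ColumnType> typeOf(String column);
}

// A schema built directly from metadata; no physical vectors involved.
final class MapSchemaView implements SchemaView {
  private final Map<String, ColumnType> columns = new LinkedHashMap<>();

  MapSchemaView add(String name, ColumnType type) {
    columns.put(name, type);
    return this;
  }

  @Override
  public Optional<ColumnType> typeOf(String column) {
    return Optional.ofNullable(columns.get(column));
  }
}

// Hypothetical materializer step: resolves a column reference from the schema alone.
final class SchemaOnlyMaterializer {
  static String resolve(SchemaView schema, String column) {
    ColumnType type = schema.typeOf(column)
        .orElseThrow(() -> new IllegalArgumentException("Unknown column: " + column));
    return column + ":" + type;
  }
}

final class SchemaViewDemo {
  public static void main(String[] args) {
    SchemaView schema = new MapSchemaView()
        .add("id", ColumnType.INT)
        .add("name", ColumnType.VARCHAR);
    System.out.println(SchemaOnlyMaterializer.resolve(schema, "name")); // name:VARCHAR
  }
}
```

   Since nothing in `SchemaOnlyMaterializer` touches a vector, code gen could in principle run before any batch exists.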
   
   The next refactoring move is to change this code to work with a schema (or an 
interface that provides the schema) rather than with the actual vectors. Now, as it 
turns out, the batch schema has limitations for complex types, which is one of 
the reasons we created the `TupleMetadata` family of classes. So, perhaps we 
can convert the incoming batch to a `TupleMetadata` schema and use that. (The 
code to do that already exists in the `RowSet` classes.)
   
   Alternatively, we could just pass in an interface that returns the 
`TypedFieldId` for each column, or do that conversion ahead of time and pass in 
the results. I'll have to experiment a bit to see which solution is simplest.
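   The two options can be compared in a small sketch. `FieldId` is a hypothetical, simplified stand-in for Drill's `TypedFieldId` (just a position and a type name), and `FieldIdLookup` and `PrecomputedFieldIds` are likewise invented for illustration. The lambda computes ids on demand (option 1), while the map is built once up front (option 2).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified stand-in for Drill's TypedFieldId:
// just the column's position and a type name.
record FieldId(int index, String type) {}

// Option 1: an interface the materializer queries column by column.
// Returns null when the column does not exist.
interface FieldIdLookup {
  FieldId fieldIdFor(String column);
}

// Option 2: do the conversion ahead of time and pass in the results.
// It still implements the same interface, so the materializer need not care.
final class PrecomputedFieldIds implements FieldIdLookup {
  private final Map<String, FieldId> ids = new HashMap<>();

  PrecomputedFieldIds(List<String> names, List<String> types) {
    for (int i = 0; i < names.size(); i++) {
      ids.put(names.get(i), new FieldId(i, types.get(i)));
    }
  }

  @Override
  public FieldId fieldIdFor(String column) {
    return ids.get(column);
  }
}

final class FieldIdDemo {
  public static void main(String[] args) {
    List<String> names = List.of("a", "b");
    List<String> types = List.of("INT", "VARCHAR");

    // Option 1 as a lazy lambda: computes each id on demand.
    FieldIdLookup lazy = column -> {
      int i = names.indexOf(column);
      return i < 0 ? null : new FieldId(i, types.get(i));
    };
    // Option 2: computed once, up front.
    FieldIdLookup eager = new PrecomputedFieldIds(names, types);

    System.out.println(lazy.fieldIdFor("b").equals(eager.fieldIdFor("b"))); // true
  }
}
```

   Because the precomputed map implements the same interface as the lazy lookup, picking one option over the other would not have to change the materializer's signature.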
   
   Since the required work will be rather large, I propose we do it as a 
separate PR.
   
   Have we done enough for this one PR?
