paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-571437864

@ihuzenko, refactored some additional steps to adopt the solution you suggested. This version still uses the interface, with an implementation in a class other than the project record batch. This turns out to be handy because, oddly, the project operator generates code for two separate incoming batches: the "real" one and a "fake" empty one. The `ProjectRecordBatchBuilder` holds onto the input batch so we don't have to pass it into the materializer and back out.

This version tries to eliminate all references to the incoming batch in the materializer, and instead work with the batch schema. Annoyingly, the `ExpressionTreeMaterializer` still needs the input batch so it can iterate over the vectors to get their schemas. If all we need is the schema, we don't need actual vectors. So, if we can pass in a schema, we can completely separate code gen from physical vectors.

The next refactoring move is to change this code to work with a schema (or an interface to obtain the schema) rather than the actual vectors. Now, as it turns out, the batch schema has limitations for complex types, which is one of the reasons we created the `TupleMetadata` family of classes. So, perhaps we can convert the incoming batch to a `TupleMetadata` schema and use that. (The code to do that already exists in the `RowSet` classes.) Or, we can just pass an interface which will return the `TypedFieldId` for each column. Or, we can do that conversion ahead of time and pass in the results. Will have to play with it some to see which solution is the simplest.

Since the required work will be rather large, I propose we do that as a separate PR. Have we done enough for this one PR?
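
To make the "pass an interface which returns the field id for each column" option concrete, here is a rough sketch of the shape it might take. Everything in it is hypothetical: `FieldRef` is a simplified stand-in for `TypedFieldId`, and `ColumnResolver` and `SchemaResolver` are illustrative names, not existing Drill classes. The only point is that the materializer would look columns up by name and never touch a vector.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaResolverSketch {

  // Hypothetical, simplified stand-in for TypedFieldId: a column position plus a type name.
  static final class FieldRef {
    final int index;
    final String type;

    FieldRef(int index, String type) {
      this.index = index;
      this.type = type;
    }
  }

  // Hypothetical interface handed to the materializer in place of the incoming batch.
  interface ColumnResolver {
    // Returns the field reference for the column, or null if no such column exists.
    FieldRef resolve(String columnName);
  }

  // Schema-only implementation: built from column names and types, with no vectors involved.
  static final class SchemaResolver implements ColumnResolver {
    private final Map<String, FieldRef> columns = new LinkedHashMap<>();

    // Adds a column; its index is its position in insertion order.
    SchemaResolver add(String name, String type) {
      columns.put(name, new FieldRef(columns.size(), type));
      return this;
    }

    @Override
    public FieldRef resolve(String columnName) {
      return columns.get(columnName);
    }
  }

  public static void main(String[] args) {
    ColumnResolver resolver = new SchemaResolver()
        .add("a", "INT")
        .add("b", "VARCHAR");

    FieldRef b = resolver.resolve("b");
    System.out.println("b -> index " + b.index + ", type " + b.type);
    System.out.println("missing -> " + resolver.resolve("missing"));
  }
}
```

With something along these lines, the "real" and "fake" incoming batches would differ only in which resolver is supplied, assuming the schema alone carries everything code gen needs.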
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
