[GitHub] [iceberg] rdblue commented on a change in pull request #1207: ORC: Support row position as a metadata column

GitBox Wed, 15 Jul 2020 13:34:34 -0700


rdblue commented on a change in pull request #1207:
URL: https://github.com/apache/iceberg/pull/1207#discussion_r455325073




##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcRowReader.java
##########
@@ -29,6 +29,6 @@
   /**
    * Reads a row.
    */
-  T read(VectorizedRowBatch batch, int row);
+  T read(VectorizedRowBatch batch, long batchOffsetInFile, int rowOffsetInBatch);

Review comment:
       This seems to introduce a lot of code churn when most implementations 
don't use `batchOffsetInFile`. What about a less intrusive approach: pass it 
through a context method that is called once for each batch?
   
   Parquet has something similar, where each row group causes new context to be 
passed to the readers: 
https://github.com/apache/iceberg/blob/master/parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReader.java#L32
   
   This could expose a method like `setBatchContext(long batchOffsetInFile)` 
with a no-op default. Then only a few implementations would need to change.
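
   A rough sketch of what that could look like, not the actual Iceberg code. `Batch` is a hypothetical stand-in for ORC's `VectorizedRowBatch` so the snippet compiles on its own, and `RowReader`, `ValueReader`, and `RowPositionReader` are made-up names for illustration. Only `setBatchContext(long batchOffsetInFile)` and the two-argument `read` come from this thread.

```java
// Hypothetical stand-in for ORC's VectorizedRowBatch, used only to keep the sketch self-contained.
final class Batch {
  final long[] values;

  Batch(long... values) {
    this.values = values;
  }
}

// Illustrative version of the row reader interface with the original read signature kept intact.
interface RowReader<T> {
  T read(Batch batch, int row);

  // Called once per batch by the caller. The default is a no-op, so readers
  // that do not need the file position are left unchanged.
  default void setBatchContext(long batchOffsetInFile) {
  }
}

// A reader that ignores the batch context; it does not override setBatchContext.
final class ValueReader implements RowReader<Long> {
  @Override
  public Long read(Batch batch, int row) {
    return batch.values[row];
  }
}

// A reader that produces the row position in the file: batch offset plus row offset in the batch.
final class RowPositionReader implements RowReader<Long> {
  private long batchOffsetInFile = 0L;

  @Override
  public void setBatchContext(long batchOffsetInFile) {
    this.batchOffsetInFile = batchOffsetInFile;
  }

  @Override
  public Long read(Batch batch, int row) {
    return batchOffsetInFile + row;
  }
}
```

   The caller would invoke `setBatchContext` once before iterating over the rows of each batch, so only the position reader needs to override it.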




