rdblue commented on a change in pull request #1207:
URL: https://github.com/apache/iceberg/pull/1207#discussion_r455325073
##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcRowReader.java
##########
@@ -29,6 +29,6 @@
/**
* Reads a row.
*/
- T read(VectorizedRowBatch batch, int row);
+ T read(VectorizedRowBatch batch, long batchOffsetInFile, int rowOffsetInBatch);
Review comment:
This seems to introduce a lot of code churn, given that most implementations
don't use `batchOffsetInFile`. What about a less intrusive way to pass this:
a context method that is called once for each batch?
Parquet has something similar, where each row group causes new context to be
passed to the readers:
https://github.com/apache/iceberg/blob/master/parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReader.java#L32
This could expose a method like `setBatchContext(long batchOffsetInFile)`
with a no-op default. Then only a few implementations would need to change.
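A minimal sketch of the suggested hook, assuming it is added as a Java default method so existing readers compile unchanged. `RowBatch` is a hypothetical stand-in for ORC's `VectorizedRowBatch` (so the sketch builds without ORC on the classpath), and `RowIndexReader`, `FilePositionReader`, and `BatchDriver` are illustrative names, not Iceberg classes. The driver derives each batch's offset by summing batch sizes, which is also an assumption about where the offset comes from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ORC's VectorizedRowBatch; only the row count matters here.
final class RowBatch {
  final int size;

  RowBatch(int size) {
    this.size = size;
  }
}

// Reader interface keeps the original read(batch, row) signature.
interface OrcRowReader<T> {
  /** Reads a row. */
  T read(RowBatch batch, int row);

  /** Called once per batch before its rows are read; a no-op unless overridden. */
  default void setBatchContext(long batchOffsetInFile) {
  }
}

// Most readers ignore the batch offset and need no changes at all.
final class RowIndexReader implements OrcRowReader<Integer> {
  @Override
  public Integer read(RowBatch batch, int row) {
    return row;
  }
}

// A reader that needs the row's position in the file overrides the hook.
final class FilePositionReader implements OrcRowReader<Long> {
  private long batchOffsetInFile = 0L;

  @Override
  public void setBatchContext(long batchOffsetInFile) {
    this.batchOffsetInFile = batchOffsetInFile;
  }

  @Override
  public Long read(RowBatch batch, int row) {
    return batchOffsetInFile + row;
  }
}

// Hypothetical driver loop: set the context once per batch, then read each row.
final class BatchDriver {
  static <T> List<T> readAll(List<RowBatch> batches, OrcRowReader<T> reader) {
    List<T> rows = new ArrayList<>();
    long offset = 0L;
    for (RowBatch batch : batches) {
      reader.setBatchContext(offset);
      for (int row = 0; row < batch.size; row++) {
        rows.add(reader.read(batch, row));
      }
      offset += batch.size;
    }
    return rows;
  }
}
```

With two batches of 3 and 2 rows, `FilePositionReader` would return file positions 0 through 4, while `RowIndexReader` keeps returning in-batch indexes (0, 1, 2, 0, 1) without knowing the hook exists. That is the point of the no-op default: only readers that care about file position override it.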
----------------------------------------------------------------
This is an automated message from the Apache Git Service.