rdblue commented on code in PR #12298:
URL: https://github.com/apache/iceberg/pull/12298#discussion_r2692461709
##########
parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java:
##########
@@ -1442,11 +1498,13 @@ public <D> CloseableIterable<D> build() {
}
if (batchedReaderFunc != null) {
+ Function<MessageType, VectorizedReader<?>> readBuilder =
+ batchedReaderFunc.withSchema(schema).apply();
Review Comment:
I don't see the benefit of adding `BatchReaderFunction` and the two
implementations that aren't exposed (although the interface is public).
Instead, the builder could keep the `BiFunction` and create a new `readerFunc`
here:
```java
Function<MessageType, VectorizedReader<?>> readerFunc =
messageType -> batchedReaderFunc.apply(schema, messageType);
```
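A minimal JDK-only sketch of the adaptation suggested above. Capturing `schema` in a lambda turns the two-argument factory into the one-argument shape that `Function<MessageType, VectorizedReader<?>>` expects. To keep it self-contained, generic type parameters stand in for Iceberg's `Schema`, Parquet's `MessageType`, and `VectorizedReader`. `ReaderFuncAdapter` and `bindSchema` are hypothetical names, not Iceberg APIs.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical helper for illustration only; not part of Iceberg.
public class ReaderFuncAdapter {

  // Binds a fixed first argument (the expected read schema) to a two-argument
  // reader factory, producing the one-argument Function a reader expects.
  static <S, M, R> Function<M, R> bindSchema(BiFunction<S, M, R> batchedReaderFunc, S schema) {
    return messageType -> batchedReaderFunc.apply(schema, messageType);
  }

  public static void main(String[] args) {
    // Strings stand in for the schema, the file's MessageType, and the reader.
    BiFunction<String, String, String> factory =
        (schema, fileType) -> "reader(" + schema + ", " + fileType + ")";
    Function<String, String> readerFunc = bindSchema(factory, "expected");
    System.out.println(readerFunc.apply("fileSchema")); // prints reader(expected, fileSchema)
  }
}
```

Because the lambda closes over `schema`, each call supplies only the file's type, which is why this works as a stopgap but still relies on `schema` being valid when the lambda is created.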
This is only a temporary fix, though. Because of the `Precondition` I pointed out
above, I think the final solution is to ensure that a valid schema is always
passed. That means this should pass the `BiFunction` into
`VectorizedParquetReader` instead of the `Function` that takes only a `MessageType`.
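A rough sketch of the longer-term shape described above: the reader receives the schema and the two-argument factory together, and rejects a missing schema at construction instead of when a file is opened. `SchemaAwareReader` is a hypothetical stand-in for `VectorizedParquetReader`, not its real constructor, and generic type parameters again replace the Iceberg and Parquet types. Iceberg code usually uses Guava's `Preconditions`; this sketch swaps in `Objects.requireNonNull` to stay JDK-only.

```java
import java.util.Objects;
import java.util.function.BiFunction;

// Hypothetical stand-in for VectorizedParquetReader; not Iceberg's class.
public class SchemaAwareReader<S, M, R> {
  private final S expectedSchema;
  private final BiFunction<S, M, R> readerFunc;

  public SchemaAwareReader(S expectedSchema, BiFunction<S, M, R> readerFunc) {
    // Fail fast if no schema is supplied, so every later open() has one.
    this.expectedSchema = Objects.requireNonNull(expectedSchema, "Expected schema is required");
    this.readerFunc = Objects.requireNonNull(readerFunc, "Reader function is required");
  }

  // Called per file with that file's type; the schema is always passed along.
  public R open(M fileType) {
    return readerFunc.apply(expectedSchema, fileType);
  }

  public static void main(String[] args) {
    SchemaAwareReader<String, String, String> reader =
        new SchemaAwareReader<>("expected", (schema, fileType) -> schema + "/" + fileType);
    System.out.println(reader.open("file-a")); // prints expected/file-a

    try {
      new SchemaAwareReader<String, String, String>(null, (s, m) -> s + m);
    } catch (NullPointerException e) {
      System.out.println(e.getMessage()); // prints Expected schema is required
    }
  }
}
```

Keeping the `BiFunction` all the way into the reader means no caller can build a reader function without a schema, which is the guarantee the `Precondition` is meant to enforce.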
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]