rdblue commented on code in PR #12298:
URL: https://github.com/apache/iceberg/pull/12298#discussion_r2692435380


##########
parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java:
##########
@@ -1241,6 +1240,50 @@ public ReaderFunction withSchema(Schema expectedSchema) {
       }
     }
 
+    public interface BatchReaderFunction {
+      Function<MessageType, VectorizedReader<?>> apply();
+
+      default BatchReaderFunction withSchema(Schema schema) {
+        return this;
+      }
+    }
+
+    private static class UnaryBatchReaderFunction implements 
BatchReaderFunction {
+      private final Function<MessageType, VectorizedReader<?>> readerFunc;
+
+      UnaryBatchReaderFunction(Function<MessageType, VectorizedReader<?>> 
readerFunc) {
+        this.readerFunc = readerFunc;
+      }
+
+      @Override
+      public Function<MessageType, VectorizedReader<?>> apply() {
+        return readerFunc;
+      }
+    }
+
+    private static class BinaryBatchReaderFunction implements 
BatchReaderFunction {
+      private final BiFunction<Schema, MessageType, VectorizedReader<?>> 
readerFuncWithSchema;
+      private Schema schema;
+
+      BinaryBatchReaderFunction(
+          BiFunction<Schema, MessageType, VectorizedReader<?>> 
readerFuncWithSchema) {
+        this.readerFuncWithSchema = readerFuncWithSchema;
+      }
+
+      @Override
+      public Function<MessageType, VectorizedReader<?>> apply() {
+        Preconditions.checkArgument(
+            schema != null, "Schema must be set for 2-argument reader 
function");

Review Comment:
   I'm not sure about this. This method is called in `build` rather than being 
passed down to where the reader function is actually called (when the 
`MessageType` is known). That means that a 2-argument reader function can only 
be used when the caller provides a projection, otherwise there's an error.
   
   I'd rather update the vectorized read path to convert the Parquet 
`MessageType` to a `Schema` if one is not supplied and then call the 
`BiFunction` to produce a reader. That's more consistent with the builders 
because we defer calling the function until things that are not known until the 
footer is read are available (instead of failing like this).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to