[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6345: Spark 3.3: Choose readers based on task types

GitBox Thu, 01 Dec 2022 16:41:24 -0800


aokolnychyi commented on code in PR #6345:
URL: https://github.com/apache/iceberg/pull/6345#discussion_r1037687769



##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/BatchDataReader.java:
##########
@@ -28,21 +28,48 @@
 import org.apache.iceberg.io.CloseableIterator;
 import org.apache.iceberg.io.InputFile;
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.spark.source.metrics.TaskNumDeletes;
+import org.apache.iceberg.spark.source.metrics.TaskNumSplits;
 import org.apache.spark.rdd.InputFileBlockHolder;
+import org.apache.spark.sql.connector.metric.CustomTaskMetric;
+import org.apache.spark.sql.connector.read.PartitionReader;
 import org.apache.spark.sql.vectorized.ColumnarBatch;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-class BatchDataReader extends BaseBatchReader<FileScanTask> {
+class BatchDataReader extends BaseBatchReader<FileScanTask>

Review Comment:
   This class was only used as a `PartitionReader` in `SparkScan`, where we 
extended it, mixed in `PartitionReader`, and named the resulting 
implementation `BatchReader`. With a common reader factory, we may now have 
multiple batch readers, so `BatchDataReader` seemed like a more accurate name 
than `BatchReader`. As no other places used this class, I decided to implement 
`PartitionReader` directly here.
   
   Any feedback is welcome. See `SparkScan` below for the old usage.
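   
   To illustrate the before/after shape described above, here is a minimal, 
self-contained sketch. It does not use the real Spark or Iceberg classes: 
`SketchPartitionReader`, `SketchBaseReader`, `OldBatchDataReader`, 
`OldScanBatchReader`, `NewBatchDataReader`, and `ReaderSketch` are all 
hypothetical stand-ins, and the interface is simplified (the real 
`PartitionReader` extends `Closeable`). It only shows the structural change: a 
thin subclass that mixes the interface in versus a reader that implements the 
interface itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for Spark's PartitionReader<T>.
interface SketchPartitionReader<T> {
  boolean next();

  T get();

  void close();
}

// Hypothetical stand-in for a base reader that already provides
// public next/get/close but does not declare the Spark-facing interface.
abstract class SketchBaseReader<T> {
  private final Iterator<T> rows;
  private T current;
  private boolean closed = false;

  SketchBaseReader(List<T> rows) {
    this.rows = rows.iterator();
  }

  public boolean next() {
    if (rows.hasNext()) {
      current = rows.next();
      return true;
    }
    return false;
  }

  public T get() {
    return current;
  }

  public void close() {
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }
}

// Old shape, step 1: the reader does not implement the interface.
class OldBatchDataReader extends SketchBaseReader<String> {
  OldBatchDataReader(List<String> rows) {
    super(rows);
  }
}

// Old shape, step 2: the scan declares a thin subclass whose only job is
// to mix the interface in; the inherited public methods satisfy it.
class OldScanBatchReader extends OldBatchDataReader
    implements SketchPartitionReader<String> {
  OldScanBatchReader(List<String> rows) {
    super(rows);
  }
}

// New shape: the reader implements the interface directly, so a shared
// factory can return it without an extra wrapper class.
class NewBatchDataReader extends SketchBaseReader<String>
    implements SketchPartitionReader<String> {
  NewBatchDataReader(List<String> rows) {
    super(rows);
  }
}

final class ReaderSketch {
  private ReaderSketch() {}

  // Consumes a reader through the interface only, then closes it.
  static <T> List<T> drain(SketchPartitionReader<T> reader) {
    List<T> out = new ArrayList<>();
    try {
      while (reader.next()) {
        out.add(reader.get());
      }
    } finally {
      reader.close();
    }
    return out;
  }
}
```

   Both shapes behave the same when consumed through the interface; the 
difference is only where the interface is declared and whether the extra 
subclass is needed.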



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


