wypoon commented on a change in pull request #4395:
URL: https://github.com/apache/iceberg/pull/4395#discussion_r834761565



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java
##########
@@ -193,15 +200,39 @@ public boolean supportColumnarReads(InputPartition partition) {
     }
   }
 
+  static long numFilesToScan(CombinedScanTask scanTask) {
+    long fileCount = 0L;
+    for (FileScanTask file : scanTask.files()) {
+      fileCount += 1L;
+    }
+    return fileCount;
+  }
+
   private static class RowReader extends RowDataReader implements PartitionReader<InternalRow> {
+    private long numFilesToRead;
+
     RowReader(ReadTask task) {
       super(task.task, task.table(), task.expectedSchema(), task.isCaseSensitive());
+      numFilesToRead = numFilesToScan(task.task);
+    }
+
+    @Override
+    public CustomTaskMetric[] currentMetricsValues() {
+      return new CustomTaskMetric[] { new TaskNumFiles(numFilesToRead) };
     }
   }
 
   private static class BatchReader extends BatchDataReader implements PartitionReader<ColumnarBatch> {
+    private long numFilesToRead;
+
     BatchReader(ReadTask task, int batchSize) {
       super(task.task, task.table(), task.expectedSchema(), task.isCaseSensitive(), batchSize);
+      numFilesToRead = numFilesToScan(task.task);
+    }
+
+    @Override
+    public CustomTaskMetric[] currentMetricsValues() {
+      return new CustomTaskMetric[] { new TaskNumFiles(numFilesToRead) };
     }

Review comment:
       Note: Spark calls this every 100 rows for each `PartitionReader`. `numFilesToRead` can be computed up front and stored, and it is (in the constructor). I wonder whether it is worth caching the `CustomTaskMetric[]` as well, so that each call does not allocate a new array and a new `TaskNumFiles` instance.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
