[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6309: Spark 3.3: Consume arbitrary scans in SparkBatchQueryScan

GitBox Tue, 29 Nov 2022 09:05:59 -0800


aokolnychyi commented on code in PR #6309:
URL: https://github.com/apache/iceberg/pull/6309#discussion_r1035029464



##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -109,40 +110,49 @@ Long snapshotId() {
   private Set<Integer> specIds() {
     if (specIds == null) {
       Set<Integer> specIdSet = Sets.newHashSet();
-      for (FileScanTask file : files()) {
-        specIdSet.add(file.spec().specId());
+      for (PartitionScanTask task : tasks()) {
+        specIdSet.add(task.spec().specId());
       }
       this.specIds = specIdSet;
     }
 
     return specIds;
   }
 
-  private List<FileScanTask> files() {
-    if (files == null) {
-      try (CloseableIterable<FileScanTask> filesIterable = scan.planFiles()) {
-        this.files = Lists.newArrayList(filesIterable);
+  private List<PartitionScanTask> tasks() {
+    if (tasks == null) {
+      try (CloseableIterable<? extends ScanTask> taskIterable = scan.planFiles()) {
+        List<PartitionScanTask> partitionScanTasks = Lists.newArrayList();
+        for (ScanTask task : taskIterable) {
+          ValidationException.check(
+              task instanceof PartitionScanTask,

Review Comment:
   `PartitionScanTask` is a very abstract interface, extended by all tasks right now. I can't just use arbitrary tasks, because this scan supports runtime filtering and needs to know the partition spec and partition of each task.
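To make the constraint concrete, here is a minimal, self-contained Java sketch of the pattern in the diff. It is not the Iceberg implementation. `ScanTask`, `PartitionScanTask`, `FileTask`, `OpaqueTask`, and the `partitionTasks`, `specIds`, and `filter` helpers are hypothetical stand-ins. The real `PartitionScanTask` exposes `spec()` and `partition()`, and the diff uses Iceberg's `ValidationException.check` rather than `IllegalArgumentException`. The sketch shows why a task without partition information has to be rejected up front: the runtime filter, modeled here as a set of allowed partition values, would otherwise have nothing to prune on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartitionTaskCheck {

  // Hypothetical stand-in for Iceberg's ScanTask (a unit of planned work).
  interface ScanTask {}

  // Hypothetical stand-in for Iceberg's PartitionScanTask. The real interface
  // returns a PartitionSpec from spec() and a StructLike from partition();
  // here both are reduced to plain values so the sketch runs without Iceberg.
  interface PartitionScanTask extends ScanTask {
    int specId();

    String partition();
  }

  // A task that knows its spec ID and partition value.
  static final class FileTask implements PartitionScanTask {
    private final int specId;
    private final String partition;

    FileTask(int specId, String partition) {
      this.specId = specId;
      this.partition = partition;
    }

    @Override
    public int specId() {
      return specId;
    }

    @Override
    public String partition() {
      return partition;
    }
  }

  // A task without partition information, which the scan must reject.
  static final class OpaqueTask implements ScanTask {}

  // Mirrors the check in the diff: every planned task must be a PartitionScanTask.
  static List<PartitionScanTask> partitionTasks(Iterable<? extends ScanTask> planned) {
    List<PartitionScanTask> tasks = new ArrayList<>();
    for (ScanTask task : planned) {
      if (!(task instanceof PartitionScanTask)) {
        throw new IllegalArgumentException(
            "Unsupported task type: " + task.getClass().getSimpleName());
      }
      tasks.add((PartitionScanTask) task);
    }
    return tasks;
  }

  // Collects the distinct spec IDs, like specIds() in the diff.
  static Set<Integer> specIds(List<PartitionScanTask> tasks) {
    Set<Integer> ids = new HashSet<>();
    for (PartitionScanTask task : tasks) {
      ids.add(task.specId());
    }
    return ids;
  }

  // Hypothetical runtime filter: keeps tasks whose partition value is in `keep`.
  static List<PartitionScanTask> filter(List<PartitionScanTask> tasks, Set<String> keep) {
    List<PartitionScanTask> kept = new ArrayList<>();
    for (PartitionScanTask task : tasks) {
      if (keep.contains(task.partition())) {
        kept.add(task);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<ScanTask> planned =
        Arrays.asList(new FileTask(0, "day=2022-11-28"), new FileTask(1, "day=2022-11-29"));
    List<PartitionScanTask> tasks = partitionTasks(planned);
    System.out.println("spec IDs: " + specIds(tasks));
    Set<String> keep = new HashSet<>(Arrays.asList("day=2022-11-29"));
    System.out.println("tasks after runtime filter: " + filter(tasks, keep).size());
  }
}
```

Passing an `OpaqueTask` to `partitionTasks` throws, which is the sketch's analogue of the `ValidationException` raised in the diff when a scan plans tasks that carry no partition spec or partition.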


