fengguangyuan opened a new issue, #6871:
URL: https://github.com/apache/iceberg/issues/6871

   ### Feature Request / Improvement
   
   # New feature
   Add an extra member variable to `ManifestGroup` to strengthen Iceberg's
   ability to prune partitions and data files.
   
   # Test case
   Say the high-level engine is planning the query
   `select count() from tbl where timeToDate(time) = current_date()`,
   where `time` is a partition key and the function `timeToDate` is a UDF.
   
   # Motivations: Partition pruning failed
   Iceberg only accepts partition filters that the engine has translated from
   its own expressions into an Iceberg `Expression`.
   
   In the test case above, the UDF cannot be translated into an Iceberg
   `Expression`, so partition pruning fails and every data file of the table
   has to be read, which is the worst case.
   
   # Expected actions
   The high-level engine should be able to pass an arbitrary partition filter
   to Iceberg, which Iceberg then uses to prune partitions and data files.
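
For illustration only, here is a minimal, self-contained sketch of what such an engine-supplied filter could look like for the test case above. It does not use Iceberg classes: `List<String>` (partition field names) and `List<Object>` (partition values) are hypothetical stand-ins for `PartitionSpec` and `StructLike`, and `timeToDate` is assumed to convert epoch microseconds to a UTC date. The class and helper names are made up for this example.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

public class ExtraPartitionFilterSketch {

  // Assumed behavior of the engine UDF: epoch microseconds -> date in UTC.
  static LocalDate timeToDate(long epochMicros) {
    long seconds = Math.floorDiv(epochMicros, 1_000_000L);
    long nanos = Math.floorMod(epochMicros, 1_000_000L) * 1_000L;
    return Instant.ofEpochSecond(seconds, nanos).atZone(ZoneOffset.UTC).toLocalDate();
  }

  // Builds the filter "timeToDate(time) = target".
  // First argument: partition field names (stand-in for PartitionSpec).
  // Second argument: partition values in the same order (stand-in for StructLike).
  static BiPredicate<List<String>, List<Object>> timeToDateEquals(LocalDate target) {
    return (fieldNames, values) -> {
      int pos = fieldNames.indexOf("time");
      if (pos < 0 || values.get(pos) == null) {
        // The filter cannot decide for this partition, so keep it:
        // pruning must never drop files that might contain matching rows.
        return true;
      }
      return timeToDate((Long) values.get(pos)).equals(target);
    };
  }

  // Keeps only the partitions accepted by the filter.
  static List<List<Object>> prune(
      List<String> fieldNames,
      List<List<Object>> partitions,
      BiPredicate<List<String>, List<Object>> filter) {
    return partitions.stream()
        .filter(partition -> filter.test(fieldNames, partition))
        .collect(Collectors.toList());
  }
}
```

The predicate is opaque to Iceberg, so the engine is free to evaluate any function it likes inside it. The sketch keeps a partition whenever it cannot evaluate the filter, which is the conservative choice for pruning.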
   
   # Discussions
   Is this proposal sensible and valuable?
   
   # Changes on code
   Please note the section marked `New Code` below:
   ```java
   class ManifestGroup {
     // New Code: optional engine-supplied partition filter; null means no extra pruning
     private BiPredicate<PartitionSpec, StructLike> extraPartitionFilter;
   
     public CloseableIterable<FileScanTask> planFiles() {
       return plan(ManifestGroup::createFileScanTasks);
     }
   
     public <T extends ScanTask> CloseableIterable<T> plan(CreateTasksFunction<T> createTasksFunc) {
       // ... skips
       Iterable<CloseableIterable<T>> tasks =
           entries(
               (manifest, entries) -> {
                 int specId = manifest.partitionSpecId();
                 TaskContext taskContext = taskContextCache.get(specId);
                 return createTasksFunc.apply(entries, taskContext);
               });
   
       if (executorService != null) {
         return new ParallelIterable<>(tasks, executorService);
       } else {
         return CloseableIterable.concat(tasks);
       }
     }
   
     public CloseableIterable<ManifestEntry<DataFile>> entries() {
       return CloseableIterable.concat(entries((manifest, entries) -> entries));
     }
   
     private <T> Iterable<CloseableIterable<T>> entries(
         BiFunction<ManifestFile, CloseableIterable<ManifestEntry<DataFile>>, CloseableIterable<T>>
             entryFn) {
       // ... skips
       return Iterables.transform(
           matchingManifests,
           manifest ->
               new CloseableIterable<T>() {
                 private CloseableIterable<T> iterable;
   
                 @Override
                 public CloseableIterator<T> iterator() {
                   ManifestReader<DataFile> reader =
                       ManifestFiles.read(manifest, io, specsById)
                           .filterRows(dataFilter)
                           .filterPartitions(partitionFilter)
                           .caseSensitive(caseSensitive)
                           .select(columns)
                           .scanMetrics(scanMetrics);
   
                   CloseableIterable<ManifestEntry<DataFile>> entries;
                   // ... skips
                   if (evaluator != null) {
                     entries =
                         CloseableIterable.filter(
                             scanMetrics.skippedDataFiles(),
                             entries,
                             entry -> evaluator.eval((GenericDataFile) entry.file()));
                   }
   
                   /****************** New Code ******************/
                   if (extraPartitionFilter != null) {
                     int specId = manifest.partitionSpecId();
                     PartitionSpec spec = specsById.get(specId);
                     entries =
                         CloseableIterable.filter(
                             entries,
                             entry -> extraPartitionFilter.test(spec, entry.file().partition()));
                   }
                   /***********************************************/
   
                   entries =
                       CloseableIterable.filter(
                           scanMetrics.skippedDataFiles(), entries, manifestEntryPredicate);
   
                   iterable = entryFn.apply(manifest, entries);
   
                   return iterable.iterator();
                 }
   
                 @Override
                 public void close() throws IOException {
                   if (iterable != null) {
                     iterable.close();
                   }
                 }
               });
     }
   
   }
   ```
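
The `New Code` block resolves the `PartitionSpec` per manifest before calling the filter, which matters when a table has several specs after partition evolution. The following standalone sketch mimics that loop so the behavior can be tried without Iceberg on the classpath. Everything in it is a hypothetical stand-in: `Manifest` is a toy class, specs are lists of field names, and partitions are lists of values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class PerSpecFilterSketch {

  // Toy stand-in for a manifest: its spec ID plus the partition tuples of its entries.
  static final class Manifest {
    final int specId;
    final List<List<Object>> partitions;

    Manifest(int specId, List<List<Object>> partitions) {
      this.specId = specId;
      this.partitions = partitions;
    }
  }

  // Mirrors the proposed logic: look up the spec of each manifest, then test
  // every entry's partition tuple against the extra filter. A null filter keeps everything.
  static List<List<Object>> plan(
      List<Manifest> manifests,
      Map<Integer, List<String>> specsById,
      BiPredicate<List<String>, List<Object>> extraPartitionFilter) {
    List<List<Object>> kept = new ArrayList<>();
    for (Manifest manifest : manifests) {
      List<String> spec = specsById.get(manifest.specId);
      for (List<Object> partition : manifest.partitions) {
        if (extraPartitionFilter == null || extraPartitionFilter.test(spec, partition)) {
          kept.add(partition);
        }
      }
    }
    return kept;
  }
}
```

Because the filter receives the spec together with the partition tuple, it can locate a field by name even when its position differs between specs.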
   
   ### Query engine
   
   Trino


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

