fengguangyuan opened a new issue, #6871:
URL: https://github.com/apache/iceberg/issues/6871
### Feature Request / Improvement
# New feature
Add an extra member variable to `ManifestGroup` to strengthen Iceberg's
ability to prune partitions and data files.
# Test case
Say a high-level engine is planning the query
`select count() from tbl where timeToDate(time) = current_date()`,
where `time` is a partition key and `timeToDate` is a UDF.
# Motivation: partition pruning fails
Iceberg only accepts a partition filter that the engine has translated from
its own expression into an Iceberg `Expression`.
In the test case above, the UDF cannot be translated into an Iceberg
`Expression`, so partition pruning fails. Worse, all the data files of the
table then have to be read.
# Expected behavior
The high-level engine can pass an arbitrary partition filter to Iceberg to
prune partitions and data files.
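
As a rough illustration of what such an engine-supplied filter might look like for the test case above, here is a minimal, self-contained sketch. It does not use Iceberg classes: `List<String>` (partition field names) and `Object[]` (partition tuple) are hypothetical stand-ins for `PartitionSpec` and `StructLike`, `timeToDate` is an assumed UTC epoch-millis-to-date conversion, and `time` is assumed to be an identity-partitioned column.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

final class UdfPartitionFilterSketch {
  // Assumed behavior of the engine UDF `timeToDate`: epoch millis -> UTC calendar date.
  static LocalDate timeToDate(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
  }

  // Builds the filter for `timeToDate(time) = <target>`.
  // First argument: partition field names (stand-in for PartitionSpec).
  // Second argument: partition values in the same order (stand-in for StructLike).
  static BiPredicate<List<String>, Object[]> timeToDateEquals(LocalDate target) {
    return (fieldNames, tuple) -> {
      int pos = fieldNames.indexOf("time");
      if (pos < 0) {
        return true; // this spec is not partitioned by `time`: cannot prune, keep the file
      }
      Object value = tuple[pos];
      if (value == null) {
        return true; // conservative: how the UDF treats NULL is engine-specific
      }
      return timeToDate((Long) value).equals(target);
    };
  }

  public static void main(String[] args) {
    // The engine would evaluate current_date() once at planning time and capture it.
    Instant now = Instant.now();
    LocalDate today = now.atZone(ZoneOffset.UTC).toLocalDate();
    BiPredicate<List<String>, Object[]> filter = timeToDateEquals(today);
    System.out.println(filter.test(Arrays.asList("time"), new Object[] {now.toEpochMilli()}));
  }
}
```

The predicate only ever answers "may this partition contain matching rows?", so returning `true` when in doubt keeps the result correct and only costs pruning efficiency.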
# Discussion
Is this proposal sensible and valuable?
# Code changes
Please note the block marked `New Code` below:
```java
class ManifestGroup {
  BiPredicate<PartitionSpec, StructLike> extraPartitionFilter;

  public CloseableIterable<FileScanTask> planFiles() {
    return plan(ManifestGroup::createFileScanTasks);
  }

  public <T extends ScanTask> CloseableIterable<T> plan(CreateTasksFunction<T> createTasksFunc) {
    // ... skips
    Iterable<CloseableIterable<T>> tasks =
        entries(
            (manifest, entries) -> {
              int specId = manifest.partitionSpecId();
              TaskContext taskContext = taskContextCache.get(specId);
              return createTasksFunc.apply(entries, taskContext);
            });

    if (executorService != null) {
      return new ParallelIterable<>(tasks, executorService);
    } else {
      return CloseableIterable.concat(tasks);
    }
  }

  public CloseableIterable<ManifestEntry<DataFile>> entries() {
    return CloseableIterable.concat(entries((manifest, entries) -> entries));
  }

  private <T> Iterable<CloseableIterable<T>> entries(
      BiFunction<ManifestFile, CloseableIterable<ManifestEntry<DataFile>>, CloseableIterable<T>>
          entryFn) {
    // ... skips
    return Iterables.transform(
        matchingManifests,
        manifest ->
            new CloseableIterable<T>() {
              private CloseableIterable<T> iterable;

              @Override
              public CloseableIterator<T> iterator() {
                ManifestReader<DataFile> reader =
                    ManifestFiles.read(manifest, io, specsById)
                        .filterRows(dataFilter)
                        .filterPartitions(partitionFilter)
                        .caseSensitive(caseSensitive)
                        .select(columns)
                        .scanMetrics(scanMetrics);

                CloseableIterable<ManifestEntry<DataFile>> entries;
                // ... skips
                if (evaluator != null) {
                  entries =
                      CloseableIterable.filter(
                          scanMetrics.skippedDataFiles(),
                          entries,
                          entry -> evaluator.eval((GenericDataFile) entry.file()));
                }

                /****************** New Code ******************/
                if (extraPartitionFilter != null) {
                  int specId = manifest.partitionSpecId();
                  PartitionSpec spec = specsById.get(specId);
                  entries =
                      CloseableIterable.filter(
                          entries,
                          entry -> extraPartitionFilter.test(spec, entry.file().partition()));
                }
                /***********************************************/

                entries =
                    CloseableIterable.filter(
                        scanMetrics.skippedDataFiles(), entries, manifestEntryPredicate);

                iterable = entryFn.apply(manifest, entries);
                return iterable.iterator();
              }

              @Override
              public void close() throws IOException {
                if (iterable != null) {
                  iterable.close();
                }
              }
            });
  }
}
```
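
One plausible reason for passing the `PartitionSpec` alongside the partition tuple, as the `New Code` block does, is that manifests written under different specs lay out their tuples differently. The sketch below illustrates that idea without Iceberg classes: `Manifest`, `specsById` as a map of field-name lists, and `Object[]` tuples are all hypothetical stand-ins, and `plan` is an illustrative helper rather than the real `ManifestGroup.plan`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

final class ExtraFilterPlanningSketch {
  // Hypothetical stand-in for a manifest: a spec id plus the partition tuples of its entries.
  static final class Manifest {
    final int specId;
    final List<Object[]> partitions;

    Manifest(int specId, List<Object[]> partitions) {
      this.specId = specId;
      this.partitions = partitions;
    }
  }

  // Keeps only the partition tuples accepted by the optional extra filter.
  // The spec is looked up once per manifest, mirroring the proposed change.
  static List<Object[]> plan(
      List<Manifest> manifests,
      Map<Integer, List<String>> specsById,
      BiPredicate<List<String>, Object[]> extraPartitionFilter) {
    List<Object[]> kept = new ArrayList<>();
    for (Manifest manifest : manifests) {
      List<String> spec = specsById.get(manifest.specId);
      for (Object[] partition : manifest.partitions) {
        if (extraPartitionFilter == null || extraPartitionFilter.test(spec, partition)) {
          kept.add(partition);
        }
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Map<Integer, List<String>> specsById = new HashMap<>();
    specsById.put(0, Arrays.asList("region"));        // original spec
    specsById.put(1, Arrays.asList("day", "region")); // evolved spec, different layout

    List<Manifest> manifests =
        Arrays.asList(
            new Manifest(0, Arrays.<Object[]>asList(new Object[] {"eu"}, new Object[] {"us"})),
            new Manifest(
                1,
                Arrays.<Object[]>asList(
                    new Object[] {"2023-02-17", "eu"}, new Object[] {"2023-02-17", "us"})));

    // The filter resolves `region` by name, so it works for both tuple layouts.
    BiPredicate<List<String>, Object[]> euOnly =
        (spec, tuple) -> "eu".equals(tuple[spec.indexOf("region")]);

    System.out.println(plan(manifests, specsById, euOnly).size());
  }
}
```

Unlike this eager loop, the proposed change wraps the entries with `CloseableIterable.filter`, so the predicate would run lazily as entries are iterated.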
### Query engine
Trino