debabhishek53 opened a new pull request, #4171:
URL: https://github.com/apache/gobblin/pull/4171
This PR makes Iceberg partition filtering fully configurable and reusable
across all copy flows.
Previously the partition filter was hardcoded to append -00 for hourly
tables and only supported yyyy-MM-dd or yyyy-MM-dd-HH patterns with a fixed
daily lookback. This change generalizes the entire mechanism.
### **New configs:**
- iceberg.partition.value.format — accept any DateTimeFormatter pattern
(yyyy-MM-dd-HH, dd-MM-yyyy-HH, yyyyMMdd, etc.) so tables with non-standard
partition naming just work out of the box
- iceberg.partition.hour (0–23) — explicitly control which hour is
embedded in daily partition values instead of always defaulting to 00
- iceberg.lookback.hours — hour-level granularity lookback, naturally
crossing midnight and month boundaries; takes precedence over
iceberg.lookback.days when set
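
To make the effect of the format and hour settings concrete, here is a minimal sketch using only java.time. The class name `PartitionValueFormatSketch` and the helper `partitionValue` are hypothetical and are not part of this PR; they only illustrate how a DateTimeFormatter pattern plus an explicit hour could be turned into a partition value.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class PartitionValueFormatSketch {

    // Hypothetical helper (not from the PR): renders one partition value for
    // `date`, with the hour fixed to `hour` (0-23), using any
    // DateTimeFormatter pattern.
    static String partitionValue(LocalDate date, int hour, String pattern) {
        if (hour < 0 || hour > 23) {
            throw new IllegalArgumentException("hour must be in 0-23, got " + hour);
        }
        LocalDateTime dateTime = date.atTime(hour, 0);
        return dateTime.format(DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2024, 3, 1);
        System.out.println(partitionValue(date, 0, "yyyy-MM-dd-HH")); // 2024-03-01-00
        System.out.println(partitionValue(date, 5, "dd-MM-yyyy-HH")); // 01-03-2024-05
        System.out.println(partitionValue(date, 5, "yyyyMMdd"));      // 20240301
    }
}
```

Note that a pattern without an hour field, such as yyyyMMdd, simply ignores the hour value in this sketch.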
### **New utility: IcebergPartitionFilterGenerator**
Extracted a pure, config-agnostic utility class with three public static
methods:
- forDays(...) — N daily partition values, most-recent first
- forHours(...) — N hourly partition values with natural day/month
boundary crossing
- buildOrExpression(...) — builds an Iceberg OR expression from any
pre-computed value list
This utility is fully decoupled from Gobblin config so it can be called
from any flow, not just the copy pipeline.
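
As a rough illustration of the two value generators, the sketch below reimplements the idea with java.time alone. The signatures are assumptions made for this example and may differ from the real IcebergPartitionFilterGenerator; the most-recent-first ordering for hourly values is also an assumption. The OR-expression step is left out because it depends on the Iceberg library.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

class PartitionLookbackSketch {

    // Hypothetical analogue of forDays: `days` values, most recent first,
    // each rendered at the fixed `hour` of its day.
    static List<String> forDays(LocalDate latest, int days, int hour, String pattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            values.add(latest.minusDays(i).atTime(hour, 0).format(formatter));
        }
        return values;
    }

    // Hypothetical analogue of forHours: `hours` values, most recent first.
    // LocalDateTime.minusHours handles midnight and month-end rollover.
    static List<String> forHours(LocalDateTime latest, int hours, String pattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < hours; i++) {
            values.add(latest.minusHours(i).format(formatter));
        }
        return values;
    }

    public static void main(String[] args) {
        // Prints [2024-03-01-00, 2024-02-29-00]
        System.out.println(forDays(LocalDate.of(2024, 3, 1), 2, 0, "yyyy-MM-dd-HH"));
        // Prints [2024-03-01-01, 2024-03-01-00, 2024-02-29-23]
        System.out.println(forHours(LocalDateTime.of(2024, 3, 1, 1, 0), 3, "yyyy-MM-dd-HH"));
    }
}
```

The second call shows a three-hour lookback starting at 01:00 on 1 March 2024 crossing both midnight and the month boundary.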
**Backward compatible: When iceberg.partition.value.format is absent, the
legacy iceberg.hourly.partition.enabled path runs unchanged.**
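
The selection between the legacy and new paths could look roughly like the sketch below. The config keys are the ones named in this description; the class `PartitionFilterModeSketch`, the `Mode` enum, and the assumption that iceberg.lookback.hours is only consulted on the new path are illustrative and not taken from the actual change.

```java
import java.util.Properties;

class PartitionFilterModeSketch {

    static final String FORMAT_KEY = "iceberg.partition.value.format";
    static final String LOOKBACK_HOURS_KEY = "iceberg.lookback.hours";

    // Hypothetical labels for the three behaviors described above.
    enum Mode { LEGACY, DAILY_LOOKBACK, HOURLY_LOOKBACK }

    static Mode resolveMode(Properties props) {
        // No custom format configured: keep the legacy
        // iceberg.hourly.partition.enabled behavior untouched.
        if (props.getProperty(FORMAT_KEY) == null) {
            return Mode.LEGACY;
        }
        // Hour-level lookback wins over iceberg.lookback.days when set.
        if (props.getProperty(LOOKBACK_HOURS_KEY) != null) {
            return Mode.HOURLY_LOOKBACK;
        }
        return Mode.DAILY_LOOKBACK;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolveMode(props)); // LEGACY

        props.setProperty(FORMAT_KEY, "yyyy-MM-dd-HH");
        props.setProperty("iceberg.lookback.days", "3");
        System.out.println(resolveMode(props)); // DAILY_LOOKBACK

        props.setProperty(LOOKBACK_HOURS_KEY, "6");
        System.out.println(resolveMode(props)); // HOURLY_LOOKBACK
    }
}
```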
### **Test Plan**
- 21 new unit tests in IcebergPartitionFilterGeneratorTest — 100%
instruction/branch/line coverage
- 19 new tests in IcebergSourceTest covering all new config paths: custom
formats, reversed date patterns, hour override, hourly lookback, boundary
crossing
- All tests in gobblin-data-management pass