debabhishek53 opened a new pull request, #4171:
URL: https://github.com/apache/gobblin/pull/4171

   This PR makes Iceberg partition filtering fully configurable and reusable 
across all copy flows.
   
   Previously the partition filter was hardcoded to append -00 for hourly 
tables and only supported yyyy-MM-dd or yyyy-MM-dd-HH patterns with a fixed 
daily lookback. This change generalizes the entire mechanism.
   
   ###   **New configs:**
     - iceberg.partition.value.format — accept any DateTimeFormatter pattern 
(yyyy-MM-dd-HH, dd-MM-yyyy-HH, yyyyMMdd, etc.) so tables with non-standard 
partition naming just work out of the box
     - iceberg.partition.hour (0–23) — explicitly control which hour is 
embedded in daily partition values instead of always defaulting to 00
     - iceberg.lookback.hours — hour-level granularity lookback, naturally 
crossing midnight and month boundaries; takes precedence over 
iceberg.lookback.days when set
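
   A minimal sketch of how these keys might be interpreted, not the code in this PR. The key names come from the list above; the class name, the helper methods, and the fallback to hour 0 when iceberg.partition.hour is absent are assumptions made for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Properties;

// Hypothetical helper; only the config key names are taken from the PR description.
class PartitionFilterConfigSketch {
    static final String FORMAT_KEY = "iceberg.partition.value.format";
    static final String HOUR_KEY = "iceberg.partition.hour";
    static final String LOOKBACK_HOURS_KEY = "iceberg.lookback.hours";
    static final String LOOKBACK_DAYS_KEY = "iceberg.lookback.days";

    /** True when a value format is configured, i.e. the new path would apply. */
    static boolean usesConfigurableFormat(Properties props) {
        return props.getProperty(FORMAT_KEY) != null;
    }

    /** Hour embedded in daily values; defaulting to 0 when unset is an assumption. */
    static int partitionHour(Properties props) {
        int hour = Integer.parseInt(props.getProperty(HOUR_KEY, "0").trim());
        if (hour < 0 || hour > 23) {
            throw new IllegalArgumentException(HOUR_KEY + " must be in 0-23, got " + hour);
        }
        return hour;
    }

    /** Hour-level lookback wins over the day-level one whenever it is set. */
    static boolean usesHourlyLookback(Properties props) {
        return props.getProperty(LOOKBACK_HOURS_KEY) != null;
    }

    /** Formats a timestamp with the configured DateTimeFormatter pattern. */
    static String formatPartitionValue(Properties props, LocalDateTime ts) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(props.getProperty(FORMAT_KEY));
        return fmt.format(ts);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(FORMAT_KEY, "dd-MM-yyyy-HH");
        props.setProperty(LOOKBACK_DAYS_KEY, "3");
        props.setProperty(LOOKBACK_HOURS_KEY, "6"); // takes precedence over the 3 days

        LocalDateTime ts = LocalDateTime.of(2024, 3, 1, 5, 0);
        System.out.println(formatPartitionValue(props, ts));
        System.out.println(usesHourlyLookback(props));
    }
}
```

   With the reversed pattern dd-MM-yyyy-HH, the timestamp 2024-03-01T05:00 is rendered as 01-03-2024-05, which is the kind of non-standard partition naming the format key is meant to cover.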
   
   ###   **New utility: IcebergPartitionFilterGenerator**
   
     Extracted a pure, config-agnostic utility class with three public static 
methods:
     - forDays(...) — N daily partition values, most-recent first
     - forHours(...) — N hourly partition values with natural day/month 
boundary crossing
     - buildOrExpression(...) — builds an Iceberg OR expression from any 
pre-computed value list
   
     This utility is fully decoupled from Gobblin config so it can be called 
from any flow, not just the copy pipeline.
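
   The value generation described above can be sketched with java.time alone. This is an illustration under assumptions, not the actual IcebergPartitionFilterGenerator: the PR text elides the real method signatures, so the parameters below are hypothetical, and the OR builder returns a plain string as a stand-in where the real utility builds an Iceberg expression (presumably through Iceberg's Expressions.or and Expressions.equal). Most-recent-first ordering for hourly values is also assumed.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in; signatures are illustrative, not those of the PR.
class PartitionValuesSketch {

    /** N daily values, most recent first, each pinned to the given hour of day. */
    static List<String> forDays(LocalDateTime reference, int days, int hour, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
        LocalDateTime start = reference.toLocalDate().atTime(hour, 0);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            values.add(fmt.format(start.minusDays(i)));
        }
        return values;
    }

    /** N hourly values, most recent first; minusHours handles day and month rollover. */
    static List<String> forHours(LocalDateTime reference, int hours, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
        LocalDateTime start = reference.truncatedTo(ChronoUnit.HOURS);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < hours; i++) {
            values.add(fmt.format(start.minusHours(i)));
        }
        return values;
    }

    /** String stand-in for an OR of equality predicates over pre-computed values. */
    static String buildOrExpression(String column, List<String> values) {
        return values.stream()
            .map(v -> column + " = '" + v + "'")
            .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        LocalDateTime ref = LocalDateTime.of(2024, 3, 1, 1, 30);
        List<String> hourly = forHours(ref, 3, "yyyy-MM-dd-HH");
        // Crosses midnight and the February/March boundary of a leap year.
        System.out.println(hourly);
        System.out.println(buildOrExpression("datepartition", hourly));
    }
}
```

   Because the lookback is computed on LocalDateTime rather than by string manipulation, boundary crossing needs no special cases: three hours back from 2024-03-01 01:30 yields 2024-03-01-01, 2024-03-01-00, and 2024-02-29-23. The column name datepartition is a placeholder.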
   
     **Backward compatible: When iceberg.partition.value.format is absent, the 
legacy iceberg.hourly.partition.enabled path runs unchanged.**
   
   ###   **Test Plan**
   
     - 21 new unit tests in IcebergPartitionFilterGeneratorTest — 100% 
instruction/branch/line coverage
     - 19 new tests in IcebergSourceTest covering all new config paths: custom 
formats, reversed date patterns, hour override, hourly lookback, boundary 
crossing
     - All tests in gobblin-data-management pass


