On second thought, I think there's a semantic issue with removing upper_bound for partition outputs, so I'd rather not do that even though it does reduce the metadata footprint.
Storing only lower_bound for partition outputs means we need a special rule: "a file is partitioned if the output lower_bound is set and upper_bound is null". This constrains the model and changes the semantics of stats specifically for partition outputs. Keeping both lower and upper bounds preserves consistent statistics semantics and doesn't assume that stats on transform outputs necessarily mean the file is partitioned.

For example, a file could have hour(ts) values spanning the range [2026-05-03 at 10 PM, 2026-05-03-14 at 12 AM] (shown as timestamps for readability, not the actual integer values), which represents clustering on the hour transform without strict partitioning. With only lower_bound, we'd have to treat any transform output stats as indicating partitioning, which may be overly constraining.

Thanks,
Amogh Jahagirdar
