rynewang opened a new pull request, #48717: URL: https://github.com/apache/arrow/pull/48717
### Rationale for this change

Fixes https://github.com/apache/arrow/issues/47995

When merging ByteArray statistics, empty string min/max values were incorrectly discarded. This happened because `CleanStatistic()` rejected statistics where `ptr == nullptr`, but empty strings can legitimately have `ptr == nullptr` with `len == 0`.

### What changes are included in this PR?

Introduces a sentinel pointer (`kNoValueSentinel`) distinct from `nullptr` to mark "no value" in ByteArray statistics. This allows `CleanStatistic` to distinguish between:

- "no min/max computed" (sentinel)
- "min/max is empty string" (nullptr with len=0)

FLBA is unchanged since it has fixed length and no "empty" concept.

### Are these changes tested?

Yes. Added comprehensive tests covering all combinations of:

- Empty stats (no min/max)
- Stats with empty string min ("")
- Stats with non-empty min

### Are there any user-facing changes?

No API changes. This is a bug fix that preserves empty string statistics correctly during merge operations.
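
For readers unfamiliar with the sentinel-pointer idea described above, here is a minimal, self-contained sketch. It is not the code from this PR: `ByteArrayView`, `NoValue`, `HasValue`, and `CleanStatisticSketch` are hypothetical names used only for illustration, and the sentinel is assumed to be the address of a private static byte. Only the name `kNoValueSentinel` comes from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical stand-in for a ByteArray: a non-owning (len, ptr) view.
struct ByteArrayView {
  uint32_t len = 0;
  const uint8_t* ptr = nullptr;  // nullptr with len == 0 is a valid empty string
};

// Assumption: the sentinel is the address of a private static byte, so it can
// never compare equal to nullptr or to a pointer into real value data.
static const uint8_t kSentinelStorage = 0;
static const uint8_t* const kNoValueSentinel = &kSentinelStorage;

// Marks "no min/max computed".
inline ByteArrayView NoValue() { return ByteArrayView{0, kNoValueSentinel}; }

// True for any real value, including the empty string (nullptr, len == 0).
inline bool HasValue(const ByteArrayView& v) { return v.ptr != kNoValueSentinel; }

// Sketch of the cleaning step: keep the pair only if both sides hold a value.
// A check of the form `ptr == nullptr` here would wrongly drop empty strings.
inline std::optional<std::pair<ByteArrayView, ByteArrayView>> CleanStatisticSketch(
    const ByteArrayView& min, const ByteArrayView& max) {
  if (!HasValue(min) || !HasValue(max)) {
    return std::nullopt;
  }
  return std::make_pair(min, max);
}
```

Under these assumptions, `CleanStatisticSketch(ByteArrayView{0, nullptr}, ByteArrayView{0, nullptr})` keeps the empty-string min/max, while `CleanStatisticSketch(NoValue(), NoValue())` returns `std::nullopt`.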
