rluvaton opened a new pull request, #9711: URL: https://github.com/apache/arrow-rs/pull/9711
# Which issue does this PR close?

N/A

# Rationale for this change

In variable-length array types (e.g., `StringArray`, `ListArray`), null entries may have non-empty offset ranges, meaning the underlying data buffer contains data behind nulls.

This matters when working directly on the underlying values of variable-length data, for example when unwrapping (flattening) a list array: the child values are exposed, including those behind null entries. If null entries point to non-empty ranges, the unwrapped values will contain data that may not be meaningful to operate on and could cause errors (e.g., division by zero in the child values).

Use cases where this will be helpful:
- flattening a list array
- casting lists/maps: we don't want to cast values that are not used, so this checks whether any exist
- explode on a list: we don't want the values behind nulls, so this checks whether any exist (a follow-up PR will handle the cleanup)
- gc on lists/maps/strings to remove unneeded data

# What changes are included in this PR?

Add an `OffsetBuffer::is_there_null_pointing_to_non_empty_value` method that checks whether any null position corresponds to a non-empty offset range.

# Are these changes tested?

Yes

# Are there any user-facing changes?

Yes, a new public method `OffsetBuffer::is_there_null_pointing_to_non_empty_value` is added.

-------

Related to:
- https://github.com/apache/datafusion/pull/18921, as it needs to unwrap the list values and only get the reachable values
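
For readers unfamiliar with the offset layout, the check described in this PR can be sketched roughly as follows. This is an illustration only, not the PR's implementation: it uses plain slices instead of arrow's `OffsetBuffer` and `NullBuffer`, and the function name `any_null_with_non_empty_range` and its signature are hypothetical.

```rust
/// Hypothetical helper (not the arrow-rs API): returns `true` if any null
/// slot spans a non-empty offset range.
///
/// Assumptions: `offsets` holds `n + 1` non-decreasing entries for `n` slots,
/// and `validity[i]` is `false` when slot `i` is null.
fn any_null_with_non_empty_range(offsets: &[i32], validity: &[bool]) -> bool {
    // An offset buffer describing n slots always has n + 1 entries.
    assert_eq!(
        offsets.len(),
        validity.len() + 1,
        "offsets must have exactly one more entry than validity"
    );

    offsets
        .windows(2) // each window is [start, end) for one slot
        .zip(validity)
        // A null slot whose end is past its start hides data behind the null.
        .any(|(range, &is_valid)| !is_valid && range[1] > range[0])
}
```

With offsets `[0, 3, 3, 5]` and a null in the middle slot, the null covers the empty range `3..3`, so the sketch returns `false`. With offsets `[0, 3, 4, 5]` the same null covers `3..4`, so it returns `true`, and flattening would expose one value that no valid slot refers to.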
