atwam commented on issue #9716: URL: https://github.com/apache/arrow-rs/issues/9716#issuecomment-4245646012
I had a closer look at the arrow-cpp implementation for this. In `GetZeroBasedValueOffsets`, C++ doesn't canonicalize empty variable-size offsets on write; it just reuses whatever the in-memory array already has. So if the array already has a canonical one-element buffer, that buffer is kept. If it has a null or zero-length offsets buffer, C++ will preserve that on IPC write.

Strikingly, C++ explicitly exercises this permissive case, including in [validation](https://github.com/apache/arrow/blob/4eca50770f7f2c5938a676f0719fbfc8aae4803c/cpp/src/arrow/array/validate.cc#L916).

Now the question is whether the spec should be updated (and then other Arrow libraries such as polars/arrow2 will have to change), or whether we stick with the spec and output strictly compliant files.
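To make the two options concrete, here is a minimal sketch in plain Rust (no arrow crate). Both function names and the `strict` flag are hypothetical and are not part of arrow-rs or arrow-cpp. It assumes the spec reading under which an offsets buffer always holds `len + 1` entries, so an empty array carries the single offset `[0]`, while the permissive reading also accepts a zero-length buffer for an empty array.

```rust
/// Hypothetical check, not an arrow-rs or arrow-cpp API.
/// `strict = true`: offsets must hold `len + 1` entries, even when `len == 0`.
/// `strict = false`: a zero-length offsets buffer is also accepted when `len == 0`.
fn offsets_are_valid(offsets: &[i32], len: usize, strict: bool) -> bool {
    if offsets.is_empty() {
        // Only the permissive mode tolerates a missing offsets buffer,
        // and only for an empty array.
        return len == 0 && !strict;
    }
    offsets.len() == len + 1
        && offsets[0] >= 0
        // Offsets must be non-decreasing.
        && offsets.windows(2).all(|w| w[0] <= w[1])
}

/// Hypothetical writer-side normalization: emit the one-element buffer `[0]`
/// when the in-memory offsets are empty, otherwise pass them through unchanged.
fn canonicalize_offsets(offsets: &[i32]) -> Vec<i32> {
    if offsets.is_empty() {
        vec![0]
    } else {
        offsets.to_vec()
    }
}
```

In this sketch, sticking with the spec corresponds to calling `canonicalize_offsets` before writing, and updating the spec corresponds to readers validating with `strict = false`.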
