alamb commented on issue #7941: URL: https://github.com/apache/arrow-rs/issues/7941#issuecomment-3090533015
@scovich and I were discussing other options here: https://github.com/apache/arrow-rs/pull/7915#discussion_r2202981997

---

@scovich: https://github.com/apache/arrow-rs/pull/7915#discussion_r2202981997

An unshredding operation requires two kinds of variant "building":
* Shredded fields need a full-blown variant builder, because they're strongly typed and we need to encode them as variant. But the existing `VariantMetadata` already contains all their field names, so the builder only needs to look up their field ids. The spec requires the ability to unshred using a read-only metadata dictionary, so why go copying it?
* Unshredded fields can be copied as-is (I would honestly prefer copying raw bytes instead of parsing and then unparsing them the way https://github.com/apache/arrow-rs/pull/7914 seems to favor). We need a builder to keep track of the updated offsets as we interleave existing variant value bytes with newly-unshredded value bytes.

Imagine a partially shredded variant column `v`:
* Fields `a`, `m` and `x` live in the `typed_value` column as a perfectly shredded struct
  * Furthermore, `x` is itself a shredded struct with its own fields `i` and `j`
    * Furthermore, `i` is a variant column (also using the same top-level metadata)
    * Furthermore, `j` is a partially shredded struct
* Fields `b`, `n`, and `y` live in the `value` column as a variant object

When unshredding, we need to turn `a`, `m`, and `x` from strongly typed values into variant values -- the latter recursively so -- which requires a variant builder. The recursive unshredding of `j` also requires a builder of its own.

Once we have all the field values in variant form, we need to create a new variant object for `v`. We can copy the bytes of `b`, `n` and `y` as a contiguous block if we want -- relying on field id sorting to give the required ordering `a`, `b`, `m`, `n`, `x`, `y` -- but we do have to copy those bytes in order to inject the newly created variant value bytes for `a`, `m`, and `x`.
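
To make the interleaving step concrete, here is a minimal sketch in plain Rust. It is not the arrow-rs or `parquet-variant` API: `EncodedField`, `lookup_field_id`, and `merge_fields` are hypothetical names, the value bytes are placeholders rather than real variant encodings, and the dictionary is assumed to be sorted so that a binary search works. It only shows the two ideas above: field ids come from a read-only dictionary lookup, and already-encoded value bytes are copied verbatim while a small "builder" recomputes the offsets.

```rust
/// One object field whose value is already variant-encoded.
/// Hypothetical type for illustration; not part of arrow-rs.
#[derive(Debug, Clone, PartialEq)]
struct EncodedField<'a> {
    name: &'a str,
    field_id: u32,
    value: &'a [u8],
}

/// Look up a field id in a read-only dictionary without copying it.
/// Assumption: the dictionary is sorted, so the id is the sorted position.
fn lookup_field_id(dictionary: &[&str], name: &str) -> Option<u32> {
    dictionary.binary_search(&name).ok().map(|i| i as u32)
}

/// Merge two name-sorted field lists into (field_ids, offsets, value_bytes).
/// Value bytes are copied as-is; only the offsets are recomputed.
fn merge_fields(
    existing: &[EncodedField<'_>],
    unshredded: &[EncodedField<'_>],
) -> (Vec<u32>, Vec<u32>, Vec<u8>) {
    let mut ids = Vec::new();
    // One more offset than fields; the last entry marks the end of the data.
    let mut offsets = vec![0u32];
    let mut bytes = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < existing.len() || j < unshredded.len() {
        // Pick whichever side has the smaller field name next.
        let take_existing = match (existing.get(i), unshredded.get(j)) {
            (Some(e), Some(u)) => e.name < u.name,
            (Some(_), None) => true,
            _ => false,
        };
        let field = if take_existing {
            i += 1;
            &existing[i - 1]
        } else {
            j += 1;
            &unshredded[j - 1]
        };
        ids.push(field.field_id);
        bytes.extend_from_slice(field.value);
        offsets.push(bytes.len() as u32);
    }
    (ids, offsets, bytes)
}

fn main() {
    let dictionary = ["a", "b", "m", "n", "x", "y"];
    let id = |name: &str| lookup_field_id(&dictionary, name).expect("name is in the dictionary");

    // Fields already stored in the `value` column (placeholder bytes).
    let existing = [
        EncodedField { name: "b", field_id: id("b"), value: &[0xB0] },
        EncodedField { name: "n", field_id: id("n"), value: &[0xB1, 0xB2] },
        EncodedField { name: "y", field_id: id("y"), value: &[0xB3] },
    ];
    // Fields freshly encoded from `typed_value` (placeholder bytes).
    let unshredded = [
        EncodedField { name: "a", field_id: id("a"), value: &[0xA0, 0xA1] },
        EncodedField { name: "m", field_id: id("m"), value: &[0xA2] },
        EncodedField { name: "x", field_id: id("x"), value: &[0xA3, 0xA4, 0xA5] },
    ];

    let (ids, offsets, bytes) = merge_fields(&existing, &unshredded);
    println!("field ids: {:?}", ids);
    println!("offsets:   {:?}", offsets);
    println!("bytes:     {:02X?}", bytes);
}
```

The sketch merges field by field to keep the logic obvious; a real implementation could copy runs of adjacent existing fields in one `extend_from_slice` call when no newly-unshredded field falls between them.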
Plus, it's entirely possible other newly-unshredded sibling values before and after `v` would have required copying its value bytes even if `v` were completely unshredded.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org