alamb commented on issue #7941:
URL: https://github.com/apache/arrow-rs/issues/7941#issuecomment-3090533015

    @scovich and I were discussing other options here 
https://github.com/apache/arrow-rs/pull/7915#discussion_r2202981997:
   
   ---
   
   @scovich : 
https://github.com/apache/arrow-rs/pull/7915#discussion_r2202981997
   
   An unshredding operation requires two kinds of variant "building":
   * Shredded fields need a full-blown variant builder, because they're 
strongly typed and we need to encode them as variant. But the existing 
`VariantMetadata` already contains all their field names, so the builder only 
needs to look up their field ids. The spec requires the ability to unshred 
using a read-only metadata dictionary, so why go copying it?
   * Unshredded fields can be copied as-is (I would honestly prefer copying raw 
bytes instead of parsing and then unparsing them the way 
https://github.com/apache/arrow-rs/pull/7914 seems to favor). We need a builder 
to keep track of the updated offsets as we interleave existing variant value 
bytes with newly-unshredded value bytes.
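
The first kind of building (resolving field ids against the existing dictionary without copying it) could look roughly like the sketch below. `ReadOnlyDictionary` and `field_id` are hypothetical names for illustration, not the arrow-rs `VariantMetadata` API, and the dictionary is modeled as a borrowed slice of names rather than the real byte encoding.

```rust
/// Hypothetical read-only view over an existing metadata dictionary.
/// It borrows the field names and never copies or mutates them.
pub struct ReadOnlyDictionary<'a> {
    names: &'a [&'a str],
    /// True when the names are strictly ascending, which allows binary search.
    sorted: bool,
}

impl<'a> ReadOnlyDictionary<'a> {
    pub fn new(names: &'a [&'a str]) -> Self {
        let sorted = names.windows(2).all(|w| w[0] < w[1]);
        Self { names, sorted }
    }

    /// Returns the field id (the index into the dictionary) for `name`,
    /// or `None` if the name is missing from the dictionary.
    pub fn field_id(&self, name: &str) -> Option<u32> {
        let index = if self.sorted {
            self.names
                .binary_search_by(|probe| (*probe).cmp(name))
                .ok()
        } else {
            // Unsorted dictionary: fall back to a linear scan.
            self.names.iter().position(|n| *n == name)
        };
        index.map(|i| i as u32)
    }
}
```

A builder for the shredded fields would then only call `field_id` for each field name. A `None` result would mean the dictionary does not cover the field, which should not happen if the metadata already contains every shredded field name.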
   
   Imagine a partially shredded variant column `v`:
   * Fields `a`, `m` and `x` live in the `typed_value` column as a perfectly 
shredded struct
      * Furthermore, `x` is itself a shredded struct with its own fields `i` 
and `j`
         * Furthermore, `i` is a variant column (also using the same top-level 
metadata)
         * Furthermore, `j` is a partially shredded struct
   * Fields `b`, `n`, and `y` live in the `value` column as a variant object
   
   When unshredding, we need to turn `a`, `m`, and `x` from strongly typed 
values into variant values -- the latter recursively so -- which requires a 
variant builder. The recursive unshredding of `j` also requires a builder of 
its own.
   
   Once we have all the field values in variant form, we need to create a new 
variant object for `v`. We can copy the bytes of `b`, `n` and `y` as a 
contiguous block if we want -- relying on field id sorting to give the required 
ordering `a`, `b`, `m`, `n`, `x`, `y` -- but we do have to copy those bytes in 
order to inject the newly created variant value bytes for `a`, `m`, and `x`. 
Plus, it's entirely possible that other newly-unshredded sibling values before 
and after `v` would have required copying its value bytes anyway, even if `v` 
itself had been stored completely unshredded.
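
The interleaving step can be sketched as a merge of two lists that are each already sorted by field id. This is a rough illustration only: `EncodedField` and `interleave_fields` are hypothetical names, the value bytes are assumed to be valid, already-encoded variant values, and the object header encoding is left out. The sketch emits one offset per field plus a final end offset.

```rust
/// One field of the rebuilt object: its dictionary field id plus
/// already-encoded variant value bytes (hypothetical type).
pub struct EncodedField<'a> {
    pub field_id: u32,
    pub value: &'a [u8],
}

/// Merges `existing` (copied as-is from the `value` column) with
/// `unshredded` (freshly built from `typed_value`). Both inputs must be
/// sorted by field id. Returns `(field_ids, offsets, data)`, where
/// `offsets` has one entry per field plus a final end offset.
pub fn interleave_fields(
    existing: &[EncodedField<'_>],
    unshredded: &[EncodedField<'_>],
) -> (Vec<u32>, Vec<u32>, Vec<u8>) {
    let total = existing.len() + unshredded.len();
    let mut ids: Vec<u32> = Vec::with_capacity(total);
    let mut offsets: Vec<u32> = Vec::with_capacity(total + 1);
    let mut data: Vec<u8> = Vec::new();

    let (mut i, mut j) = (0usize, 0usize);
    while i < existing.len() || j < unshredded.len() {
        // Take whichever side has the smaller field id next.
        let take_existing = match (existing.get(i), unshredded.get(j)) {
            (Some(e), Some(u)) => e.field_id < u.field_id,
            (Some(_), None) => true,
            _ => false,
        };
        let field = if take_existing {
            i += 1;
            &existing[i - 1]
        } else {
            j += 1;
            &unshredded[j - 1]
        };
        ids.push(field.field_id);
        // Record where this value starts, then copy its raw bytes.
        offsets.push(data.len() as u32);
        data.extend_from_slice(field.value);
    }
    offsets.push(data.len() as u32);
    (ids, offsets, data)
}
```

With `b`, `n`, `y` as `existing` and `a`, `m`, `x` as `unshredded`, the merge yields the order `a`, `b`, `m`, `n`, `x`, `y`, and every existing value is copied byte for byte without being parsed. The sketch assumes no field id appears in both inputs.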
   
   


