scovich commented on issue #7941:
URL: https://github.com/apache/arrow-rs/issues/7941#issuecomment-3085123274

   > I think @scovich is saying that the variant_get kernel (on `VariantArray`) 
should have a special case that knows how to look for a shredded sub field -- 
and if for example it is asking for `a` and the `typed_value.a` column 
exists, variant_get could simply return that `a` column (already as an arrow 
array, no actual Variant manipulation required)
   
   Consider the case where we want to project out just one leaf field of a very 
wide and deep variant value. I think two kinds of things can go wrong if 
pathing is a two-step process of calling `VariantArray::value` followed by 
`Variant::get_path`: 
   1. *Uninteresting shredded fields* -- we pay the cost up front to create 
(many and deeply-nested) `Variant::ShreddedVariantObject` instances that the 
subsequent pathing call will just ignore. And if we end up calling 
`VariantArray::value` multiple times, we repeat the unnecessary work each time. 
      * This is the case I originally worried about
      * NOTE: This shredded case is different from unshredded variant, where 
all the bytes are there regardless of whether we use them or not: We're stuck 
fetching all unshredded bytes from disk either way, because unshredded variant 
is a row-oriented encoding. 
   2. *Interesting shredded fields* -- Even if we magically/luckily don't pay 
to create other/unrelated variant fields, we still end up (re)creating the 
`Variant::ShreddedVariantObject` for the path of interest, which is itself 
potentially quite wide and deep and thus a lot of work.
      * I believe this is the case @alamb refers to
      * NOTE: I think there are actually two sub-cases here:
         1. If the caller intends to cast the results to e.g. `Int32Array` (I 
think that's what @alamb was referring to), then it's definitely nice to avoid 
a round trip through variant encoding altogether. 
         2. If the caller instead wants the results as variant, then we could 
actually return a new `VariantArray` with the same `metadata` column, a missing 
`value` column, and an appropriately filtered `typed_value`. Again, this would 
not require any actual variant encoding work, just rearranging physical columns 
within the `VariantArray` itself.
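
   To make the two sub-cases concrete, here is a rough sketch of what I have in mind. It is a toy model, not arrow-rs code: `Column`, `ShreddedVariantArray`, `shredded_child`, `get_typed`, `get_variant` and `demo_array` are all hypothetical names invented for illustration, and it deliberately ignores rows that fall back to the unshredded `value` column as well as null handling. The only point is that both sub-cases reduce to cloning column references, with no variant encoding or decoding.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow column (NOT an arrow-rs type).
#[derive(Debug, PartialEq)]
enum Column {
    Int32(Vec<Option<i32>>),
    Struct(BTreeMap<String, Arc<Column>>),
}

/// Toy model of a shredded variant array (NOT the real `VariantArray`).
struct ShreddedVariantArray {
    metadata: Arc<Vec<u8>>,           // shared variant metadata column
    value: Option<Arc<Vec<u8>>>,      // unshredded residual bytes, if any
    typed_value: Option<Arc<Column>>, // shredded columns, if any
}

/// Walk `path` through nested struct columns; returns `None` as soon as a
/// step is not shredded, so the caller can fall back to the slow path.
fn shredded_child(root: &Arc<Column>, path: &[&str]) -> Option<Arc<Column>> {
    let mut cur = Arc::clone(root);
    for name in path {
        let next = match cur.as_ref() {
            Column::Struct(fields) => Arc::clone(fields.get(*name)?),
            _ => return None,
        };
        cur = next;
    }
    Some(cur)
}

impl ShreddedVariantArray {
    /// Sub-case 1: the caller wants a typed column, so hand back the
    /// shredded child as-is (a reference-count bump, no copying).
    fn get_typed(&self, path: &[&str]) -> Option<Arc<Column>> {
        shredded_child(self.typed_value.as_ref()?, path)
    }

    /// Sub-case 2: the caller wants variant, so rearrange columns: same
    /// `metadata`, missing `value`, and `typed_value` narrowed to the path.
    fn get_variant(&self, path: &[&str]) -> Option<ShreddedVariantArray> {
        let child = self.get_typed(path)?;
        Some(ShreddedVariantArray {
            metadata: Arc::clone(&self.metadata),
            value: None,
            typed_value: Some(child),
        })
    }
}

/// Builds a three-row example shredded as `{a: int32, b: {c: int32}}`.
fn demo_array() -> ShreddedVariantArray {
    let a = Arc::new(Column::Int32(vec![Some(1), None, Some(3)]));
    let c = Arc::new(Column::Int32(vec![Some(10), Some(20), Some(30)]));
    let b = Arc::new(Column::Struct(BTreeMap::from([("c".to_string(), c)])));
    let root = Column::Struct(BTreeMap::from([
        ("a".to_string(), a),
        ("b".to_string(), b),
    ]));
    ShreddedVariantArray {
        metadata: Arc::new(vec![0x01]),
        value: None,
        typed_value: Some(Arc::new(root)),
    }
}
```

   With this model, `demo_array().get_typed(&["b", "c"])` returns the existing `c` column without visiting `a` at all, and a path that is not shredded returns `None`, which is where a real kernel would presumably fall back to `VariantArray::value` plus `Variant::get_path`.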

