scovich commented on issue #7941:
URL: https://github.com/apache/arrow-rs/issues/7941#issuecomment-3085123274
> I think @scovich is saying that the variant_get kernel (on `VariantArray`)
> should have a special case that knows how to look for a shredded sub field --
> and if for example it is asking for `a` and the `typed_value.a` column
> exists, variant_get could simply return that `a` column (already as an arrow
> array, no actual Variant manipulation required)

Consider the case where we want to project out just one leaf field of a very
wide and deep variant value. I think two kinds of things can go wrong if
pathing is a two-step process of calling `VariantArray::value` followed by
`Variant::get_path`:
1. *Uninteresting shredded fields* -- we pay the cost to create (many and
   deeply nested) `Variant::ShreddedVariantObject` instances that the
   subsequent pathing call will just ignore. And if we end up calling
   `VariantArray::value` multiple times, we repeat the unnecessary work each
   time.
   * This is the case I originally worried about
   * NOTE: This shredded case is different from unshredded variant, where all
     the bytes are there regardless of whether we use them or not: We're stuck
     fetching all unshredded bytes from disk either way, because unshredded
     variant is a row-oriented encoding.
2. *Interesting shredded fields* -- Even if we magically/luckily don't pay to
   create other/unrelated variant fields, we still end up (re)creating the
   `Variant::ShreddedVariantObject` for the path of interest, which is itself
   potentially quite wide and deep and thus a lot of work.
   * I believe this is the case @alamb refers to
   * NOTE: I think there are actually two sub-cases here:
     1. If the caller intends to cast the results to e.g. `Int32Array` (I
        think that's what @alamb was referring to), then it's definitely nice
        to avoid the round trip through variant encoding entirely.
     2. If the caller extracts the results as variant, then we could actually
        return a new `VariantArray` with the same `metadata` column, a missing
        `value` column, and an appropriately filtered `typed_value`. Again,
        this would not require any actual variant encoding work, just
        rearranging physical columns within the `VariantArray` itself.
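
To make the two sub-cases concrete, here is a minimal sketch of column-level pathing on a toy model. Everything in it (`TypedColumn`, `ToyVariantArray`, `find_shredded`, `get_as_variant`, `sample`) is a hypothetical stand-in written against the Rust standard library only, not the arrow-rs or parquet-variant API. It also assumes the requested path is fully shredded, and ignores the per-field `value`/`typed_value` pairs and per-row fallback that real shredding involves.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Hypothetical stand-in for a shredded `typed_value` column.
#[derive(Debug, PartialEq)]
enum TypedColumn {
    Int32(Vec<Option<i32>>),
    Object(BTreeMap<String, Arc<TypedColumn>>),
}

/// Toy model of a shredded variant array: three physical columns.
#[derive(Debug, Clone, PartialEq)]
struct ToyVariantArray {
    metadata: Arc<Vec<Vec<u8>>>,
    value: Option<Arc<Vec<Option<Vec<u8>>>>>,
    typed_value: Option<Arc<TypedColumn>>,
}

/// Walk the column tree along `path` without building any per-row values.
/// Returns `None` when the path is not shredded, in which case a real
/// implementation would have to fall back to row-by-row pathing.
fn find_shredded(arr: &ToyVariantArray, path: &[&str]) -> Option<Arc<TypedColumn>> {
    let mut col = arr.typed_value.clone()?;
    for field in path {
        let next = match col.as_ref() {
            TypedColumn::Object(children) => children.get(*field)?.clone(),
            TypedColumn::Int32(_) => return None,
        };
        col = next;
    }
    Some(col)
}

/// Sub-case 2: keep the result as variant by rearranging columns only.
/// Shares `metadata`, drops `value`, and points `typed_value` at the sub-column.
fn get_as_variant(arr: &ToyVariantArray, path: &[&str]) -> Option<ToyVariantArray> {
    let typed_value = find_shredded(arr, path)?;
    Some(ToyVariantArray {
        metadata: Arc::clone(&arr.metadata),
        value: None,
        typed_value: Some(typed_value),
    })
}

/// Three rows shredded as `{ a: { b: int32 }, wide: int32 }`.
fn sample() -> ToyVariantArray {
    let b = Arc::new(TypedColumn::Int32(vec![Some(1), None, Some(3)]));
    let a = Arc::new(TypedColumn::Object(BTreeMap::from([("b".to_string(), b)])));
    let wide = Arc::new(TypedColumn::Int32(vec![Some(7), Some(8), Some(9)]));
    let root = TypedColumn::Object(BTreeMap::from([
        ("a".to_string(), a),
        ("wide".to_string(), wide),
    ]));
    ToyVariantArray {
        metadata: Arc::new(vec![vec![0u8]; 3]),
        value: None,
        typed_value: Some(Arc::new(root)),
    }
}
```

In this toy, `find_shredded(&sample(), &["a", "b"])` hands back the existing `Int32` column (sub-case 1), and `get_as_variant(&sample(), &["a"])` builds a new array that shares the original `metadata` by reference count. Neither call touches the unrelated `wide` column or materializes a per-row object.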