scovich commented on issue #7941: URL: https://github.com/apache/arrow-rs/issues/7941#issuecomment-3085123274
> I think @scovich is saying that the variant_get kernel (on `VariantArray`) should have a special case that knows how to look for a shredded sub field -- and if for example it is asking for `a` and the `typed_value.a` column exists, variant_get could simply return that `a` column (already as an arrow array, no actual Variant manipulation required)

Consider the case where we want to project out just one leaf field of a very wide and deep variant value. I think two kinds of things can go wrong if pathing is a two-step process of calling `VariantArray::value` followed by `Variant::get_path`:

1. *Uninteresting shredded fields* -- we pay the cost of creating (many and deeply nested) `Variant::ShreddedVariantObject` instances that the subsequent pathing call will just ignore. And if we end up calling `VariantArray::value` multiple times, we repeat the unnecessary work each time.
   * This is the case I originally worried about.
   * NOTE: This shredded case is different from unshredded variant, where all the bytes are there regardless of whether we use them or not. We're stuck fetching all unshredded bytes from disk either way, because unshredded variant is a row-oriented encoding.
2. *Interesting shredded fields* -- even if we magically/luckily don't pay to create other/unrelated variant fields, we still end up (re)creating the `Variant::ShreddedVariantObject` for the path of interest, which is itself potentially quite wide and deep and thus a lot of work.
   * I believe this is the case @alamb refers to.
   * NOTE: I think there are actually two sub-cases here:
     1. If the caller intends to cast the results to e.g. `Int32Array` (I think that's what @alamb was referring to), then it's definitely nice to avoid a round trip through variant encoding entirely.
     2. If the caller extracts the results as variant, then we could actually return a new `VariantArray` with the same `metadata` column, a missing `value` column, and an appropriately filtered `typed_value`.
Again, this would not require any actual variant encoding work, just rearranging physical columns within the `VariantArray` itself.
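
To make the two sub-cases concrete, here is a rough sketch using toy types only. `ToyVariantColumn`, `Shredded`, `shredded_at`, `variant_get`, and `variant_get_int32` are hypothetical names invented for illustration; none of this is the actual arrow-rs `VariantArray` API. It also assumes every field on the path is perfectly shredded, so it ignores the per-field `value` fallback that a real implementation would need to check.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Toy stand-in for a shredded `typed_value` column (hypothetical type).
#[derive(Debug, PartialEq)]
enum Shredded {
    /// A leaf column of nullable 32-bit integers.
    Int32(Vec<Option<i32>>),
    /// A shredded object: one child column per field name.
    Object(BTreeMap<String, Arc<Shredded>>),
}

/// Toy stand-in for a variant column (hypothetical, not arrow-rs `VariantArray`).
#[derive(Debug, Clone)]
struct ToyVariantColumn {
    /// Shared metadata column (opaque bytes in this sketch).
    metadata: Arc<Vec<u8>>,
    /// Row-oriented unshredded bytes; `None` when the column is missing.
    value: Option<Arc<Vec<Vec<u8>>>>,
    /// Column-oriented shredded data, if any.
    typed_value: Option<Arc<Shredded>>,
}

/// Walk `path` through shredded object fields only. No per-row variant
/// objects are built; each step is a map lookup plus a pointer clone.
fn shredded_at(col: &ToyVariantColumn, path: &[&str]) -> Option<Arc<Shredded>> {
    let mut cur = Arc::clone(col.typed_value.as_ref()?);
    for name in path {
        let next = match cur.as_ref() {
            Shredded::Object(fields) => Arc::clone(fields.get(*name)?),
            // Cannot path into a leaf column.
            Shredded::Int32(_) => return None,
        };
        cur = next;
    }
    Some(cur)
}

/// Sub-case 1: the caller wants a typed column. Hand back the existing
/// shredded column when it is already an Int32 leaf.
fn variant_get_int32(col: &ToyVariantColumn, path: &[&str]) -> Option<Arc<Shredded>> {
    let found = shredded_at(col, path)?;
    matches!(*found, Shredded::Int32(_)).then_some(found)
}

/// Sub-case 2: the caller wants variant back. Reuse `metadata`, leave
/// `value` missing, and point `typed_value` at the selected sub-column.
fn variant_get(col: &ToyVariantColumn, path: &[&str]) -> Option<ToyVariantColumn> {
    let typed_value = shredded_at(col, path)?;
    Some(ToyVariantColumn {
        metadata: Arc::clone(&col.metadata),
        value: None,
        typed_value: Some(typed_value),
    })
}

/// Three rows shredded as `{a: int32, b: {c: int32}}`.
fn example_column() -> ToyVariantColumn {
    let a = Arc::new(Shredded::Int32(vec![Some(1), None, Some(3)]));
    let c = Arc::new(Shredded::Int32(vec![Some(10), Some(20), Some(30)]));
    let b = Arc::new(Shredded::Object(BTreeMap::from([("c".to_string(), c)])));
    let root = Shredded::Object(BTreeMap::from([
        ("a".to_string(), a),
        ("b".to_string(), b),
    ]));
    ToyVariantColumn {
        metadata: Arc::new(vec![0x01]),
        value: None,
        typed_value: Some(Arc::new(root)),
    }
}
```

In this sketch a `None` result just means "no shredded fast path available", at which point a caller would presumably fall back to the row-by-row `VariantArray::value` plus `Variant::get_path` route.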