Samyak2 opened a new pull request, #7919:
URL: https://github.com/apache/arrow-rs/pull/7919

   # Which issue does this PR close?
   
   - Closes #7893
   
   # What changes are included in this PR?
   
   This is still at an early stage. I'm opening the PR now to get feedback on 
the approach.
   
   The approach is roughly like this:
   - Allocate a new offsets array (all zeroes) and a new nulls array (a copy of 
the variant's null buffer).
   - For each step in the variant path, iterate through all the values and 
advance each row's offset to the offset of the requested object field or array 
index.
       - If the value isn't an object/array, set that row to null.
   - Finally, extract the values at the new offsets into an output array.
       - So far I have only implemented this for u64. In this PR, I can make it 
generic over all primitive types.
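
To make the steps above concrete, here is a minimal std-only sketch of the offset-based idea. It is not the code in this PR: the `Node` enum, `PathElement`, and `variant_get_u64` are hypothetical names, and each row is modelled as a small `Vec<Node>` with its root at offset 0 rather than as real Variant bytes. The sketch also assumes that a missing field, an out-of-range index, or a non-u64 leaf yields null.

```rust
/// Toy stand-in for one encoded variant value. Children are referenced by
/// their offset (index) inside the same row, with the root at offset 0.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    U64(u64),
    Str(String),
    Object(Vec<(String, usize)>), // field name -> offset of the child
    Array(Vec<usize>),            // offsets of the elements
}

/// One step of a variant path (hypothetical type for this sketch).
enum PathElement {
    Field(String),
    Index(usize),
}

/// Column-at-a-time path access: one pass over all rows per path step.
fn variant_get_u64(
    rows: &[Vec<Node>],
    validity: &[bool],
    path: &[PathElement],
) -> Vec<Option<u64>> {
    // New offsets buffer (all zeroes) and a copy of the input null buffer.
    let mut offsets = vec![0usize; rows.len()];
    let mut valid = validity.to_vec();

    for step in path {
        for (i, row) in rows.iter().enumerate() {
            if !valid[i] {
                continue; // already null, nothing to advance
            }
            let next = match (step, &row[offsets[i]]) {
                (PathElement::Field(name), Node::Object(fields)) => {
                    fields.iter().find(|(k, _)| k == name).map(|(_, off)| *off)
                }
                (PathElement::Index(idx), Node::Array(elems)) => elems.get(*idx).copied(),
                // Not an object/array (or the wrong kind): the row becomes null.
                _ => None,
            };
            match next {
                Some(off) => offsets[i] = off,
                None => valid[i] = false, // assumption: missing field/index is null too
            }
        }
    }

    // Extract the values found at the final offsets.
    rows.iter()
        .enumerate()
        .map(|(i, row)| {
            if !valid[i] {
                return None;
            }
            match &row[offsets[i]] {
                Node::U64(v) => Some(*v),
                _ => None, // assumption: a non-u64 leaf is treated as null
            }
        })
        .collect()
}
```

The only per-call allocations are the `offsets` and `valid` buffers plus the output, which is the cost raised in the open questions below.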
   
   Some open questions:
   - This seems like a good vectorized approach to me, but it comes at the cost 
of allocating new buffers for every `variant_get` invocation.
       - Would it be worth trying a row-wise approach instead? That is, resolve 
the whole path for each row and append the result to the appropriate 
ArrayBuilder.
       - This offset-based approach works quite well for extracting complex 
types too (mainly arrays). I have not implemented it here yet but I have done 
it elsewhere before.
   - Databricks has two variations of this function: `variant_get` and 
`try_variant_get`.
       - The only difference in `try_variant_get` is that cast errors are 
ignored.
       - I'm guessing this is covered by `CastOptions`? I haven't looked at it 
yet.
   - Perhaps extracting complex types can be a separate PR? I can do it here, 
but the PR might become too large to review.
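
For comparison, the row-wise alternative and the `variant_get` / `try_variant_get` distinction could look roughly like the sketch below. Everything here is hypothetical and std-only: `Value`, `PathElement`, `get_path`, and `variant_get_rowwise` are illustrative names, a plain `Vec<Option<u64>>` stands in for an ArrayBuilder, and the `safe` flag is only loosely modelled on the `safe` field of arrow's `CastOptions`. Whether the PR ends up using `CastOptions` this way is still an open question.

```rust
/// Toy, already-decoded variant value (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    U64(u64),
    Str(String),
    Object(Vec<(String, Value)>),
    Array(Vec<Value>),
}

/// One step of a variant path.
enum PathElement {
    Field(String),
    Index(usize),
}

/// Resolve the whole path for a single value; `None` if any step fails.
fn get_path<'a>(mut v: &'a Value, path: &[PathElement]) -> Option<&'a Value> {
    for step in path {
        v = match (step, v) {
            (PathElement::Field(name), Value::Object(fields)) => fields
                .iter()
                .find(|(k, _)| k == name)
                .map(|(_, child)| child)?,
            (PathElement::Index(i), Value::Array(elems)) => elems.get(*i)?,
            _ => return None, // not an object/array: no value at this path
        };
    }
    Some(v)
}

/// Row-wise extraction of u64 values.
/// `safe == true` mimics `try_variant_get`: cast failures become null.
/// `safe == false` mimics `variant_get`: the first cast failure is an error.
fn variant_get_rowwise(
    rows: &[Option<Value>],
    path: &[PathElement],
    safe: bool,
) -> Result<Vec<Option<u64>>, String> {
    // Stand-in for an ArrayBuilder: one output slot is appended per row.
    let mut builder = Vec::with_capacity(rows.len());
    for (i, row) in rows.iter().enumerate() {
        let target = row.as_ref().and_then(|v| get_path(v, path));
        match target {
            None => builder.push(None), // null row or path not present
            Some(Value::U64(n)) => builder.push(Some(*n)),
            Some(_) if safe => builder.push(None),
            Some(other) => {
                return Err(format!("row {i}: cannot cast {other:?} to u64"));
            }
        }
    }
    Ok(builder)
}
```

This version needs no intermediate offsets or nulls buffers, but it re-resolves the path and branches on the value type once per row instead of once per path step over the whole column.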
   
   I'm new to this codebase, please let me know if I missed anything! :)
   
   I will rebase this PR once https://github.com/apache/arrow-rs/pull/7905 
is merged and switch from StructArray to VariantArray at that point.
   
   # Are these changes tested?
   
   Not yet
   
   I will add them soon
   
   # Are there any user-facing changes?
   
   Yes
   

