Samyak2 opened a new pull request, #7919: URL: https://github.com/apache/arrow-rs/pull/7919
# Which issue does this PR close?

- Closes #7893

# What changes are included in this PR?

Still very early. Opening this PR to get some early feedback on the approach.

The approach is roughly like this:
- Allocate a new offset array (all zeroes) and a new nulls array (copy of variant's null buffer).
- For every variant path access, we iterate through all the values and increment the offsets to the desired object field's offset or array index's offset.
  - If the value isn't an object/array, we set a null for this row.
- We then extract all the values at the new offsets into an array.
  - I have currently only done it for u64. For this PR, I can make it generic for all primitive types.

Some open questions:
- This seems like a good vectorized approach to me, but it comes at the cost of allocating new buffers for every `variant_get` invocation.
  - Would it be worth it to try a row-wise approach instead? It would be something like: do the whole path access for each row and append into the appropriate ArrayBuilder.
- This offset-based approach works quite well for extracting complex types too (mainly arrays). I have not implemented it here yet but I have done it elsewhere before.
- Databricks has two variations of this function: `variant_get` and `try_variant_get`.
  - The only difference in `try_variant_get` is that cast errors are ignored.
  - I'm guessing this is covered by `CastOptions`? I haven't looked at it yet.
- Perhaps extracting complex types can be a separate PR? I can do it here, but the PR might become too large to review.

I'm new to this codebase, please let me know if I missed anything! :)

I will be rebasing this PR once https://github.com/apache/arrow-rs/pull/7905 is merged. I'll be using VariantArray instead of StructArray once that is merged.

# Are these changes tested?

Not yet

If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?

I will add them soon

# Are there any user-facing changes?
Yes
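
To make the offset-based approach from "What changes are included in this PR?" easier to discuss, here is a minimal, self-contained sketch of the idea. It is not the code in this PR and does not use the arrow-rs or Parquet variant types: `Node`, `PathElement`, `variant_get_u64`, and `demo` are hypothetical names, and each row is modelled as a small `Vec<Node>` whose children are referenced by index rather than by byte offset into an encoded variant buffer. Two behaviours are assumptions on my part: a missing field or out-of-range index is treated as null, and a final value that is not a `u64` also becomes null (similar in spirit to the `try_variant_get` behaviour mentioned above).

```rust
// Toy stand-in for an encoded variant value; NOT the arrow-rs representation.
// Children are referenced by their offset (index) within the row's own buffer.
#[derive(Debug, Clone)]
enum Node {
    U64(u64),
    Str(String),
    Object(Vec<(String, usize)>),
    Array(Vec<usize>),
}

// One step of a variant path (hypothetical type for this sketch).
enum PathElement {
    Field(String),
    Index(usize),
}

/// Column-at-a-time path access: one pass over all rows per path element.
fn variant_get_u64(
    rows: &[Vec<Node>],
    validity: &[bool],
    path: &[PathElement],
) -> Vec<Option<u64>> {
    // New offsets start at zero (each row's root value);
    // the new null mask starts as a copy of the input validity.
    let mut offsets = vec![0usize; rows.len()];
    let mut valid = validity.to_vec();

    for element in path {
        for (i, row) in rows.iter().enumerate() {
            if !valid[i] {
                continue; // already null, nothing to advance
            }
            // Move the offset to the requested field or array element.
            let next = match (&row[offsets[i]], element) {
                (Node::Object(fields), PathElement::Field(name)) => {
                    fields.iter().find(|(k, _)| k == name).map(|(_, off)| *off)
                }
                (Node::Array(items), PathElement::Index(idx)) => items.get(*idx).copied(),
                // Not an object/array (or wrong kind of access): null for this row.
                _ => None,
            };
            match next {
                Some(off) => offsets[i] = off,
                None => valid[i] = false,
            }
        }
    }

    // Extract the values at the final offsets; non-u64 values become null here.
    rows.iter()
        .enumerate()
        .map(|(i, row)| match (valid[i], row.get(offsets[i])) {
            (true, Some(Node::U64(v))) => Some(*v),
            _ => None,
        })
        .collect()
}

/// Builds five sample rows and evaluates the path `a.b[1]`.
fn demo() -> Vec<Option<u64>> {
    let obj = |k: &str, off: usize| Node::Object(vec![(k.to_string(), off)]);
    let rows = vec![
        // {"a": {"b": [5, 7]}}
        vec![obj("a", 1), obj("b", 2), Node::Array(vec![3, 4]), Node::U64(5), Node::U64(7)],
        // null row (validity bit unset)
        vec![],
        // {"a": 3}: "a" is not an object, so ".b" yields null
        vec![obj("a", 1), Node::U64(3)],
        // {"a": {"b": [1, 9, 2]}}
        vec![obj("a", 1), obj("b", 2), Node::Array(vec![3, 4, 5]),
             Node::U64(1), Node::U64(9), Node::U64(2)],
        // {"a": {"b": ["x", "y"]}}: final value is a string, not a u64
        vec![obj("a", 1), obj("b", 2), Node::Array(vec![3, 4]),
             Node::Str("x".into()), Node::Str("y".into())],
    ];
    let validity = [true, false, true, true, true];
    let path = [
        PathElement::Field("a".into()),
        PathElement::Field("b".into()),
        PathElement::Index(1),
    ];
    variant_get_u64(&rows, &validity, &path)
}
```

Calling `demo()` returns `[Some(7), None, None, Some(9), None]`. The two temporary buffers (`offsets` and `valid`) are the per-invocation allocations raised in the open questions. The row-wise alternative would instead walk the whole path for one row at a time and append each result to a builder, trading those buffers for a less batch-friendly inner loop.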