scovich commented on PR #8299: URL: https://github.com/apache/arrow-rs/pull/8299#issuecomment-3272348634
Re

> My biggest comment / suggestion is to consider making the API vectorized (convert the entire Arrow Array) but I think we can do that as a follow on PR

And https://github.com/apache/arrow-rs/pull/8299#discussion_r2334767295 -- that run-end encoding could be handled more easily in a vectorized API.

And https://github.com/apache/arrow-rs/pull/8299#discussion_r2334753988 that suggests an `append_all_rows()` method.

And https://github.com/apache/arrow-rs/pull/8299#discussion_r2334742498 that also wonders about vectorization.

I'll try to give one response that covers them all:

I think it's reasonable to consider adding a bulk append type API, but we have to be cognizant of the limitations and challenges it will face:

* We will need a new trait that knows how to create (and finish!) variant builder instances
* Variant building is inherently row-based, so any builder that ultimately needs to produce a variant array or variant object as its output will have a trivial `append_all_rows` that just calls `append_row` in a loop (like today), in order to recursively build up the fields/elements of the variant it creates.
* The API would be very nice for converting primitive arrays to variant, because they don't need to recurse on anything. Also nice because we could potentially define a specialized impl just for `VariantArrayBuilder`, so we don't have to deal with that new variant builder create+finish trait.
* Casting a list of primitive values is an interesting intermediate case, where one _should_ be able to append all the elements of a given list in one shot. But that _might_ require the new create+finish trait? Or maybe it just needs a second specialization for `ListBuilder`?
* Maybe instead of a no-arg `append_all_rows()`, we should consider a ranged `append_many_rows(start..end)`? One could always pass `..` to request encoding of all rows.
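To make the trade-off concrete, here is a minimal sketch of the ranged bulk-append idea. It is not the arrow-rs API. The trait `RowAppender`, the helper `resolve_range`, and the structs `RowWise` and `Bulk` are hypothetical names invented for illustration, and plain `i64` slices stand in for Arrow arrays and variant builders. Only the method names `append_row` and `append_many_rows` come from the discussion above. The default method is the "trivial loop" that a row-based (nested) conversion would inherit, while the primitive case overrides it to copy the whole range in one shot.

```rust
use std::ops::{Bound, Range, RangeBounds};

/// Resolve any range expression (`..`, `1..3`, `2..`) against `len` rows.
fn resolve_range(range: impl RangeBounds<usize>, len: usize) -> Range<usize> {
    let start = match range.start_bound() {
        Bound::Included(&s) => s,
        Bound::Excluded(&s) => s + 1,
        Bound::Unbounded => 0,
    };
    let end = match range.end_bound() {
        Bound::Included(&e) => e + 1,
        Bound::Excluded(&e) => e,
        Bound::Unbounded => len,
    };
    assert!(start <= end && end <= len, "row range out of bounds");
    start..end
}

/// Hypothetical trait (an assumption, not part of arrow-rs).
trait RowAppender {
    fn num_rows(&self) -> usize;

    /// Convert and append a single input row.
    fn append_row(&mut self, index: usize);

    /// Default bulk append: just call `append_row` in a loop.
    /// Passing `..` requests all rows.
    fn append_many_rows(&mut self, range: impl RangeBounds<usize>) {
        for i in resolve_range(range, self.num_rows()) {
            self.append_row(i);
        }
    }
}

/// Stand-in for a row-based conversion (e.g. nested input).
/// It relies on the default loop.
struct RowWise<'a> {
    input: &'a [i64],
    out: Vec<i64>,
    row_calls: usize,
}

impl RowAppender for RowWise<'_> {
    fn num_rows(&self) -> usize {
        self.input.len()
    }
    fn append_row(&mut self, index: usize) {
        self.row_calls += 1;
        self.out.push(self.input[index]);
    }
}

/// Stand-in for a primitive conversion that needs no recursion.
/// It overrides the bulk method to copy the range at once.
struct Bulk<'a> {
    input: &'a [i64],
    out: Vec<i64>,
    row_calls: usize,
}

impl RowAppender for Bulk<'_> {
    fn num_rows(&self) -> usize {
        self.input.len()
    }
    fn append_row(&mut self, index: usize) {
        self.row_calls += 1;
        self.out.push(self.input[index]);
    }
    fn append_many_rows(&mut self, range: impl RangeBounds<usize>) {
        let rows = resolve_range(range, self.input.len());
        self.out.extend_from_slice(&self.input[rows]);
    }
}
```

With this shape, `appender.append_many_rows(..)` covers the no-arg `append_all_rows()` use case, and `appender.append_many_rows(1..3)` encodes only a sub-range. Whether the real builders can be specialized this way still depends on the create+finish questions raised in the bullets.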
