XiangpengHao opened a new pull request, #8887:
URL: https://github.com/apache/arrow-rs/pull/8887
This PR improves the performance of `variant_get` on a perfectly shredded
variant by bypassing the array builder and directly cloning the shredded column.
For example, if a variant looks like this:
```
optional group event (VARIANT) {
  required binary metadata;
  optional binary value;
  optional group typed_value {
    required group event_type {
      optional binary value; <- this is null
      optional binary typed_value (STRING);
    }
  }
}
```
Then, if we read `event_type` and want to cast it to a string, we don't have
to go through the builder; we can directly clone the `typed_value` array
instead.
Specifically, this optimization is safe if:
1. `value` is null (does not exist)
2. `typed_value` has the same data type as the requested data type
I think this is a pretty common case of variant shredding.
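
The two conditions above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this PR: `Column`, `ShreddedField`, `DataType`, and `try_fast_path` are hypothetical stand-ins, and `Arc` models the cheap clone of a reference-counted Arrow array. It also assumes that "`value` is null" means the `value` column is either absent or null in every row.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for an Arrow data type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Int64,
}

// Hypothetical stand-in for a column; `None` entries are null rows.
#[derive(Debug)]
struct Column {
    data_type: DataType,
    values: Vec<Option<String>>,
}

// Hypothetical stand-in for one shredded field (`value` + `typed_value`).
struct ShreddedField {
    value: Option<Arc<Column>>,
    typed_value: Option<Arc<Column>>,
}

// Returns the shredded column directly when both conditions hold,
// otherwise `None` so the caller can fall back to the builder path.
fn try_fast_path(field: &ShreddedField, requested: &DataType) -> Option<Arc<Column>> {
    // Condition 1 (assumed meaning): `value` is absent or null in every row.
    let value_is_null = match &field.value {
        None => true,
        Some(col) => col.values.iter().all(|v| v.is_none()),
    };
    if !value_is_null {
        return None;
    }

    // Condition 2: `typed_value` exists and already has the requested type.
    let typed = field.typed_value.as_ref()?;
    if &typed.data_type != requested {
        return None;
    }

    // Cloning the `Arc` bumps a reference count; no rows are copied.
    Some(Arc::clone(typed))
}
```

If either check fails, the sketch returns `None`, which corresponds to taking the existing builder-based path.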
====
This PR also includes benchmark code. It improves the performance by many
times (of course 😄).
Let me know what you think! (FWIW, this PR is part of the efforts in
https://github.com/datafusion-contrib/datafusion-variant/issues/19#issuecomment-3554481072)