zhli1142015 commented on PR #12097:
URL: https://github.com/apache/gluten/pull/12097#issuecomment-4461025365
> @zhli1142015 I still think the "fast path" is a duplication of the
> `VeloxResizeBatchesExec` as they are basically doing the same thing, but the
> operator can handle all types of BaseVector.
>
> If the goal is to have the shuffle reader produces larger output, can we
> simply follow the native implementation in `VeloxResizeBatchesExec` to use
> Velox api to handle all types of vectors, including complex datatype and
> dictionary encoding?

We use the reader-side raw payload merge mainly because it is cheaper than
`VeloxResizeBatchesExec`: it merges plain hash-shuffle payload buffers before
any Velox vectors are materialized, so it avoids the generic RowVector
append/resize overhead in this case. I think it is better to treat this as a
fast path rather than a replacement for `VeloxResizeBatchesExec`.

For completeness, users can still enable `VeloxResizeBatchesExec` separately
to cover the generic cases that this raw-payload fast path intentionally does
not handle, such as complex types or dictionary-encoded payloads. I added a
dedicated config for the fast path, defaulting to false, so users can enable
the reader-side raw payload merge, `VeloxResizeBatchesExec`, or both,
depending on their workload.
Does this sound ok to you?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]