zhli1142015 commented on PR #12097:
URL: https://github.com/apache/gluten/pull/12097#issuecomment-4461025365

   > @zhli1142015 I still think the "fast path" duplicates
`VeloxResizeBatchesExec`, since they basically do the same thing, but the
operator can handle all types of BaseVector.
   > 
   > If the goal is to have the shuffle reader produce larger output, can we
simply follow the native implementation in `VeloxResizeBatchesExec` and use
the Velox API to handle all types of vectors, including complex data types and
dictionary encoding?
   
   We use reader-side raw payload merge mainly because it is cheaper than
VeloxResizeBatchesExec: it merges plain hash shuffle payload buffers before
Velox vectors are materialized, so it avoids the generic RowVector
append/resize overhead in this case. I think it is better to treat this as a
fast path than as a replacement for VeloxResizeBatchesExec.
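   A minimal sketch of what "merging raw payload buffers before
materialization" means, assuming a simplified payload layout: each column is a
plain fixed-width value buffer plus a validity buffer stored as one byte per
row (real Velox validity is a bitmap, and the actual Gluten serialized format
differs). `RawColumn`, `RawPayload`, and `mergePayloads` are hypothetical names,
not Gluten or Velox APIs. The point is that the merge is a per-column byte
concatenation, with no per-row RowVector append.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one plain (non-dictionary, fixed-width) column as it
// might sit inside a serialized shuffle payload. Not a Gluten/Velox type.
struct RawColumn {
  std::size_t valueWidth;            // bytes per value, e.g. 8 for int64
  std::vector<std::uint8_t> nulls;   // simplified: one byte per row (1 = null)
  std::vector<std::uint8_t> values;  // numRows * valueWidth bytes
};

struct RawPayload {
  std::size_t numRows = 0;
  std::vector<RawColumn> columns;
};

// Concatenates payloads column by column by copying raw bytes.
// No vectors are materialized; the result is one larger payload.
RawPayload mergePayloads(const std::vector<RawPayload>& inputs) {
  RawPayload out;
  if (inputs.empty()) {
    return out;
  }
  out.columns.resize(inputs.front().columns.size());
  for (std::size_t c = 0; c < out.columns.size(); ++c) {
    out.columns[c].valueWidth = inputs.front().columns[c].valueWidth;
  }
  for (const RawPayload& in : inputs) {
    // All payloads of one shuffle stage are assumed to share a schema.
    assert(in.columns.size() == out.columns.size());
    for (std::size_t c = 0; c < in.columns.size(); ++c) {
      const RawColumn& src = in.columns[c];
      RawColumn& dst = out.columns[c];
      assert(src.valueWidth == dst.valueWidth);
      dst.nulls.insert(dst.nulls.end(), src.nulls.begin(), src.nulls.end());
      dst.values.insert(dst.values.end(), src.values.begin(), src.values.end());
    }
    out.numRows += in.numRows;
  }
  return out;
}
```

   With a bitmap validity buffer, the concatenation would also need bit-level
shifting whenever a payload's row count is not a multiple of 8, which is why the
sketch keeps validity as bytes.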
   
   For completeness, users can still enable VeloxResizeBatchesExec
separately to cover the generic cases that this raw-payload fast path
intentionally does not handle, such as complex types or dictionary-encoded
payloads. I added a dedicated config for this fast path, defaulting to false,
so users can choose to enable reader-side raw payload merge,
VeloxResizeBatchesExec, or both, depending on their workload.
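   The interplay between the two mechanisms can be sketched as follows,
assuming (hypothetically) that a payload qualifies for the fast path only when
every column is a plain primitive column, and that VeloxResizeBatchesExec runs
downstream on whatever the reader emits. `ColumnKind`, `ResizeOptions`,
`ResizePlan`, and `planResize` are illustrative names only; the actual config
key added in this PR is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical classification of a shuffle payload column.
enum class ColumnKind { kPlainPrimitive, kDictionaryEncoded, kComplex };

// Hypothetical stand-ins for the two independent switches.
struct ResizeOptions {
  bool readerRawMerge;     // new fast-path config (defaults to false)
  bool resizeBatchesExec;  // whether VeloxResizeBatchesExec is enabled
};

// Which mechanisms end up touching a given payload.
struct ResizePlan {
  bool rawPayloadMerge;    // merged as raw buffers in the shuffle reader
  bool resizeBatchesExec;  // resized later by the generic operator
};

// Assumption: only payloads whose columns are all plain primitives are
// eligible; complex types and dictionary-encoded columns skip the fast path.
bool eligibleForRawMerge(const std::vector<ColumnKind>& columns) {
  return std::all_of(columns.begin(), columns.end(), [](ColumnKind k) {
    return k == ColumnKind::kPlainPrimitive;
  });
}

ResizePlan planResize(const ResizeOptions& opts,
                      const std::vector<ColumnKind>& columns) {
  ResizePlan plan{};
  plan.rawPayloadMerge = opts.readerRawMerge && eligibleForRawMerge(columns);
  // The generic operator handles any BaseVector type, so it applies whenever
  // it is enabled, including after a reader-side merge.
  plan.resizeBatchesExec = opts.resizeBatchesExec;
  return plan;
}
```

   Keeping the two switches independent means the fast path never has to
grow support for every vector encoding; anything it declines still gets
coalesced if the generic operator is on.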
   Does this sound OK to you?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

