baibaichen opened a new issue, #12187:
URL: https://github.com/apache/gluten/issues/12187

### Description

Spark's `AttachDistributedSequenceExec` prepends a contiguous, globally
increasing `Long` id column to its child output. It is used by
pandas-on-Spark's `distributed-sequence` default index and by
`DataFrame.zipWithIndex`. Today Gluten falls back to vanilla Spark for this
operator, which forces a columnar-to-row transition that dominates runtime for
wide or nested-typed inputs.
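
To make "contiguous, globally increasing" concrete, here is a minimal
plain-Python sketch of one common way to produce such ids: count the rows in
each partition, take an exclusive prefix sum of the counts to get each
partition's starting id, then number the rows within each partition from that
offset. This is an illustration only, not Spark or Gluten code; the function
name `attach_distributed_sequence` and the list-of-tuples data model are
assumptions made for the example.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Row = Tuple  # hypothetical model: a row is a plain tuple of column values


def attach_distributed_sequence(
    partitions: Sequence[Sequence[Row]],
) -> List[List[Row]]:
    """Prepend a contiguous, globally increasing id to every row."""
    # Pass 1: count the rows in each partition.
    counts = [len(part) for part in partitions]
    # Exclusive prefix sum: the first id assigned in each partition.
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: number rows locally, shifted by the partition's offset.
    return [
        [(offset + i, *row) for i, row in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]


parts = [[("a",), ("b",)], [], [("c",)]]
print(attach_distributed_sequence(parts))
# [[(0, 'a'), (1, 'b')], [], [(2, 'c')]]
```

Because the ids depend only on row counts and row positions, nothing in this
scheme needs to inspect the column values themselves, which is why the
operator is a plausible candidate for running directly on columnar batches.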
   
   This issue tracks porting the operator to the Velox backend so that it runs 
end-to-end on columnar batches.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]