Re: [PR] [GLUTEN-8912][VL] Add Offset support for CollectLimitExec [incubator-gluten]

via GitHub Sun, 16 Mar 2025 10:23:47 -0700


ArnavBalyan commented on code in PR #8914:
URL: https://github.com/apache/incubator-gluten/pull/8914#discussion_r1997651894



##########
backends-velox/src/main/scala/org/apache/gluten/execution/ColumnarCollectLimitExec.scala:
##########
@@ -125,11 +199,20 @@ case class ColumnarCollectLimitExec(
       if (childRDD.getNumPartitions == 1) childRDD
       else shuffleLimitedPartitions(childRDD)
 
-    processedRDD.mapPartitions(partition => collectLimitedRows(partition, limit))
+    processedRDD.mapPartitions(
+      partition => {
+        val droppedRows = dropLimitedRows(partition, offset)
+        val adjustedLimit = Math.max(0, limit - offset)
+        collectLimitedRows(droppedRows, adjustedLimit)

Review Comment:
   Yes, but that approach would not preserve row order. The current 
implementation closely follows Spark's, so changing it could give users 
unexpected ordering and cause failures across UTs. Keeping it as is stays close 
to the Spark implementation and maintains the same ordering as Spark. Thanks.
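
   For illustration, the per-partition logic in the hunk above can be sketched 
on plain Scala iterators. This is a minimal sketch, not the PR's code: 
`OffsetLimitSketch` and `offsetThenLimit` are hypothetical names, plain 
`Iterator[T]` stands in for the columnar batch iterator, and it assumes 
`limit >= 0` with `limit` counting the skipped rows as well, as the 
`limit - offset` in the diff suggests.

```scala
// Hypothetical stand-in for the per-partition step in the diff:
// skip `offset` rows first, then keep at most `limit - offset` rows.
object OffsetLimitSketch {
  def offsetThenLimit[T](rows: Iterator[T], limit: Int, offset: Int): Iterator[T] = {
    // Assumption: `limit` is the upper bound including skipped rows,
    // so the number of rows to return is limit - offset (never negative).
    val adjustedLimit = math.max(0, limit - offset)
    // drop and take are lazy on Iterator and keep the input order.
    rows.drop(offset).take(adjustedLimit)
  }
}
```

   Because both `drop` and `take` consume the iterator front to back, the rows 
that come out are in the same relative order as the rows that went in, which is 
the property the comment above is concerned with.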



##########
backends-velox/src/test/scala/org/apache/gluten/execution/GlutenSQLCollectLimitExecSuite.scala:
##########
@@ -58,7 +58,7 @@ class GlutenSQLCollectLimitExecSuite extends WholeStageTransformerSuite {
 
   testWithSpecifiedSparkVersion(

Review Comment:
   Done, thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


