ArnavBalyan commented on code in PR #8914:
URL: https://github.com/apache/incubator-gluten/pull/8914#discussion_r1997651894
##########
backends-velox/src/main/scala/org/apache/gluten/execution/ColumnarCollectLimitExec.scala:
##########
@@ -125,11 +199,20 @@ case class ColumnarCollectLimitExec(
if (childRDD.getNumPartitions == 1) childRDD
else shuffleLimitedPartitions(childRDD)
- processedRDD.mapPartitions(partition => collectLimitedRows(partition, limit))
+ processedRDD.mapPartitions(
+ partition => {
+ val droppedRows = dropLimitedRows(partition, offset)
+ val adjustedLimit = Math.max(0, limit - offset)
+ collectLimitedRows(droppedRows, adjustedLimit)
Review Comment:
Yes, but that would not preserve row order. The current implementation
closely mirrors Spark's, so changing it could give users unexpected ordering
and cause failures across UTs. Keeping it as is stays close to the Spark
implementation and maintains the same ordering as Spark. Thanks.
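For illustration, the per-partition offset-then-limit semantics in this hunk can be sketched on a plain Scala `Iterator`. This is only a minimal sketch, not the Gluten code: `dropThenTake` is a hypothetical helper, and the real `dropLimitedRows` / `collectLimitedRows` work on columnar batches rather than single rows.

```scala
// Hypothetical helper (not part of Gluten): skip `offset` rows, then keep at
// most `limit - offset` rows. The iterator is consumed sequentially, so the
// order of rows within the partition is unchanged.
def dropThenTake[T](partition: Iterator[T], limit: Int, offset: Int): Iterator[T] = {
  val adjustedLimit = math.max(0, limit - offset)
  partition.drop(offset).take(adjustedLimit)
}

// Example: limit = 5, offset = 2 keeps rows 3 through 5 in their input order.
val rows = dropThenTake(Iterator(1, 2, 3, 4, 5, 6), limit = 5, offset = 2).toList
```

Because both steps only walk the iterator front to back, the surviving rows come out in the same relative order they went in.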
##########
backends-velox/src/test/scala/org/apache/gluten/execution/GlutenSQLCollectLimitExecSuite.scala:
##########
@@ -58,7 +58,7 @@ class GlutenSQLCollectLimitExecSuite extends WholeStageTransformerSuite {
testWithSpecifiedSparkVersion(
Review Comment:
Done, thanks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]