Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-05-08 Thread via GitHub


WangGuangxin commented on code in PR #5007:
URL: https://github.com/apache/incubator-gluten/pull/5007#discussion_r1593888422


##
gluten-core/src/main/scala/io/glutenproject/extension/columnar/RewriteTypedImperativeAggregate.scala:
##
@@ -67,6 +100,28 @@ object RewriteTypedImperativeAggregate extends 
Rule[SparkPlan] with PullOutProje
 }
 copyBaseAggregateExec(agg)(newResultExpressions = newResultExpressions)
 
+  case agg: BaseAggregateExec
+  if 
agg.aggregateExpressions.exists(shouldRewriteForPercentileLikeExpr) =>
+val exprMap = agg.aggregateExpressions
+  .filter(shouldRewriteForPercentileLikeExpr)
+  .map(ae => ae.aggregateFunction.inputAggBufferAttributes.head -> ae)
+  .toMap
+val newResultExpressions = agg.resultExpressions.map {
+  case attr: AttributeReference =>
+exprMap
+  .get(attr)
+  .map {
+ae =>
+  attr.copy(dataType = 
getPercentileLikeInterminateDataType(ae.aggregateFunction))(
+exprId = attr.exprId,
+qualifier = attr.qualifier
+  )
+  }
+  .getOrElse(attr)
+  case other => other
+}
+copyBaseAggregateExec(agg)(newResultExpressions = newResultExpressions)

Review Comment:
   ok, I'll rebase and refact it 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-05-07 Thread via GitHub


zhztheplayer commented on code in PR #5007:
URL: https://github.com/apache/incubator-gluten/pull/5007#discussion_r1591954924


##
gluten-core/src/main/scala/io/glutenproject/extension/columnar/RewriteTypedImperativeAggregate.scala:
##
@@ -67,6 +100,28 @@ object RewriteTypedImperativeAggregate extends 
Rule[SparkPlan] with PullOutProje
 }
 copyBaseAggregateExec(agg)(newResultExpressions = newResultExpressions)
 
+  case agg: BaseAggregateExec
+  if 
agg.aggregateExpressions.exists(shouldRewriteForPercentileLikeExpr) =>
+val exprMap = agg.aggregateExpressions
+  .filter(shouldRewriteForPercentileLikeExpr)
+  .map(ae => ae.aggregateFunction.inputAggBufferAttributes.head -> ae)
+  .toMap
+val newResultExpressions = agg.resultExpressions.map {
+  case attr: AttributeReference =>
+exprMap
+  .get(attr)
+  .map {
+ae =>
+  attr.copy(dataType = 
getPercentileLikeInterminateDataType(ae.aggregateFunction))(
+exprId = attr.exprId,
+qualifier = attr.qualifier
+  )
+  }
+  .getOrElse(attr)
+  case other => other
+}
+copyBaseAggregateExec(agg)(newResultExpressions = newResultExpressions)

Review Comment:
   IIUC this could make the offloaded partial aggregate operator incompatible 
with vanilla Spark's final aggregate operator? Say if final aggregation is 
fallen back then UB will be led.
   
   I am considering whether we could use the approach like we had done for 
`approx_count_distinct` (https://github.com/apache/incubator-gluten/pull/1676), 
to replace `approx_percentile` with something like `velox_approx_percentile` 
that has different intermediate type definition (or use an internal project to 
match up with vanilla's intermediate type) at the very early planning phase? 
Then we can distinguish between vanilla's and Velox's implementations for this 
function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-05-07 Thread via GitHub


zhztheplayer commented on code in PR #5007:
URL: https://github.com/apache/incubator-gluten/pull/5007#discussion_r1591954924


##
gluten-core/src/main/scala/io/glutenproject/extension/columnar/RewriteTypedImperativeAggregate.scala:
##
@@ -67,6 +100,28 @@ object RewriteTypedImperativeAggregate extends 
Rule[SparkPlan] with PullOutProje
 }
 copyBaseAggregateExec(agg)(newResultExpressions = newResultExpressions)
 
+  case agg: BaseAggregateExec
+  if 
agg.aggregateExpressions.exists(shouldRewriteForPercentileLikeExpr) =>
+val exprMap = agg.aggregateExpressions
+  .filter(shouldRewriteForPercentileLikeExpr)
+  .map(ae => ae.aggregateFunction.inputAggBufferAttributes.head -> ae)
+  .toMap
+val newResultExpressions = agg.resultExpressions.map {
+  case attr: AttributeReference =>
+exprMap
+  .get(attr)
+  .map {
+ae =>
+  attr.copy(dataType = 
getPercentileLikeInterminateDataType(ae.aggregateFunction))(
+exprId = attr.exprId,
+qualifier = attr.qualifier
+  )
+  }
+  .getOrElse(attr)
+  case other => other
+}
+copyBaseAggregateExec(agg)(newResultExpressions = newResultExpressions)

Review Comment:
   IIUC this could make the offloaded partial aggregate operator incompatible 
with vanilla Spark's final aggregate operator? Say if final aggregation is 
fallen back then UB will be led.
   
   I am considering whether we could use the approach like we had done for 
`approx_count_distinct`, to replace `approx_percentile` with something like 
`velox_approx_percentile` that has different intermediate type definition at 
the very early planning phase? Then we can distinguish between vanilla's and 
Velox's implementations for this function.



##
gluten-core/src/main/scala/io/glutenproject/extension/columnar/RewriteTypedImperativeAggregate.scala:
##
@@ -67,6 +100,28 @@ object RewriteTypedImperativeAggregate extends 
Rule[SparkPlan] with PullOutProje
 }
 copyBaseAggregateExec(agg)(newResultExpressions = newResultExpressions)
 
+  case agg: BaseAggregateExec
+  if 
agg.aggregateExpressions.exists(shouldRewriteForPercentileLikeExpr) =>
+val exprMap = agg.aggregateExpressions
+  .filter(shouldRewriteForPercentileLikeExpr)
+  .map(ae => ae.aggregateFunction.inputAggBufferAttributes.head -> ae)
+  .toMap
+val newResultExpressions = agg.resultExpressions.map {
+  case attr: AttributeReference =>
+exprMap
+  .get(attr)
+  .map {
+ae =>
+  attr.copy(dataType = 
getPercentileLikeInterminateDataType(ae.aggregateFunction))(
+exprId = attr.exprId,
+qualifier = attr.qualifier
+  )
+  }
+  .getOrElse(attr)
+  case other => other
+}
+copyBaseAggregateExec(agg)(newResultExpressions = newResultExpressions)

Review Comment:
   IIUC this could make the offloaded partial aggregate operator incompatible 
with vanilla Spark's final aggregate operator? Say if final aggregation is 
fallen back then UB will be led.
   
   I am considering whether we could use the approach like we had done for 
`approx_count_distinct` (https://github.com/apache/incubator-gluten/pull/1676), 
to replace `approx_percentile` with something like `velox_approx_percentile` 
that has different intermediate type definition at the very early planning 
phase? Then we can distinguish between vanilla's and Velox's implementations 
for this function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-05-07 Thread via GitHub


zhztheplayer commented on code in PR #5007:
URL: https://github.com/apache/incubator-gluten/pull/5007#discussion_r1591954924


##
gluten-core/src/main/scala/io/glutenproject/extension/columnar/RewriteTypedImperativeAggregate.scala:
##
@@ -67,6 +100,28 @@ object RewriteTypedImperativeAggregate extends 
Rule[SparkPlan] with PullOutProje
 }
 copyBaseAggregateExec(agg)(newResultExpressions = newResultExpressions)
 
+  case agg: BaseAggregateExec
+  if 
agg.aggregateExpressions.exists(shouldRewriteForPercentileLikeExpr) =>
+val exprMap = agg.aggregateExpressions
+  .filter(shouldRewriteForPercentileLikeExpr)
+  .map(ae => ae.aggregateFunction.inputAggBufferAttributes.head -> ae)
+  .toMap
+val newResultExpressions = agg.resultExpressions.map {
+  case attr: AttributeReference =>
+exprMap
+  .get(attr)
+  .map {
+ae =>
+  attr.copy(dataType = 
getPercentileLikeInterminateDataType(ae.aggregateFunction))(
+exprId = attr.exprId,
+qualifier = attr.qualifier
+  )
+  }
+  .getOrElse(attr)
+  case other => other
+}
+copyBaseAggregateExec(agg)(newResultExpressions = newResultExpressions)

Review Comment:
   IIUC this could make the offloaded partial aggregate operator incompatible 
with vanilla Spark's final aggregate operator? Say if final aggregation is 
fallen back then UB will be led.
   
   I am considering whether we could using the approach like we had done for 
`approx_count_distinct`, to replace `approx_percentile` with something like 
`velox_approx_percentile` that has different intermediate type definition at 
the very early planning phase? Then we can distinguish between vanilla's and 
Velox's implementations for this function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-04-25 Thread via GitHub


liujiayi771 commented on PR #5007:
URL: 
https://github.com/apache/incubator-gluten/pull/5007#issuecomment-2076680006

   @WangGuangxin Any progress of this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-03-25 Thread via GitHub


WangGuangxin commented on code in PR #5007:
URL: https://github.com/apache/incubator-gluten/pull/5007#discussion_r1538532110


##
gluten-core/src/main/scala/io/glutenproject/extension/columnar/RewriteTypedImperativeAggregate.scala:
##
@@ -40,6 +41,38 @@ object RewriteTypedImperativeAggregate extends 
Rule[SparkPlan] with PullOutProje
 }
   }
 
+  def shouldRewriteForPercentileLikeExpr(ae: AggregateExpression): Boolean = {
+ae.aggregateFunction match {
+  case _: ApproximatePercentile =>
+ae.mode match {
+  case Partial | PartialMerge => true

Review Comment:
   yeah, it should be removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-03-23 Thread via GitHub


liujiayi771 commented on code in PR #5007:
URL: https://github.com/apache/incubator-gluten/pull/5007#discussion_r1536728309


##
gluten-core/src/main/scala/io/glutenproject/extension/columnar/RewriteTypedImperativeAggregate.scala:
##
@@ -40,6 +41,38 @@ object RewriteTypedImperativeAggregate extends 
Rule[SparkPlan] with PullOutProje
 }
   }
 
+  def shouldRewriteForPercentileLikeExpr(ae: AggregateExpression): Boolean = {
+ae.aggregateFunction match {
+  case _: ApproximatePercentile =>
+ae.mode match {
+  case Partial | PartialMerge => true

Review Comment:
   It only support `Partial` and `Final` in 
`HashAggregateExecTransformer.scala`. Shall we add `PartialMerge` here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-03-20 Thread via GitHub


WangGuangxin commented on PR #5007:
URL: 
https://github.com/apache/incubator-gluten/pull/5007#issuecomment-2011045153

   > this
   
   yes, we should merge to upsteam first before merge this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-03-20 Thread via GitHub


zhztheplayer commented on PR #5007:
URL: 
https://github.com/apache/incubator-gluten/pull/5007#issuecomment-2010990615

   Hi @WangGuangxin so the current plan is to merge 
https://github.com/WangGuangxin/gluten/commit/97db869a56e8d42c0782e4e8218e1937e40682ce
 to upstream Velox before this ? Am I understanding correctly? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-03-19 Thread via GitHub


liujiayi771 commented on PR #5007:
URL: 
https://github.com/apache/incubator-gluten/pull/5007#issuecomment-2008652742

   @WangGuangxin Perhaps we could create an issue in the upstream Velox 
community to remove this verification.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-03-19 Thread via GitHub


zhztheplayer commented on code in PR #5007:
URL: https://github.com/apache/incubator-gluten/pull/5007#discussion_r1531488552


##
backends-velox/src/test/scala/io/glutenproject/execution/TestOperator.scala:
##
@@ -1233,4 +1233,12 @@ class TestOperator extends 
VeloxWholeStageTransformerSuite {
   checkOperatorMatch[HashAggregateExecTransformer]
 }
   }
+
+  test("Support ApproximatePercentile") {
+runQueryAndCompare("""
+ |SELECT approx_percentile(col, array(0.5, 0.4, 0.1), 
100)

Review Comment:
   I remember Spark has  `percentile_approx`. Is that one supported by this 
patch either?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-03-19 Thread via GitHub


zhztheplayer commented on PR #5007:
URL: 
https://github.com/apache/incubator-gluten/pull/5007#issuecomment-2008617781

   Thank you in advance!
   
   > I believe the change on Gluten side is ready now.
   
   Did that mean this PR can be merged prior to merging Velox changes?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-03-19 Thread via GitHub


WangGuangxin commented on PR #5007:
URL: 
https://github.com/apache/incubator-gluten/pull/5007#issuecomment-2008557794

   I believe the change on Gluten side is ready now. But the 
`ApproxPercentileAggregate` implements in velox has to make some minor 
modification, becasue when add intermediate data, it checks the row vector 
encoding in `addIntermediateImpl`
   
https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/ApproxPercentileAggregate.cpp#L652
 ,
   I believe the encoding is guaranteed by `extractAccumulators` 
https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/ApproxPercentileAggregate.cpp#L262.
   
   But when it comes in Gluten, the output of PartialAgg are processed by 
Shuffle, which will flatten all vectors, so that the encoding is not guaranteed 
when added to FinalAgg. 
   
https://github.com/apache/incubator-gluten/blob/main/cpp/velox/shuffle/VeloxShuffleWriter.cc#L343
   
   cc @zhztheplayer @liujiayi771 @ulysses-you 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-03-19 Thread via GitHub


zhouyuan commented on PR #5007:
URL: 
https://github.com/apache/incubator-gluten/pull/5007#issuecomment-2008478357

   CC: @zhztheplayer 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-03-18 Thread via GitHub


github-actions[bot] commented on PR #5007:
URL: 
https://github.com/apache/incubator-gluten/pull/5007#issuecomment-2005348332

   Run Gluten Clickhouse CI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [GLUTEN-4889][VL] Support approx_percentile [incubator-gluten]

2024-03-18 Thread via GitHub


github-actions[bot] commented on PR #5007:
URL: 
https://github.com/apache/incubator-gluten/pull/5007#issuecomment-2005347612

   https://github.com/apache/incubator-gluten/issues/4889


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org