andygrove commented on code in PR #4542:
URL: https://github.com/apache/datafusion-comet/pull/4542#discussion_r3494405941


##########
spark/src/main/scala/org/apache/comet/serde/aggregates.scala:
##########
@@ -592,6 +592,79 @@ object CometStddevPop extends 
CometAggregateExpressionSerde[StddevPop] with Come
   }
 }
 
+object CometPercentile extends CometAggregateExpressionSerde[Percentile] {
+
+  private val arrayOfPercentagesReason = "An array of percentages is not 
supported."
+  private val nonLiteralPercentageReason = "The percentage argument must be a 
literal."
+  private val frequencyReason = "A frequency argument is not supported."
+  // `reverse` is set when `percentile_cont`/`percentile_disc` is used with
+  // `WITHIN GROUP (ORDER BY ... DESC)` on Spark 4.0+. The native 
`percentile_cont` always
+  // interpolates in ascending order, so the descending form would return a 
wrong answer.
+  private val descendingReason =
+    "Descending order in `WITHIN GROUP (ORDER BY ... DESC)` is not supported."
+  private val inputTypeReason = "Only numeric input types are supported."
+
+  override def getUnsupportedReasons(): Seq[String] = Seq(
+    arrayOfPercentagesReason,
+    nonLiteralPercentageReason,
+    frequencyReason,
+    descendingReason,
+    inputTypeReason)
+
+  override def getSupportLevel(expr: Percentile): SupportLevel = {
+    // Only the single-percentage, default-frequency, numeric-input, ascending 
form is wired
+    // today. It maps to DataFusion's percentile_cont, which uses the same 
`index = p * (n - 1)`
+    // linear interpolation as Spark's exact Percentile. Array-of-percentages, 
a non-default
+    // frequency argument, descending order, and interval inputs fall back to 
Spark.
+    if (expr.percentageExpression.dataType != DoubleType) {
+      return Unsupported(Some(arrayOfPercentagesReason))
+    }
+    if (!expr.percentageExpression.foldable) {
+      return Unsupported(Some(nonLiteralPercentageReason))
+    }
+    expr.frequencyExpression match {
+      case Literal(1L, _) =>
+      case _ => return Unsupported(Some(frequencyReason))
+    }
+    if (expr.reverse) {
+      return Unsupported(Some(descendingReason))
+    }
+    expr.child.dataType match {
+      case _: NumericType => Compatible(None)

Review Comment:
   I have now updated this to be `Incompatible` and opt-in until the follow on 
issue https://github.com/apache/datafusion-comet/issues/4719 is addressed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to