cloud-fan commented on code in PR #41864: URL: https://github.com/apache/spark/pull/41864#discussion_r1259022057
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CountMinSketchAgg.scala:
##########

```diff
@@ -208,3 +209,20 @@ case class CountMinSketchAgg(
     confidenceExpression = third,
     seedExpression = fourth)
 }
+
+object CountMinSketchAggExpressionBuilder extends ExpressionBuilder {
+  final val functionSignature = FunctionSignature(Seq(
+    NamedArgument("column",
+      FixedArgumentType(TypeCollection(IntegralType, StringType, BinaryType))),
+    NamedArgument("epsilon", FixedArgumentType(DoubleType)),
+    NamedArgument("confidence", FixedArgumentType(DoubleType)),
+    NamedArgument("seed", FixedArgumentType(IntegerType))
```

Review Comment:

We need to make a decision here. Before this PR, each SQL function defined its required input types in the corresponding `Expression`. Now we want to replace that with a new framework. This is fine, but we need to make sure the new framework can fully cover the old one.

The old framework is actually quite flexible: you can perform arbitrary checks in `Expression.checkInputDataTypes`, or use a convenience trait like `ExpectsInputTypes`. The old framework can also connect to the type coercion framework, for example via the `ImplicitCastInputTypes` trait.

The `PercentileBase` expression is an example: it requires the `percentage` input to be foldable, double-typed, and within the range [0, 1.0]. Can the new framework cover it?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
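To make the concern concrete, here is a minimal, self-contained sketch of the kind of arbitrary validation the old framework permits. The types below (`DataType`, `TypeCheckResult`, `MockExpr`, `checkPercentage`) are simplified stand-ins, not Spark's actual classes; the check mirrors the shape of `PercentileBase`'s requirement that `percentage` be foldable, double-typed, and within [0, 1.0].

```scala
// Stand-ins for Spark's type-check machinery (illustrative only).
sealed trait DataType
case object DoubleType extends DataType
case object IntegerType extends DataType

sealed trait TypeCheckResult
case object TypeCheckSuccess extends TypeCheckResult
case class TypeCheckFailure(message: String) extends TypeCheckResult

// A mock expression: a value with a declared type and a foldability flag.
case class MockExpr(value: Any, dataType: DataType, foldable: Boolean)

// Mirrors the shape of PercentileBase's validation: the percentage argument
// must be a constant (foldable), double-typed, and lie in [0.0, 1.0].
def checkPercentage(percentage: MockExpr): TypeCheckResult = {
  if (!percentage.foldable) {
    TypeCheckFailure("The percentage(s) must be a constant literal.")
  } else if (percentage.dataType != DoubleType) {
    TypeCheckFailure("The percentage(s) must be of double type.")
  } else {
    percentage.value match {
      case d: Double if d >= 0.0 && d <= 1.0 => TypeCheckSuccess
      case _ => TypeCheckFailure("The percentage(s) must be between 0.0 and 1.0.")
    }
  }
}
```

A declarative signature like `NamedArgument("percentage", FixedArgumentType(DoubleType))` captures only the type constraint; the foldability and value-range checks above are the parts a purely type-based signature cannot express, which is the gap the comment is asking about.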