Github user tanejagagan commented on a diff in the pull request: https://github.com/apache/spark/pull/16497#discussion_r95305820 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala --- @@ -81,7 +96,11 @@ case class Percentile( case arrayData: ArrayData => arrayData.toDoubleArray().toSeq } - override def children: Seq[Expression] = child :: percentageExpression :: Nil + override def children: Seq[Expression] = if (withFrqExpr) { --- End diff -- I have given lot of thought to it and if we need to make a difference here. Lets take a data set of age_string, age, count "20", 20, 1 "15", 15, 1 "10", 10, 1 For Sql "select percentile( age, count , 0.5 ) from table" logically correct values should be children = age::count ::0.5 :: Nil and inputType = IntegerType :: IntegerType::DoubleType::Nil For sql "select pecentile( age, 0.5 ) from table" logically correct values should be children = age::0.5 :: Nil and inputType = IntegerType ::DoubleType::Nil Here is one example where keeping it logically correct would help For following incorrect SQL "select percentile( age, '10') from table" With children = age::'10'::Nil and inputType = IntergerType::StringType:: Nil Since both children and inputType is used for dataType validation, the error message would be correct as below. "argument 2 requires Double type, however, 10 is of String type." However With children = age::Literal(1)::'10'::Nil and inputType = IntergerType::IntegerType::StringType:: Nil The error message would be NOT correct and confusing as below "argument 3 requires Double type, however, 10 is of String type." Since both children and dataType are public method i was inclined to keep them explicitly correct and therefore i decided to make a difference. Please let me know your thoughts
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org