[GitHub] [spark] cloud-fan commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
cloud-fan commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile URL: https://github.com/apache/spark/pull/26905#discussion_r358606175 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ## @@ -83,32 +83,37 @@ case class ApproximatePercentile( } // Mark as lazy so that accuracyExpression is not evaluated during tree transformation. - private lazy val accuracy: Int = accuracyExpression.eval().asInstanceOf[Int] - - override def inputTypes: Seq[AbstractDataType] = { -// Support NumericType, DateType and TimestampType since their internal types are all numeric, -// and can be easily cast to double for processing. -Seq(TypeCollection(NumericType, DateType, TimestampType), - TypeCollection(DoubleType, ArrayType(DoubleType)), IntegerType) - } + private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue() // Mark as lazy so that percentageExpression is not evaluated during tree transformation. private lazy val (returnPercentileArray: Boolean, percentages: Array[Double]) = -percentageExpression.eval() match { - // Rule ImplicitTypeCasts can cast other numeric types to double - case num: Double => (false, Array(num)) - case arrayData: ArrayData => (true, arrayData.toDoubleArray()) +percentageExpression.dataType match { + case DoubleType => (false, Array(percentageExpression.eval().asInstanceOf[Double])) + case _: NumericType => Review comment: We can do custom type coercion (don't extend ImplicitCastInputTypes) and disallow casting long to int implicitly for this function. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
cloud-fan commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile URL: https://github.com/apache/spark/pull/26905#discussion_r358590272 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ## @@ -83,32 +83,37 @@ case class ApproximatePercentile( } // Mark as lazy so that accuracyExpression is not evaluated during tree transformation. - private lazy val accuracy: Int = accuracyExpression.eval().asInstanceOf[Int] - - override def inputTypes: Seq[AbstractDataType] = { -// Support NumericType, DateType and TimestampType since their internal types are all numeric, -// and can be easily cast to double for processing. -Seq(TypeCollection(NumericType, DateType, TimestampType), - TypeCollection(DoubleType, ArrayType(DoubleType)), IntegerType) - } + private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue() // Mark as lazy so that percentageExpression is not evaluated during tree transformation. private lazy val (returnPercentileArray: Boolean, percentages: Array[Double]) = -percentageExpression.eval() match { - // Rule ImplicitTypeCasts can cast other numeric types to double - case num: Double => (false, Array(num)) - case arrayData: ArrayData => (true, arrayData.toDoubleArray()) +percentageExpression.dataType match { + case DoubleType => (false, Array(percentageExpression.eval().asInstanceOf[Double])) + case _: NumericType => Review comment: shall we simply update `inputTypes` to replace `IntegerType` with `LongType`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
cloud-fan commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile URL: https://github.com/apache/spark/pull/26905#discussion_r358580862 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ## @@ -83,32 +83,37 @@ case class ApproximatePercentile( } // Mark as lazy so that accuracyExpression is not evaluated during tree transformation. - private lazy val accuracy: Int = accuracyExpression.eval().asInstanceOf[Int] - - override def inputTypes: Seq[AbstractDataType] = { -// Support NumericType, DateType and TimestampType since their internal types are all numeric, -// and can be easily cast to double for processing. -Seq(TypeCollection(NumericType, DateType, TimestampType), - TypeCollection(DoubleType, ArrayType(DoubleType)), IntegerType) - } + private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue() // Mark as lazy so that percentageExpression is not evaluated during tree transformation. private lazy val (returnPercentileArray: Boolean, percentages: Array[Double]) = -percentageExpression.eval() match { - // Rule ImplicitTypeCasts can cast other numeric types to double - case num: Double => (false, Array(num)) - case arrayData: ArrayData => (true, arrayData.toDoubleArray()) +percentageExpression.dataType match { + case DoubleType => (false, Array(percentageExpression.eval().asInstanceOf[Double])) + case _: NumericType => Review comment: Why would it overflow? The type coercion should cast `accuracyExpression` to int type. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org