[GitHub] [spark] yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358607182

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala

## @@ -83,32 +83,37 @@ case class ApproximatePercentile(
   }

   // Mark as lazy so that accuracyExpression is not evaluated during tree transformation.
-  private lazy val accuracy: Int = accuracyExpression.eval().asInstanceOf[Int]
-
-  override def inputTypes: Seq[AbstractDataType] = {
-    // Support NumericType, DateType and TimestampType since their internal types are all numeric,
-    // and can be easily cast to double for processing.
-    Seq(TypeCollection(NumericType, DateType, TimestampType),
-      TypeCollection(DoubleType, ArrayType(DoubleType)), IntegerType)
-  }
+  private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue()

   // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
   private lazy val (returnPercentileArray: Boolean, percentages: Array[Double]) =
-    percentageExpression.eval() match {
-      // Rule ImplicitTypeCasts can cast other numeric types to double
-      case num: Double => (false, Array(num))
-      case arrayData: ArrayData => (true, arrayData.toDoubleArray())
+    percentageExpression.dataType match {
+      case DoubleType => (false, Array(percentageExpression.eval().asInstanceOf[Double]))
+      case _: NumericType =>

Review comment: Ok, I'll check on this.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
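The change quoted above dispatches on the declared data type of the percentage expression instead of pattern-matching on the runtime class of the evaluated value. A minimal sketch of that idea, using hypothetical stand-in types rather than the real Catalyst DataType hierarchy:

```scala
// Hypothetical, simplified stand-ins for Catalyst's DataType hierarchy --
// illustrative only, not the actual API.
sealed trait DataType
case object DoubleType extends DataType
case object IntegerType extends DataType

// Dispatch on the declared data type of the expression rather than on the
// runtime class of the evaluated value; non-double numerics are widened to
// Double explicitly instead of relying on an implicit cast rule.
def resolvePercentages(dataType: DataType, evaluated: Any): (Boolean, Array[Double]) =
  dataType match {
    case DoubleType  => (false, Array(evaluated.asInstanceOf[Double]))
    case IntegerType => (false, Array(evaluated.asInstanceOf[Int].toDouble))
  }
```

Matching on the data type makes the accepted inputs explicit and avoids a MatchError when the evaluated value is not exactly a boxed Double.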
yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358602970

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala

Review comment: The type coercion rule does not quite fit this function. In Hive this function is implemented without implicit casting; each argument is strictly checked against its type and range. Our implementation does not match what our doc says.
yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358582153

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala

Review comment: I don't mean an overflow exception here; the value is just not right.
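The point above is that narrowing a too-large accuracy to Int fails silently rather than loudly, which can be sketched as:

```scala
// Narrowing a Long accuracy above Int.MaxValue to Int raises no exception;
// the bits simply wrap around, yielding a wrong (here negative) value.
val requested: Long = 3000000000L        // greater than Int.MaxValue (2147483647)
val narrowedInt: Int = requested.toInt   // wraps silently, no overflow error
val keptAsLong: Long = requested         // widening to Long preserves the value
```

This is why a wrong value, not an overflow exception, is the symptom being discussed.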
yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358581552

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala

Review comment: If we specify an accuracy value greater than Int.MaxValue.
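The `asInstanceOf[Number].longValue()` line in the quoted diff is what handles an accuracy beyond Int.MaxValue. A small sketch, assuming the evaluated accuracy may arrive as either a boxed Int or a boxed Long (`accuracyOf` is an illustrative name, not the PR's code):

```scala
// Going through java.lang.Number and widening to Long accepts both a boxed
// Int and a boxed Long without a ClassCastException, unlike a direct
// asInstanceOf[Int] on a Long value.
def accuracyOf(evaluated: Any): Long =
  evaluated.asInstanceOf[Number].longValue()
```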
yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358579502

## File path: sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala

## @@ -182,7 +182,7 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession
     spark.sql(
       s"""SELECT
          |  key,
-         |  percentile_approx(null, 0.5)
+         |  percentile_approx(cast(null as int), 0.5)

Review comment: Agreed. I'll update this in the PR description. And I guess the author's original purpose of these tests is to verify the null handling of the non-null types, not `NullType`.
yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358568323

## File path: sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala

Review comment: I have checked Hive: `percentile_approx(null, 0.5)` is not valid there. If it is worth keeping backward compatibility here, I can make NullType acceptable for percentile_approx.
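The distinction driving the `cast(null as int)` change can be sketched with illustrative stand-in types (not Catalyst classes): a bare NULL literal carries NullType, which falls outside the function's accepted input types, while `cast(null as int)` carries IntegerType with a null value.

```scala
// Illustrative toy model of the type check only; the real accepted set is
// the numeric/date/timestamp collection in ApproximatePercentile.
sealed trait SqlType
case object NullType extends SqlType
case object IntType extends SqlType

val accepted: Set[SqlType] = Set(IntType)

// A bare NULL fails this check; a null value cast to int passes it.
def typeChecks(t: SqlType): Boolean = accepted.contains(t)
```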
yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358567641

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala

Review comment: Here: https://github.com/apache/spark/pull/26905/files#diff-7448088085a868386c669c69629c8169L86