[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...
Github user yucai commented on the pull request: https://github.com/apache/spark/pull/11212#issuecomment-186077126

@rxin I tried your suggestion of creating a dedicated `PowDecimal` for Decimal, like below:

```
case class PowDecimal(left: Expression, right: Expression)
  extends BinaryMathExpression(math.pow, "POWER") {
  override def inputTypes: Seq[AbstractDataType] = Seq(DecimalType, IntegerType)
  ...
}

case class Pow(left: Expression, right: Expression)
  extends BinaryMathExpression(math.pow, "POWER") {
  ...
}
```

But one concern is: for `select pow(cast(2 as decimal(5,2)), 3)`, how do we get the `PowDecimal` node created? The current path will create a `Pow` node anyway. So we are thinking that maybe we can still put the Decimal processing in `Pow`, but coerce byte/short/etc. to integer in type coercion, like below:

```
case class Pow(left: Expression, right: Expression)
  extends BinaryMathExpression(math.pow, "POWER") {
  override def inputTypes: Seq[AbstractDataType] = Seq(NumericType, NumericType)

  override def dataType: DataType = (left.dataType, right.dataType) match {
    case (dt: DecimalType, ByteType | ShortType | IntegerType) => dt
    case _ => DoubleType
  }

  protected override def nullSafeEval(input1: Any, input2: Any): Any =
    (left.dataType, right.dataType) match {
      case (dt: DecimalType, _) =>
        input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Int])
      case _ =>
        math.pow(input1.asInstanceOf[Double], input2.asInstanceOf[Double])
    }

  override def genCode(ctx: CodegenContext, ev: ExprCode): String = ...
}
```

In HiveTypeCoercion:

```
object PowCoercion extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
    case e if !e.childrenResolved => e
    case e @ Pow(left, right) => (left.dataType, right.dataType) match {
      case (dt: DecimalType, IntegerType) => e
      case (DoubleType, DoubleType) => e
      case (dt: DecimalType, ByteType | ShortType) => Pow(left, Cast(right, IntegerType))
      case _ => Pow(Cast(left, DoubleType), Cast(right, DoubleType))
    }
  }
}
```

What do you think of this approach?
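[Editorial aside] The `PowCoercion` decision table proposed above can be restated as a pure function over type tags, which makes the four match arms easy to check. This is an illustrative toy in Java (the enum and method names are invented for this sketch, not Spark's API):

```java
// Hypothetical sketch of the PowCoercion decision table: given the types of
// Pow's base and exponent, decide what each side should be cast to.
enum T { BYTE, SHORT, INT, FLOAT, DOUBLE, DECIMAL }

public class PowCoercionSketch {
    // Returns the coerced (base, exponent) type pair for Pow.
    static T[] coerce(T base, T exp) {
        // (Decimal, Int): already the legal decimal form, leave as-is.
        if (base == T.DECIMAL && exp == T.INT) return new T[]{T.DECIMAL, T.INT};
        // (Decimal, Byte/Short): widen only the exponent, i.e. Cast(right, IntegerType).
        if (base == T.DECIMAL && (exp == T.BYTE || exp == T.SHORT))
            return new T[]{T.DECIMAL, T.INT};
        // Everything else: cast both sides to double (covers the (Double, Double)
        // fixed point as well, where the casts are no-ops).
        return new T[]{T.DOUBLE, T.DOUBLE};
    }

    public static void main(String[] args) {
        T[] r = coerce(T.DECIMAL, T.SHORT);
        System.out.println(r[0] + " " + r[1]); // DECIMAL INT
    }
}
```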
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user yucai commented on the pull request: https://github.com/apache/spark/pull/11212#issuecomment-185717402

OK, let me try this implementation.
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r53275413

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
@@ -170,6 +170,7 @@ public void write(int ordinal, double value) {
   }

   public void write(int ordinal, Decimal input, int precision, int scale) {
+    input = input.clone();
--- End diff --

As Adrian mentioned, we need a copy of `input`; otherwise `changePrecision` would change the original input. In our case, this means `catalystValue` (the expected value) would be changed when `checkEvalutionWithUnsafeProjection` is invoked, and then all tests after `checkEvalutionWithUnsafeProjection` would fail.

```
protected def checkEvaluation(
    expression: => Expression,
    expected: Any,
    inputRow: InternalRow = EmptyRow): Unit = {
  val catalystValue = CatalystTypeConverters.convertToCatalyst(expected)
  checkEvaluationWithoutCodegen(expression, catalystValue, inputRow)
  checkEvaluationWithGeneratedMutableProjection(expression, catalystValue, inputRow)
  if (GenerateUnsafeProjection.canSupport(expression.dataType)) {
    checkEvalutionWithUnsafeProjection(expression, catalystValue, inputRow)
  }
  checkEvaluationWithOptimization(expression, catalystValue, inputRow)
}
```

Does it make sense? Any suggestion would be greatly helpful.
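[Editorial aside] A minimal toy illustration of the aliasing bug described here, assuming (as the thread states) that `changePrecision` mutates its receiver in place. `MutableDec` is an invented stand-in for Spark's `Decimal`, not the real class:

```java
// Toy sketch of why the writer clones before mutating: changePrecision
// mutates in place, so writing without a defensive copy also corrupts the
// caller's reference (the test suite's shared expected value, in this case).
import java.math.BigDecimal;
import java.math.RoundingMode;

public class CloneDemo {
    static class MutableDec {
        BigDecimal value;
        MutableDec(BigDecimal v) { value = v; }
        // In-place precision change, mimicking a mutable Decimal.
        void changePrecision(int newScale) {
            value = value.setScale(newScale, RoundingMode.HALF_UP);
        }
        MutableDec copy() { return new MutableDec(value); }
    }

    // Writer that defensively copies, mirroring `input = input.clone()`.
    static MutableDec write(MutableDec input, int targetScale) {
        input = input.copy();          // without this, the caller's object mutates
        input.changePrecision(targetScale);
        return input;
    }

    public static void main(String[] args) {
        MutableDec expected = new MutableDec(new BigDecimal("2.50"));
        MutableDec written = write(expected, 4);
        System.out.println(expected.value); // still 2.50, unchanged
        System.out.println(written.value);  // 2.5000
    }
}
```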
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r53115148

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
@@ -170,6 +170,7 @@ public void write(int ordinal, double value) {
   }

   public void write(int ordinal, Decimal input, int precision, int scale) {
+    input = input.clone();
--- End diff --

Here we'll call `changePrecision` on `input`, which would affect the original data. I agree that this is a bad idea; maybe we need to propose a separate PR to work around this.
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11212#issuecomment-184860431

I think it'd be a lot simpler if we create a separate Pow for Decimal, and handle byte/short/etc. to integer in type coercion, rather than in the PowDecimal class.
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r53071927

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
@@ -170,6 +170,7 @@ public void write(int ordinal, double value) {
   }

   public void write(int ordinal, Decimal input, int precision, int scale) {
+    input = input.clone();
--- End diff --

Why is this necessary? Seems like a really bad idea.
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r52968764

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -523,11 +523,45 @@ case class Atan2(left: Expression, right: Expression)
 case class Pow(left: Expression, right: Expression)
   extends BinaryMathExpression(math.pow, "POWER") {
-  override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
-    defineCodeGen(ctx, ev, (c1, c2) => s"java.lang.Math.pow($c1, $c2)")
-  }
-}
+  override def inputTypes: Seq[AbstractDataType] = Seq(NumericType, NumericType)
+
+  override def dataType: DataType = (left.dataType, right.dataType) match {
+    case (dt: DecimalType, ByteType | ShortType | IntegerType) => dt
+    case _ => DoubleType
+  }
+
+  protected override def nullSafeEval(input1: Any, input2: Any): Any =
+    (left.dataType, right.dataType) match {
+      case (dt: DecimalType, ByteType) =>
+        input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Byte])
+      case (dt: DecimalType, ShortType) =>
+        input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Short])
+      case (dt: DecimalType, IntegerType) =>
+        input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Int])
+      case (dt: DecimalType, FloatType) =>
+        math.pow(input1.asInstanceOf[Decimal].toDouble, input2.asInstanceOf[Float])
+      case (dt: DecimalType, DoubleType) =>
+        math.pow(input1.asInstanceOf[Decimal].toDouble, input2.asInstanceOf[Double])
+      case (dt1: DecimalType, dt2: DecimalType) =>
+        math.pow(input1.asInstanceOf[Decimal].toDouble, input2.asInstanceOf[Decimal].toDouble)
--- End diff --

Shall we cast the result of `math.pow` back to `DecimalType` for these three cases?
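[Editorial aside] For context on this question: exact decimal arithmetic can only cover integer exponents (`java.math.BigDecimal.pow` takes an `int`); for float/double/decimal exponents the evaluation has to detour through `double`, so casting the `math.pow` result back to `DecimalType` cannot recover precision beyond what `double` carries. A minimal sketch of the two regimes:

```java
// Exact vs. approximate power: integer exponents stay in decimal arithmetic,
// fractional exponents must go through double.
import java.math.BigDecimal;

public class DecimalPowNote {
    public static void main(String[] args) {
        BigDecimal base = new BigDecimal("2.50");

        // Integer exponent: exact decimal arithmetic is possible.
        BigDecimal exact = base.pow(3);
        System.out.println(exact); // 15.625000 (result scale = 2 * 3)

        // Fractional exponent: no exact decimal operation exists, so we
        // detour through double; wrapping the result back into a decimal
        // only preserves double precision, it does not add any.
        double approx = Math.pow(base.doubleValue(), 0.5);
        BigDecimal roundTripped = BigDecimal.valueOf(approx);
        System.out.println(roundTripped);
    }
}
```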
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r52968376

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
@@ -170,6 +170,7 @@ public void write(int ordinal, double value) {
   }

   public void write(int ordinal, Decimal input, int precision, int scale) {
+    input = input.clone();
--- End diff --

Better to add a comment that explains why we need to clone before writing.
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r52968287

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala ---
@@ -351,6 +350,20 @@ class MathFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper {
   }

   test("pow") {
+    testBinary(Pow, (d: Decimal, n: Byte) => d.pow(n),
+      (-5 to 5).map(v => (Decimal(v * 1.0), v.toByte)))
--- End diff --

maybe `v.toDouble` is better