[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69669841

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -94,6 +94,59 @@ case class UserDefinedGenerator(
     }

     /**
    + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
    + * {{{
    + *   SELECT stack(2, 1, 2, 3) ->
    + *   1      2
    + *   3      NULL
    + * }}}
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
    +  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]")
    +case class Stack(children: Seq[Expression])
    +    extends Expression with Generator with CodegenFallback {
    +
    +  private lazy val numRows = children.head.eval().asInstanceOf[Int]
    +  private lazy val numFields = Math.ceil((children.length - 1.0) / numRows).toInt
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    if (children.length <= 1) {
    +      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
    --- End diff --

    Ah. Now I understand the meaning. Sure, someday later. Maybe the pattern is popular, so we can fix all of the error messages together in a single PR.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
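The semantics being reviewed above can be sketched in plain Scala (an illustrative model only, not the actual Catalyst expression): `stack(n, v1, ..., vk)` lays the k values out row-major into n rows of ceil(k / n) columns, padding the last row with null.

```scala
// Plain-Scala model of the stack() behavior discussed in this thread.
// stackModel is a hypothetical helper name; the real implementation is
// the Stack expression in generators.scala.
def stackModel(numRows: Int, values: Seq[Any]): Seq[Seq[Any]] = {
  require(numRows >= 1, "The number of rows must be a positive constant integer.")
  // Number of columns per row, mirroring numFields in the patch.
  val numFields = math.ceil(values.length.toDouble / numRows).toInt
  (0 until numRows).map { row =>
    (0 until numFields).map { col =>
      val index = row * numFields + col // row-major layout
      if (index < values.length) values(index) else null
    }
  }
}
```

For example, `stackModel(2, Seq(1, 2, 3))` produces the two rows `[1,2]` and `[3,null]` shown in the `extended` usage string.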
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69669599

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -94,6 +94,59 @@ case class UserDefinedGenerator(
     }

     /**
    + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
    + * {{{
    + *   SELECT stack(2, 1, 2, 3) ->
    + *   1      2
    + *   3      NULL
    + * }}}
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
    +  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]")
    +case class Stack(children: Seq[Expression])
    +    extends Expression with Generator with CodegenFallback {
    +
    +  private lazy val numRows = children.head.eval().asInstanceOf[Int]
    +  private lazy val numFields = Math.ceil((children.length - 1.0) / numRows).toInt
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    if (children.length <= 1) {
    +      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
    --- End diff --

    Sorry, I merged this PR before seeing your comments. Yea, including the number of args makes the error message more friendly, but it's not a big deal. @dongjoon-hyun, you can fix it in your next PR, by the way.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14033
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69668562

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -94,6 +94,59 @@ case class UserDefinedGenerator(
     }

     /**
    + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
    + * {{{
    + *   SELECT stack(2, 1, 2, 3) ->
    + *   1      2
    + *   3      NULL
    + * }}}
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
    +  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]")
    +case class Stack(children: Seq[Expression])
    +    extends Expression with Generator with CodegenFallback {
    +
    +  private lazy val numRows = children.head.eval().asInstanceOf[Int]
    +  private lazy val numFields = Math.ceil((children.length - 1.0) / numRows).toInt
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    if (children.length <= 1) {
    +      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
    +    } else if (children.head.dataType != IntegerType || !children.head.foldable || numRows < 1) {
    +      TypeCheckResult.TypeCheckFailure("The number of rows must be a positive constant integer.")
    --- End diff --

    ?
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69668537

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -94,6 +94,59 @@ case class UserDefinedGenerator(
     }

     /**
    + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
    + * {{{
    + *   SELECT stack(2, 1, 2, 3) ->
    + *   1      2
    + *   3      NULL
    + * }}}
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
    +  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]")
    +case class Stack(children: Seq[Expression])
    +    extends Expression with Generator with CodegenFallback {
    +
    +  private lazy val numRows = children.head.eval().asInstanceOf[Int]
    +  private lazy val numFields = Math.ceil((children.length - 1.0) / numRows).toInt
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    if (children.length <= 1) {
    +      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
    --- End diff --

    What do you mean? Could you give some example of what you want?
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69667347

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -94,6 +94,59 @@ case class UserDefinedGenerator(
     }

     /**
    + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
    + * {{{
    + *   SELECT stack(2, 1, 2, 3) ->
    + *   1      2
    + *   3      NULL
    + * }}}
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
    +  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]")
    +case class Stack(children: Seq[Expression])
    +    extends Expression with Generator with CodegenFallback {
    +
    +  private lazy val numRows = children.head.eval().asInstanceOf[Int]
    +  private lazy val numFields = Math.ceil((children.length - 1.0) / numRows).toInt
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    if (children.length <= 1) {
    +      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
    +    } else if (children.head.dataType != IntegerType || !children.head.foldable || numRows < 1) {
    +      TypeCheckResult.TypeCheckFailure("The number of rows must be a positive constant integer.")
    --- End diff --

    include the value of `numRows`
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69667329

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -94,6 +94,59 @@ case class UserDefinedGenerator(
     }

     /**
    + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
    + * {{{
    + *   SELECT stack(2, 1, 2, 3) ->
    + *   1      2
    + *   3      NULL
    + * }}}
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
    +  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]")
    +case class Stack(children: Seq[Expression])
    +    extends Expression with Generator with CodegenFallback {
    +
    +  private lazy val numRows = children.head.eval().asInstanceOf[Int]
    +  private lazy val numFields = Math.ceil((children.length - 1.0) / numRows).toInt
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    if (children.length <= 1) {
    +      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
    --- End diff --

    Can you also include the number of args passed, and the args?
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69650449

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -94,6 +94,59 @@ case class UserDefinedGenerator(
     }

     /**
    + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
    + * {{{
    + *   SELECT stack(2, 1, 2, 3)) ->
    --- End diff --

    Oops. Thank you, @tejasapatil !
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69648741

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -94,6 +94,59 @@ case class UserDefinedGenerator(
     }

     /**
    + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
    + * {{{
    + *   SELECT stack(2, 1, 2, 3)) ->
    --- End diff --

    nit: double `)`
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69611144

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala ---
    @@ -63,4 +63,16 @@ class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
           ))),
           correct_answer)
       }
    +
    +  test("stack") {
    +    checkTuple(Stack(Seq(1).map(Literal(_))), Seq(create_row()))
    +    checkTuple(Stack(Seq(1, 1).map(Literal(_))), Seq(create_row(1)))
    +    checkTuple(Stack(Seq(1, 1, 2).map(Literal(_))), Seq(create_row(1, 2)))
    +    checkTuple(Stack(Seq(2, 1, 2).map(Literal(_))), Seq(create_row(1), create_row(2)))
    +    checkTuple(Stack(Seq(2, 1, 2, 3).map(Literal(_))), Seq(create_row(1, 2), create_row(3, null)))
    +
    +    checkTuple(
    +      Stack(Seq(2, "a", "b", "c").map(Literal(_))),
    +      Seq(create_row("a", "b"), create_row("c", null)))
    --- End diff --

    Thank you for the pointer. Nice!
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69609981

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala ---
    @@ -63,4 +63,16 @@ class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
           ))),
           correct_answer)
       }
    +
    +  test("stack") {
    +    checkTuple(Stack(Seq(1).map(Literal(_))), Seq(create_row()))
    +    checkTuple(Stack(Seq(1, 1).map(Literal(_))), Seq(create_row(1)))
    +    checkTuple(Stack(Seq(1, 1, 2).map(Literal(_))), Seq(create_row(1, 2)))
    +    checkTuple(Stack(Seq(2, 1, 2).map(Literal(_))), Seq(create_row(1), create_row(2)))
    +    checkTuple(Stack(Seq(2, 1, 2, 3).map(Literal(_))), Seq(create_row(1, 2), create_row(3, null)))
    +
    +    checkTuple(
    +      Stack(Seq(2, "a", "b", "c").map(Literal(_))),
    +      Seq(create_row("a", "b"), create_row("c", null)))
    --- End diff --

    you can follow this one: https://github.com/apache/spark/commit/85f2303ecadd9bf6d9694a2743dda075654c5ccf#diff-e4663e57952b37150642b33b998715a8R94
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69608652

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala ---
    @@ -63,4 +63,16 @@ class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
           ))),
           correct_answer)
       }
    +
    +  test("stack") {
    +    checkTuple(Stack(Seq(1).map(Literal(_))), Seq(create_row()))
    +    checkTuple(Stack(Seq(1, 1).map(Literal(_))), Seq(create_row(1)))
    +    checkTuple(Stack(Seq(1, 1, 2).map(Literal(_))), Seq(create_row(1, 2)))
    +    checkTuple(Stack(Seq(2, 1, 2).map(Literal(_))), Seq(create_row(1), create_row(2)))
    +    checkTuple(Stack(Seq(2, 1, 2, 3).map(Literal(_))), Seq(create_row(1, 2), create_row(3, null)))
    +
    +    checkTuple(
    +      Stack(Seq(2, "a", "b", "c").map(Literal(_))),
    +      Seq(create_row("a", "b"), create_row("c", null)))
    --- End diff --

    Oh, we cannot test that here since it's Expression-level testing. I remembered the reason why I added `try-catch` before.

    ```
    + checkTuple(Stack(Seq(1.0).map(Literal(_))), Seq(create_row()))
    ...
    java.lang.Double cannot be cast to java.lang.Integer
    java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Integer
    ```
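The exception quoted above can be reproduced outside Spark with a few lines of plain Scala (an illustrative sketch; the `evaluated` value stands in for what `children.head.eval()` would return): `asInstanceOf[Int]` on a boxed `java.lang.Double` throws rather than converting.

```scala
// Minimal reproduction of the ClassCastException discussed above.
// A Double held as Any is boxed as java.lang.Double; unboxing it as
// Int fails at runtime instead of performing a numeric conversion.
val evaluated: Any = 1.0 // hypothetical stand-in for children.head.eval()
val castFailed =
  try {
    evaluated.asInstanceOf[Int]
    false
  } catch {
    case _: ClassCastException => true
  }
```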
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69607588

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala ---
    @@ -63,4 +63,16 @@ class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
           ))),
           correct_answer)
       }
    +
    +  test("stack") {
    +    checkTuple(Stack(Seq(1).map(Literal(_))), Seq(create_row()))
    +    checkTuple(Stack(Seq(1, 1).map(Literal(_))), Seq(create_row(1)))
    +    checkTuple(Stack(Seq(1, 1, 2).map(Literal(_))), Seq(create_row(1, 2)))
    +    checkTuple(Stack(Seq(2, 1, 2).map(Literal(_))), Seq(create_row(1), create_row(2)))
    +    checkTuple(Stack(Seq(2, 1, 2, 3).map(Literal(_))), Seq(create_row(1, 2), create_row(3, null)))
    +
    +    checkTuple(
    +      Stack(Seq(2, "a", "b", "c").map(Literal(_))),
    +      Seq(create_row("a", "b"), create_row("c", null)))
    --- End diff --

    Sure!
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69607552

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -94,6 +94,59 @@ case class UserDefinedGenerator(
     }

     /**
    + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
    + * {{{
    + *   SELECT stack(2, 1, 2, 3)) ->
    + *   1      2
    + *   3      NULL
    + * }}}
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
    +  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]")
    +case class Stack(children: Seq[Expression])
    +    extends Expression with Generator with CodegenFallback {
    +
    +  private lazy val numRows = children.head.eval().asInstanceOf[Int]
    +  private lazy val numFields = Math.ceil((children.length - 1.0) / numRows).toInt
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    if (children.length <= 1) {
    +      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
    +    } else if (children.head.dataType != IntegerType || !children.head.foldable || numRows < 1) {
    +      TypeCheckResult.TypeCheckFailure("The number of rows must be a positive constant integer.")
    +    } else {
    +      for (i <- 1 until children.length) {
    +        val j = (i - 1) % numFields
    +        if (children(i).dataType != elementSchema.fields(j).dataType) {
    +          return TypeCheckResult.TypeCheckFailure(
    +            s"Argument ${j + 1} (${elementSchema.fields(j).dataType}) != " +
    +              s"Argument $i (${children(i).dataType})")
    +        }
    +      }
    +      TypeCheckResult.TypeCheckSuccess
    +    }
    +  }
    +
    +  override def elementSchema: StructType =
    +    StructType(children.tail.take(numFields).zipWithIndex.map {
    +      case (e, index) => StructField(s"col$index", e.dataType)
    +    })
    +
    +  override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
    +    val values = children.tail.map(_.eval(input))
    --- End diff --

    Oh, right!
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69606570

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala ---
    @@ -63,4 +63,16 @@ class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
           ))),
           correct_answer)
       }
    +
    +  test("stack") {
    +    checkTuple(Stack(Seq(1).map(Literal(_))), Seq(create_row()))
    +    checkTuple(Stack(Seq(1, 1).map(Literal(_))), Seq(create_row(1)))
    +    checkTuple(Stack(Seq(1, 1, 2).map(Literal(_))), Seq(create_row(1, 2)))
    +    checkTuple(Stack(Seq(2, 1, 2).map(Literal(_))), Seq(create_row(1), create_row(2)))
    +    checkTuple(Stack(Seq(2, 1, 2, 3).map(Literal(_))), Seq(create_row(1, 2), create_row(3, null)))
    +
    +    checkTuple(
    +      Stack(Seq(2, "a", "b", "c").map(Literal(_))),
    +      Seq(create_row("a", "b"), create_row("c", null)))
    --- End diff --

    also add some test cases for type checking failure.
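The type-checking failure cases requested above can be modeled in plain Scala (a hedged sketch with hypothetical names, standing in for `checkInputDataTypes()` without the Catalyst type machinery): too few arguments, a non-integer row count, and a non-positive row count should each produce a distinct failure message.

```scala
// Plain-Scala model of the Stack type-check cases; checkStackArgs is a
// hypothetical helper, with Option[String] standing in for TypeCheckResult
// (None = success, Some(msg) = failure).
def checkStackArgs(args: Seq[Any]): Option[String] =
  if (args.length <= 1) {
    Some("stack requires at least 2 arguments.")
  } else {
    args.head match {
      case n: Int if n >= 1 => None // success
      case _ => Some("The number of rows must be a positive constant integer.")
    }
  }
```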
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69606332

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -94,6 +94,59 @@ case class UserDefinedGenerator(
     }

     /**
    + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
    + * {{{
    + *   SELECT stack(2, 1, 2, 3)) ->
    + *   1      2
    + *   3      NULL
    + * }}}
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
    +  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]")
    +case class Stack(children: Seq[Expression])
    +    extends Expression with Generator with CodegenFallback {
    +
    +  private lazy val numRows = children.head.eval().asInstanceOf[Int]
    +  private lazy val numFields = Math.ceil((children.length - 1.0) / numRows).toInt
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    if (children.length <= 1) {
    +      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
    +    } else if (children.head.dataType != IntegerType || !children.head.foldable || numRows < 1) {
    +      TypeCheckResult.TypeCheckFailure("The number of rows must be a positive constant integer.")
    +    } else {
    +      for (i <- 1 until children.length) {
    +        val j = (i - 1) % numFields
    +        if (children(i).dataType != elementSchema.fields(j).dataType) {
    +          return TypeCheckResult.TypeCheckFailure(
    +            s"Argument ${j + 1} (${elementSchema.fields(j).dataType}) != " +
    +              s"Argument $i (${children(i).dataType})")
    +        }
    +      }
    +      TypeCheckResult.TypeCheckSuccess
    +    }
    +  }
    +
    +  override def elementSchema: StructType =
    +    StructType(children.tail.take(numFields).zipWithIndex.map {
    +      case (e, index) => StructField(s"col$index", e.dataType)
    +    })
    +
    +  override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
    +    val values = children.tail.map(_.eval(input))
    --- End diff --

    It's better to call `toArray` here, as we will access it by index in a loop
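The `toArray` suggestion above can be sketched in simplified, non-Spark Scala (`values` stands in for `children.tail.map(_.eval(input))`, and the row/column counts are illustrative): materializing the evaluated values into an `Array` gives O(1) indexed access inside the row-building loop, whereas the `Seq` returned by `map` may be a `List` whose `apply(index)` walks the list.

```scala
// Sketch of the row-building loop with the reviewer's toArray suggestion
// applied. values, numRows, and numFields are hypothetical stand-ins for
// the corresponding members of the Stack expression.
val values: Array[Any] = List[Any]("a", "b", "c").toArray
val numRows = 2
val numFields = 2
val rows = for (row <- 0 until numRows) yield {
  val fields = new Array[Any](numFields)
  for (col <- 0 until numFields) {
    val index = row * numFields + col // row-major index into values
    fields(col) = if (index < values.length) values(index) else null
  }
  fields.toSeq
}
```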
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69596328

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -94,6 +94,63 @@ case class UserDefinedGenerator(
     }

     /**
    + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
    + * {{{
    + *   SELECT stack(2, 1, 2, 3)) ->
    + *   1      2
    + *   3      NULL
    + * }}}
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
    +  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]")
    +case class Stack(children: Seq[Expression])
    +    extends Expression with Generator with CodegenFallback {
    +
    +  private lazy val numRows = children.head.eval().asInstanceOf[Int]
    +  private lazy val numFields = ((children.length - 1) + numRows - 1) / numRows
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    if (children.length <= 1) {
    +      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
    +    } else if (children.head.dataType != IntegerType || !children.head.foldable || numRows < 1) {
    +      TypeCheckResult.TypeCheckFailure("The number of rows must be a positive constant integer.")
    +    } else {
    +      for (i <- 1 until children.length) {
    +        val j = (i - 1) % numFields
    +        if (children(i).dataType != elementSchema.fields(j).dataType) {
    +          return TypeCheckResult.TypeCheckFailure(
    +            s"Argument ${j + 1} (${elementSchema.fields(j).dataType}) != " +
    +              s"Argument $i (${children(i).dataType})")
    +        }
    +      }
    +      TypeCheckResult.TypeCheckSuccess
    +    }
    +  }
    +
    +  override def elementSchema: StructType = {
    +    var schema = new StructType()
    +    val types = children.tail.take(numFields).map(_.dataType)
    +    for (i <- 0 until numFields) {
    +      schema = schema.add(s"col$i", types(i))
    +    }
    +    schema
    +  }
    +
    +  override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
    +    val values = children.tail.map(_.eval(input))
    +    for (row <- 0 until numRows) yield {
    +      val fields = new Array[Any](numFields)
    +      for (col <- 0 until numFields) {
    +        val index = (row % numRows) * numFields + col
    --- End diff --

    Absolutely, it's unnecessary.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69596016

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -94,6 +94,63 @@ case class UserDefinedGenerator(
     }

     /**
    + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
    + * {{{
    + *   SELECT stack(2, 1, 2, 3)) ->
    + *   1      2
    + *   3      NULL
    + * }}}
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
    +  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]")
    +case class Stack(children: Seq[Expression])
    +    extends Expression with Generator with CodegenFallback {
    +
    +  private lazy val numRows = children.head.eval().asInstanceOf[Int]
    +  private lazy val numFields = ((children.length - 1) + numRows - 1) / numRows
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    if (children.length <= 1) {
    +      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
    +    } else if (children.head.dataType != IntegerType || !children.head.foldable || numRows < 1) {
    +      TypeCheckResult.TypeCheckFailure("The number of rows must be a positive constant integer.")
    +    } else {
    +      for (i <- 1 until children.length) {
    +        val j = (i - 1) % numFields
    +        if (children(i).dataType != elementSchema.fields(j).dataType) {
    +          return TypeCheckResult.TypeCheckFailure(
    +            s"Argument ${j + 1} (${elementSchema.fields(j).dataType}) != " +
    +              s"Argument $i (${children(i).dataType})")
    +        }
    +      }
    +      TypeCheckResult.TypeCheckSuccess
    +    }
    +  }
    +
    +  override def elementSchema: StructType = {
    +    var schema = new StructType()
    --- End diff --

    Oops, you already taught this in another PR! Sorry for my laziness.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14033#discussion_r69595623

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -94,6 +94,63 @@ case class UserDefinedGenerator(
     }

     /**
    + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
    + * {{{
    + *   SELECT stack(2, 1, 2, 3)) ->
    + *   1      2
    + *   3      NULL
    + * }}}
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
    +  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]")
    +case class Stack(children: Seq[Expression])
    +    extends Expression with Generator with CodegenFallback {
    +
    +  private lazy val numRows = children.head.eval().asInstanceOf[Int]
    +  private lazy val numFields = ((children.length - 1) + numRows - 1) / numRows
    --- End diff --

    Sure. I will use `math.ceil` to make it clearer.
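The two column-count formulations discussed above can be compared directly (a sketch with hypothetical helper names, where `numValues` is the number of stacked values, i.e. `children.length - 1`): the integer "ceiling trick" from the earlier revision and the `math.ceil` version the author switches to give identical results for positive row counts.

```scala
// Integer ceiling trick from the earlier revision of the patch.
def numFieldsTrick(numValues: Int, numRows: Int): Int =
  (numValues + numRows - 1) / numRows

// Equivalent, more readable formulation adopted after this review comment.
def numFieldsCeil(numValues: Int, numRows: Int): Int =
  math.ceil(numValues.toDouble / numRows).toInt
```

For example, both give 2 columns for `stack(2, 1, 2, 3)` (three values over two rows).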
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69510808 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,63 @@ [...] + override def eval(input: InternalRow): TraversableOnce[InternalRow] = { + val values = children.tail.map(_.eval(input)) + for (row <- 0 until numRows) yield { + val fields = new Array[Any](numFields) + for (col <- 0 until numFields) { + val index = (row % numRows) * numFields + col --- End diff -- Since `row <- 0 until numRows`, it looks unnecessary to do `row % numRows`.
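A standalone sketch of this indexing, with plain Scala collections standing in for `InternalRow` (the concrete `values`, `numRows`, and `numFields` here are illustrative):

```scala
// stack(2, 1, 2, 3): lay the k values out row-major into numRows x numFields,
// padding the last row with null. Since row already ranges over 0 until numRows,
// row * numFields + col reaches every value; the extra (row % numRows) is a no-op.
val numRows = 2
val numFields = 2
val values: Seq[Any] = Seq(1, 2, 3)
val rows = (0 until numRows).map { row =>
  (0 until numFields).map { col =>
    val index = row * numFields + col // (row % numRows) * numFields + col is identical here
    if (index < values.length) values(index) else null
  }
}
assert(rows == Seq(Seq(1, 2), Seq(3, null)))
```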
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69510671 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,63 @@ [...] + override def elementSchema: StructType = { + var schema = new StructType() --- End diff -- how about ``` StructType(children.tail.take(numFields).zipWithIndex.map { case (e, index) => StructField(s"col$index", e.dataType) }) ```
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69510162 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,63 @@ [...] + private lazy val numFields = ((children.length - 1) + numRows - 1) / numRows --- End diff -- Can you explain a bit more about this? It would be good if we could express the logic more clearly using `math.ceil` or something.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69503280 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,70 @@ [...] + return TypeCheckResult.TypeCheckFailure( + s"Argument ${j + 1} (${elementSchema.fields(j).dataType}) != " + --- End diff -- I see. Thank you for this.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69503069 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,70 @@ [...] + private lazy val numRows = try { + children.head.eval().asInstanceOf[Int] + } catch { + case _: ClassCastException => --- End diff -- Oh, indeed. Without that, all tests pass; `elementSchema` is not called before the type check. While developing, I thought I had found a case that needed it, but I must have been confused by some mixed cases.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69498156 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,70 @@ [...] + return TypeCheckResult.TypeCheckFailure( + s"Argument ${j + 1} (${elementSchema.fields(j).dataType}) != " + --- End diff -- not a big deal, `Argument i` is also fine.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69498130 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,70 @@ [...] + private lazy val numRows = try { + children.head.eval().asInstanceOf[Int] + } catch { + case _: ClassCastException => --- End diff -- `elementSchema` is a method; where do we call it before the type checking?
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69493271 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,70 @@ [...] + return TypeCheckResult.TypeCheckFailure( + s"Argument ${j + 1} (${elementSchema.fields(j).dataType}) != " + --- End diff -- Should I change it to `${i}th argument`? There is no problem changing it like that.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69481100 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,70 @@ [...] + } else if (children.head.dataType != IntegerType || !children.head.foldable || + children.head.eval().asInstanceOf[Int] < 1) { --- End diff -- Sure! I'll fix it soon.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69481018 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,70 @@ [...] + return TypeCheckResult.TypeCheckFailure( + s"Argument ${j + 1} (${elementSchema.fields(j).dataType}) != " + --- End diff -- I wasn't sure about the 1st/2nd/3rd suffixes, so I borrowed `Argument i` from Hive.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69480927 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,70 @@ [...] + private lazy val numRows = try { + children.head.eval().asInstanceOf[Int] + } catch { + case _: ClassCastException => --- End diff -- This is needed since `numRows` and `numFields` are used in `elementSchema` before `checkInputDataTypes`.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69466832 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,70 @@ [...] + } else if (children.head.dataType != IntegerType || !children.head.foldable || + children.head.eval().asInstanceOf[Int] < 1) { --- End diff -- `numRows < 1` looks simpler.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69466914 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,70 @@ [...] + return TypeCheckResult.TypeCheckFailure( + s"Argument ${j + 1} (${elementSchema.fields(j).dataType}) != " + --- End diff -- It's better to say `xth argument`.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69466495 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,70 @@ [...] + private lazy val numRows = try { + children.head.eval().asInstanceOf[Int] + } catch { + case _: ClassCastException => --- End diff -- I don't think we need the try-catch here; it's guaranteed to be an int after the type checking.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69429057 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,62 @@ [...] + override def inputTypes: Seq[DataType] = + Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType) --- End diff -- Anyway, sorry for this. I completely misunderstood the behavior of this function.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69428617 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,62 @@ [...] + override def inputTypes: Seq[DataType] = + Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType) --- End diff -- Interesting. Here is the result from Hive. I'll fix this PR to match Hive. ``` hive> select stack(2, 2.0, 3, 4, 5.0); FAILED: UDFArgumentException Argument 1's type (double) should be equal to argument 3's type (int) ```
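In other words, Hive takes the column types from the first row of values and checks each later argument against its column, i.e. argument i against column (i - 1) % numFields. A standalone sketch of that rule with type names as strings (all names here are illustrative, not the patch's code):

```scala
// stack(2, 2.0, 3, 4, 5.0): the first row fixes the column types (double, int);
// every subsequent argument must match the type of the column it falls into.
val numRows = 2
val argTypes = Seq("int", "double", "int", "int", "double") // n, then v1..v4
val numFields = (argTypes.length - 1 + numRows - 1) / numRows // ceil(4 / 2) = 2
val columnTypes = argTypes.tail.take(numFields)
val firstMismatch = (1 until argTypes.length).find { i =>
  argTypes(i) != columnTypes((i - 1) % numFields)
}
// Argument 3 (int) lands in column 0 (double), which is exactly what Hive reports.
assert(firstMismatch == Some(3))
```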
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69428234 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,62 @@ [...] + override def inputTypes: Seq[DataType] = + Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType) --- End diff -- Oh, I overlooked that. Yes, you're right; it should apply per column. Hmm, let me check and investigate more. Thank you for pointing that out.
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69425803 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +94,62 @@ case class UserDefinedGenerator( } /** + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant. + * {{{ + * SELECT stack(2, 1, 2, 3) -> + * 1 2 + * 3 NULL + * }}} + */ +@ExpressionDescription( + usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.", + extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]") +case class Stack(children: Seq[Expression]) +extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback { + + override def inputTypes: Seq[DataType] = +Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType) --- End diff -- Can you check what the type coercion rule is for Hive? It looks to me like the values within the same column should share a type, but not all values need to have the same type.
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407701 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +96,61 @@ case class UserDefinedGenerator( } /** + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant. + * {{{ + * SELECT stack(2, 1, 2, 3) -> + * 1 2 + * 3 NULL + * }}} + */ +@ExpressionDescription( + usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.", + extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]") +case class Stack(children: Seq[Expression]) +extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback { + + override def inputTypes: Seq[DataType] = +Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType) + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- Oh, now I see what you mean! Right, I missed that. I'll add the logic and a test case. Thank you again.
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407665 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +96,61 @@ case class UserDefinedGenerator( } /** + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant. + * {{{ + * SELECT stack(2, 1, 2, 3) -> + * 1 2 + * 3 NULL + * }}} + */ +@ExpressionDescription( + usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.", + extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]") +case class Stack(children: Seq[Expression]) +extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback { + + override def inputTypes: Seq[DataType] = +Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType) + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- We should throw `AnalysisException` instead of `ClassCastException`; the type checking is not working here.
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407641 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +96,61 @@ case class UserDefinedGenerator( } /** + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant. + * {{{ + * SELECT stack(2, 1, 2, 3) -> + * 1 2 + * 3 NULL + * }}} + */ +@ExpressionDescription( + usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.", + extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]") +case class Stack(children: Seq[Expression]) +extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback { + + override def inputTypes: Seq[DataType] = +Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType) + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- ``` scala> sql("select stack(1.0,2,3)"); java.lang.ClassCastException: org.apache.spark.sql.types.Decimal cannot be cast to java.lang.Integer ```
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407647 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -17,6 +17,8 @@ package org.apache.spark.sql.catalyst.expressions +import scala.collection.mutable.ArrayBuffer --- End diff -- Oops. My bad.
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407596 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +96,61 @@ case class UserDefinedGenerator( } /** + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant. + * {{{ + * SELECT stack(2, 1, 2, 3) -> + * 1 2 + * 3 NULL + * }}} + */ +@ExpressionDescription( + usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.", + extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]") +case class Stack(children: Seq[Expression]) +extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback { + + override def inputTypes: Seq[DataType] = +Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType) + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- Should I reword the description, `the first data type rules`, to make it clearer?
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407567 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +96,61 @@ case class UserDefinedGenerator( } /** + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant. + * {{{ + * SELECT stack(2, 1, 2, 3) -> + * 1 2 + * 3 NULL + * }}} + */ +@ExpressionDescription( + usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.", + extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]") +case class Stack(children: Seq[Expression]) +extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback { + + override def inputTypes: Seq[DataType] = +Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType) + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- Oh, that is a misleading comment. The first argument, `1`, is the number of rows; its type is checked by the type-checker. The type of the first data argument, `1.0`, determines the types of the following ones.
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407515 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -17,6 +17,8 @@ package org.apache.spark.sql.catalyst.expressions +import scala.collection.mutable.ArrayBuffer --- End diff -- Unnecessary import?
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407491 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +96,61 @@ case class UserDefinedGenerator( } /** + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant. + * {{{ + * SELECT stack(2, 1, 2, 3) -> + * 1 2 + * 3 NULL + * }}} + */ +@ExpressionDescription( + usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.", + extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]") +case class Stack(children: Seq[Expression]) +extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback { + + override def inputTypes: Seq[DataType] = +Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType) + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- E.g. what if the first argument is not int type? I'm also surprised that `stack(1, 1.0, 2)` works; we will cast `1.0` to int type, according to the definition of `inputTypes`.
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407409 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +96,61 @@ case class UserDefinedGenerator( } /** + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant. + * {{{ + * SELECT stack(2, 1, 2, 3) -> + * 1 2 + * 3 NULL + * }}} + */ +@ExpressionDescription( + usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.", + extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]") +case class Stack(children: Seq[Expression]) +extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback { + + override def inputTypes: Seq[DataType] = +Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.") +} else if (!children.head.foldable || children.head.eval().asInstanceOf[Int] < 1) { + TypeCheckResult.TypeCheckFailure("The number of rows must be positive constant.") +} else if (children.tail.map(_.dataType).distinct.count(_ != NullType) > 1) { + TypeCheckResult.TypeCheckFailure( +s"The expressions should all have the same type," + + s" but got $prettyName(${children.map(_.dataType)}).") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + private lazy val numRows = children.head.eval().asInstanceOf[Int] + private lazy val numFields = ((children.length - 1) + numRows - 1) / numRows + + override def elementSchema: StructType = { +var schema = new StructType() +for (i <- 0 until numFields) { + schema = schema.add(s"col$i", children(1).dataType) +} +schema + } + + override def eval(input: InternalRow): TraversableOnce[InternalRow] = { +val values = children.tail.map(_.eval(input)) +for (row <- 0 until numRows) yield { + val fields = ArrayBuffer.empty[Any] --- End diff -- Right, good catch!
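Since the column count is fixed once the argument list is parsed, the per-row buffer can indeed be preallocated rather than grown. A simplified sketch over plain Scala values (the real code builds Catalyst `InternalRow`s from evaluated child expressions):

```scala
// Sketch: building stack()'s output rows with a preallocated Array
// instead of an ArrayBuffer; the column count is known up front.
def stackRows(numRows: Int, values: Seq[Any]): Seq[Seq[Any]] = {
  val numFields = (values.length + numRows - 1) / numRows  // integer ceiling of k/n
  (0 until numRows).map { row =>
    val fields = new Array[Any](numFields)                 // fixed size, no resizing
    for (col <- 0 until numFields) {
      val idx = row * numFields + col
      fields(col) = if (idx < values.length) values(idx) else null  // pad short rows with NULL
    }
    fields.toSeq
  }
}
```

For example, `stackRows(2, Seq(1, 2, 3))` yields `Seq(Seq(1, 2), Seq(3, null))`, matching the `SELECT stack(2, 1, 2, 3)` example in the doc comment.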
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407065 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +96,61 @@ case class UserDefinedGenerator( } /** + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant. + * {{{ + * SELECT stack(2, 1, 2, 3) -> + * 1 2 + * 3 NULL + * }}} + */ +@ExpressionDescription( + usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.", + extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]") +case class Stack(children: Seq[Expression]) +extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback { + + override def inputTypes: Seq[DataType] = +Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType) + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- Thank you for the review again, @cloud-fan. For this, I added type casting tests here: https://github.com/apache/spark/pull/14033/files#diff-a2587541e08bf6e23df33738488d070aR30 Did I miss something there?
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69406936 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +96,61 @@ case class UserDefinedGenerator( } /** + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant. + * {{{ + * SELECT stack(2, 1, 2, 3) -> + * 1 2 + * 3 NULL + * }}} + */ +@ExpressionDescription( + usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.", + extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]") +case class Stack(children: Seq[Expression]) +extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback { + + override def inputTypes: Seq[DataType] = +Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.") +} else if (!children.head.foldable || children.head.eval().asInstanceOf[Int] < 1) { + TypeCheckResult.TypeCheckFailure("The number of rows must be positive constant.") +} else if (children.tail.map(_.dataType).distinct.count(_ != NullType) > 1) { + TypeCheckResult.TypeCheckFailure( +s"The expressions should all have the same type," + + s" but got $prettyName(${children.map(_.dataType)}).") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + private lazy val numRows = children.head.eval().asInstanceOf[Int] + private lazy val numFields = ((children.length - 1) + numRows - 1) / numRows + + override def elementSchema: StructType = { +var schema = new StructType() +for (i <- 0 until numFields) { + schema = schema.add(s"col$i", children(1).dataType) +} +schema + } + + override def eval(input: InternalRow): TraversableOnce[InternalRow] = { +val values = children.tail.map(_.eval(input)) +for (row <- 0 until numRows) yield { + val fields = ArrayBuffer.empty[Any] --- End diff -- Why use `ArrayBuffer` here? The number of columns is already known, right?
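As an aside, the `numFields` line in the diff above computes an integer ceiling without floating point: for positive integers, ceil(k / n) equals (k + n - 1) / n under truncating division. A one-line illustration of the identity:

```scala
// Integer ceiling of k / n for positive n, the same trick as
// `((children.length - 1) + numRows - 1) / numRows` in the diff.
def ceilDiv(k: Int, n: Int): Int = (k + n - 1) / n
```

For example, five values spread over two rows need `ceilDiv(5, 2) == 3` columns, with the last row padded by NULLs.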
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69406896 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -94,6 +96,61 @@ case class UserDefinedGenerator( } /** + * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant. + * {{{ + * SELECT stack(2, 1, 2, 3) -> + * 1 2 + * 3 NULL + * }}} + */ +@ExpressionDescription( + usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.", + extended = "> SELECT _FUNC_(2, 1, 2, 3);\n [1,2]\n [3,null]") +case class Stack(children: Seq[Expression]) +extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback { + + override def inputTypes: Seq[DataType] = +Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType) + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- As we override `checkInputDataTypes` here, `ImplicitCastInputTypes` is useless now. We need to take care of all the type check logic in `checkInputDataTypes` ourselves.
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/14033 [SPARK-16286][SQL] Implement stack table generating function ## What changes were proposed in this pull request? This PR implements the `stack` table generating function. ## How was this patch tested? Passes the Jenkins tests, including new test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-16286 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14033.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14033 commit 6de93a1582ac5877a932ea47e86811e228b5c2f6 Author: Dongjoon Hyun Date: 2016-07-03T05:18:16Z [SPARK-16286][SQL] Implement stack table generating function