spark git commit: [SPARK-16289][SQL] Implement posexplode table generating function
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 7ef1d1c61 -> a04975457


[SPARK-16289][SQL] Implement posexplode table generating function

This PR implements `posexplode` table generating function. Currently, master branch raises the following exception for `map` argument. It's different from Hive.

**Before**
```scala
scala> sql("select posexplode(map('a', 1, 'b', 2))").show
org.apache.spark.sql.AnalysisException: No handler for Hive UDF ... posexplode() takes an array as a parameter; line 1 pos 7
```

**After**
```scala
scala> sql("select posexplode(map('a', 1, 'b', 2))").show
+---+---+-----+
|pos|key|value|
+---+---+-----+
|  0|  a|    1|
|  1|  b|    2|
+---+---+-----+
```

For `array` argument, `after` is the same with `before`.
```
scala> sql("select posexplode(array(1, 2, 3))").show
+---+---+
|pos|col|
+---+---+
|  0|  1|
|  1|  2|
|  2|  3|
+---+---+
```

Pass the Jenkins tests with newly added testcases.

Author: Dongjoon Hyun

Closes #13971 from dongjoon-hyun/SPARK-16289.

(cherry picked from commit 46395db80e3304e3f3a1ebdc8aadb8f2819b48b4)
Signed-off-by: Reynold Xin


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a0497545
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a0497545
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a0497545

Branch: refs/heads/branch-2.0
Commit: a049754577aa78a5a26b38821233861a4dfd8e8a
Parents: 7ef1d1c
Author: Dongjoon Hyun
Authored: Thu Jun 30 12:03:54 2016 -0700
Committer: Reynold Xin
Committed: Thu Jul 7 21:05:31 2016 -0700

----------------------------------------------------------------------
 R/pkg/NAMESPACE | 1 +
 R/pkg/R/functions.R | 17
 R/pkg/R/generics.R | 4 +
 R/pkg/inst/tests/testthat/test_sparkSQL.R | 2 +-
 python/pyspark/sql/functions.py | 21 +
 .../catalyst/analysis/FunctionRegistry.scala | 1 +
 .../sql/catalyst/expressions/generators.scala | 66 +++---
 .../analysis/ExpressionTypeCheckingSuite.scala | 2 +
 .../expressions/GeneratorExpressionSuite.scala | 71 +++
 .../scala/org/apache/spark/sql/Column.scala | 1 +
 .../scala/org/apache/spark/sql/functions.scala | 8 ++
 .../spark/sql/ColumnExpressionSuite.scala | 60 -
 .../spark/sql/GeneratorFunctionSuite.scala | 92
 .../spark/sql/hive/HiveSessionCatalog.scala | 2 +-
 14 files changed, 276 insertions(+), 72 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/a0497545/R/pkg/NAMESPACE
----------------------------------------------------------------------
diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index 9fd2568..bc3aceb 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -235,6 +235,7 @@ exportMethods("%in%",
               "over",
               "percent_rank",
               "pmod",
+              "posexplode",
               "quarter",
               "rand",
               "randn",

http://git-wip-us.apache.org/repos/asf/spark/blob/a0497545/R/pkg/R/functions.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index 09e5afa..52d46f9 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -2934,3 +2934,20 @@ setMethod("sort_array",
             jc <- callJStatic("org.apache.spark.sql.functions", "sort_array", x@jc, asc)
             column(jc)
           })
+
+#' posexplode
+#'
+#' Creates a new row for each element with position in the given array or map column.
+#'
+#' @rdname posexplode
+#' @name posexplode
+#' @family collection_funcs
+#' @export
+#' @examples \dontrun{posexplode(df$c)}
+#' @note posexplode since 2.1.0
+setMethod("posexplode",
+          signature(x = "Column"),
+          function(x) {
+            jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
+            column(jc)
+          })

http://git-wip-us.apache.org/repos/asf/spark/blob/a0497545/R/pkg/R/generics.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/generics.R b/R/pkg/R/generics.R
index b0f25de..e4ec508 100644
--- a/R/pkg/R/generics.R
+++ b/R/pkg/R/generics.R
@@ -1054,6 +1054,10 @@ setGeneric("percent_rank", function(x) { standardGeneric("percent_rank") })
 #' @export
 setGeneric("pmod", function(y, x) { standardGeneric("pmod") })
 
+#' @rdname posexplode
+#' @export
+setGeneric("posexplode", function(x) { standardGeneric("posexplode") })
+
 #' @rdname quarter
 #' @export
 setGeneric("quarter", function(x) { standardGeneric("quarter") })

http://git-wip-us.apache.org/repos/asf/spark/blob/a0497545/R/pkg/inst/tests/testthat/test_sparkSQL.R
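
For readers following the commit message above, the row shapes it describes can be modelled without a Spark session. The sketch below is illustrative only: `posexplode_rows` is a hypothetical helper name, it is plain Python rather than the Catalyst generator added by this patch, and its treatment of `None` (no rows) and of unsupported types (`TypeError`) is an assumption about the behaviour, not something stated in the message.

```python
from typing import Any, Iterator, Tuple


def posexplode_rows(value: Any) -> Iterator[Tuple]:
    """Hypothetical pure-Python model of the rows posexplode emits.

    This is NOT Spark code; it only mirrors the output shapes shown in the
    commit message: (pos, col) for arrays and (pos, key, value) for maps.
    """
    if value is None:
        # Assumption: a null input produces no rows at all.
        return
    if isinstance(value, dict):
        # Map input: one row per entry, positions start at 0.
        for pos, (key, val) in enumerate(value.items()):
            yield (pos, key, val)
    elif isinstance(value, (list, tuple)):
        # Array input: one row per element, positions start at 0.
        for pos, elem in enumerate(value):
            yield (pos, elem)
    else:
        # Assumption: any other type is rejected, loosely analogous to an
        # analysis-time type check.
        raise TypeError("posexplode_rows expects a list, tuple, dict, or None")


if __name__ == "__main__":
    print(list(posexplode_rows({"a": 1, "b": 2})))  # [(0, 'a', 1), (1, 'b', 2)]
    print(list(posexplode_rows([1, 2, 3])))         # [(0, 1), (1, 2), (2, 3)]
```

The two printed lists correspond to the `map` and `array` tables in the "After" output of the commit message.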
spark git commit: [SPARK-16289][SQL] Implement posexplode table generating function
Repository: spark
Updated Branches:
  refs/heads/master fdf9f94f8 -> 46395db80


[SPARK-16289][SQL] Implement posexplode table generating function

## What changes were proposed in this pull request?

This PR implements `posexplode` table generating function. Currently, master branch raises the following exception for `map` argument. It's different from Hive.

**Before**
```scala
scala> sql("select posexplode(map('a', 1, 'b', 2))").show
org.apache.spark.sql.AnalysisException: No handler for Hive UDF ... posexplode() takes an array as a parameter; line 1 pos 7
```

**After**
```scala
scala> sql("select posexplode(map('a', 1, 'b', 2))").show
+---+---+-----+
|pos|key|value|
+---+---+-----+
|  0|  a|    1|
|  1|  b|    2|
+---+---+-----+
```

For `array` argument, `after` is the same with `before`.
```
scala> sql("select posexplode(array(1, 2, 3))").show
+---+---+
|pos|col|
+---+---+
|  0|  1|
|  1|  2|
|  2|  3|
+---+---+
```

## How was this patch tested?

Pass the Jenkins tests with newly added testcases.

Author: Dongjoon Hyun

Closes #13971 from dongjoon-hyun/SPARK-16289.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/46395db8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/46395db8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/46395db8

Branch: refs/heads/master
Commit: 46395db80e3304e3f3a1ebdc8aadb8f2819b48b4
Parents: fdf9f94
Author: Dongjoon Hyun
Authored: Thu Jun 30 12:03:54 2016 -0700
Committer: Reynold Xin
Committed: Thu Jun 30 12:03:54 2016 -0700

----------------------------------------------------------------------
 R/pkg/NAMESPACE | 1 +
 R/pkg/R/functions.R | 17
 R/pkg/R/generics.R | 4 +
 R/pkg/inst/tests/testthat/test_sparkSQL.R | 2 +-
 python/pyspark/sql/functions.py | 21 +
 .../catalyst/analysis/FunctionRegistry.scala | 1 +
 .../sql/catalyst/expressions/generators.scala | 66 +++---
 .../analysis/ExpressionTypeCheckingSuite.scala | 2 +
 .../expressions/GeneratorExpressionSuite.scala | 71 +++
 .../scala/org/apache/spark/sql/Column.scala | 1 +
 .../scala/org/apache/spark/sql/functions.scala | 8 ++
 .../spark/sql/ColumnExpressionSuite.scala | 60 -
 .../spark/sql/GeneratorFunctionSuite.scala | 92
 .../spark/sql/hive/HiveSessionCatalog.scala | 2 +-
 14 files changed, 276 insertions(+), 72 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/46395db8/R/pkg/NAMESPACE
----------------------------------------------------------------------
diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index e0ffde9..abc6588 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -234,6 +234,7 @@ exportMethods("%in%",
               "over",
               "percent_rank",
               "pmod",
+              "posexplode",
               "quarter",
               "rand",
               "randn",

http://git-wip-us.apache.org/repos/asf/spark/blob/46395db8/R/pkg/R/functions.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index 09e5afa..52d46f9 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -2934,3 +2934,20 @@ setMethod("sort_array",
             jc <- callJStatic("org.apache.spark.sql.functions", "sort_array", x@jc, asc)
             column(jc)
           })
+
+#' posexplode
+#'
+#' Creates a new row for each element with position in the given array or map column.
+#'
+#' @rdname posexplode
+#' @name posexplode
+#' @family collection_funcs
+#' @export
+#' @examples \dontrun{posexplode(df$c)}
+#' @note posexplode since 2.1.0
+setMethod("posexplode",
+          signature(x = "Column"),
+          function(x) {
+            jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
+            column(jc)
+          })

http://git-wip-us.apache.org/repos/asf/spark/blob/46395db8/R/pkg/R/generics.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/generics.R b/R/pkg/R/generics.R
index 0e4350f..d9080b6 100644
--- a/R/pkg/R/generics.R
+++ b/R/pkg/R/generics.R
@@ -1050,6 +1050,10 @@ setGeneric("percent_rank", function(x) { standardGeneric("percent_rank") })
 #' @export
 setGeneric("pmod", function(y, x) { standardGeneric("pmod") })
 
+#' @rdname posexplode
+#' @export
+setGeneric("posexplode", function(x) { standardGeneric("posexplode") })
+
 #' @rdname quarter
 #' @export
 setGeneric("quarter", function(x) { standardGeneric("quarter") })

http://git-wip-us.apache.org/repos/asf/spark/blob/46395db8/R/pkg/inst/tests/testthat/test_sparkSQL.R
----------------------------------------------------------------------
diff --git