spark git commit: [SPARK-16289][SQL] Implement posexplode table generating function

2016-07-07 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 7ef1d1c61 -> a04975457


[SPARK-16289][SQL] Implement posexplode table generating function

This PR implements the `posexplode` table-generating function. Currently, the 
master branch raises the following exception for a `map` argument, which differs 
from Hive's behavior.

**Before**
```scala
scala> sql("select posexplode(map('a', 1, 'b', 2))").show
org.apache.spark.sql.AnalysisException: No handler for Hive UDF ... 
posexplode() takes an array as a parameter; line 1 pos 7
```

**After**
```scala
scala> sql("select posexplode(map('a', 1, 'b', 2))").show
+---+---+-----+
|pos|key|value|
+---+---+-----+
|  0|  a|    1|
|  1|  b|    2|
+---+---+-----+
```

For an `array` argument, the result after this change is the same as before.
```
scala> sql("select posexplode(array(1, 2, 3))").show
+---+---+
|pos|col|
+---+---+
|  0|  1|
|  1|  2|
|  2|  3|
+---+---+
```
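
The row shapes above can be modelled with plain Scala collections. This is only an illustrative sketch, not the generator code added in `generators.scala`: the object and method names (`PosExplodeSketch`, `posexplodeArray`, `posexplodeMap`) are hypothetical, and a `ListMap` is assumed so that entry order is predictable.

```scala
import scala.collection.immutable.ListMap

// Hypothetical model of the output shape; it does not use Spark at all.
object PosExplodeSketch {
  // Array-like input: one (pos, col) row per element, positions start at 0.
  def posexplodeArray[A](xs: Seq[A]): Seq[(Int, A)] =
    xs.zipWithIndex.map { case (x, pos) => (pos, x) }

  // Map-like input: one (pos, key, value) row per entry, in iteration order.
  def posexplodeMap[K, V](entries: Iterable[(K, V)]): Seq[(Int, K, V)] =
    entries.toSeq.zipWithIndex.map { case ((k, v), pos) => (pos, k, v) }

  def main(args: Array[String]): Unit = {
    // Mirrors posexplode(map('a', 1, 'b', 2)) from the example above.
    posexplodeMap(ListMap("a" -> 1, "b" -> 2)).foreach(println)
    // Mirrors posexplode(array(1, 2, 3)) from the example above.
    posexplodeArray(Seq(1, 2, 3)).foreach(println)
  }
}
```

The map case yields three columns (`pos`, `key`, `value`) and the array case yields two (`pos`, `col`), which matches the tables shown above.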

Passes the Jenkins tests, including the newly added test cases.

Author: Dongjoon Hyun 

Closes #13971 from dongjoon-hyun/SPARK-16289.

(cherry picked from commit 46395db80e3304e3f3a1ebdc8aadb8f2819b48b4)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a0497545
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a0497545
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a0497545

Branch: refs/heads/branch-2.0
Commit: a049754577aa78a5a26b38821233861a4dfd8e8a
Parents: 7ef1d1c
Author: Dongjoon Hyun 
Authored: Thu Jun 30 12:03:54 2016 -0700
Committer: Reynold Xin 
Committed: Thu Jul 7 21:05:31 2016 -0700

--
 R/pkg/NAMESPACE |  1 +
 R/pkg/R/functions.R | 17 
 R/pkg/R/generics.R  |  4 +
 R/pkg/inst/tests/testthat/test_sparkSQL.R   |  2 +-
 python/pyspark/sql/functions.py | 21 +
 .../catalyst/analysis/FunctionRegistry.scala|  1 +
 .../sql/catalyst/expressions/generators.scala   | 66 +++---
 .../analysis/ExpressionTypeCheckingSuite.scala  |  2 +
 .../expressions/GeneratorExpressionSuite.scala  | 71 +++
 .../scala/org/apache/spark/sql/Column.scala |  1 +
 .../scala/org/apache/spark/sql/functions.scala  |  8 ++
 .../spark/sql/ColumnExpressionSuite.scala   | 60 -
 .../spark/sql/GeneratorFunctionSuite.scala  | 92 
 .../spark/sql/hive/HiveSessionCatalog.scala |  2 +-
 14 files changed, 276 insertions(+), 72 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a0497545/R/pkg/NAMESPACE
--
diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index 9fd2568..bc3aceb 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -235,6 +235,7 @@ exportMethods("%in%",
   "over",
   "percent_rank",
   "pmod",
+  "posexplode",
   "quarter",
   "rand",
   "randn",

http://git-wip-us.apache.org/repos/asf/spark/blob/a0497545/R/pkg/R/functions.R
--
diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index 09e5afa..52d46f9 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -2934,3 +2934,20 @@ setMethod("sort_array",
             jc <- callJStatic("org.apache.spark.sql.functions", "sort_array", x@jc, asc)
             column(jc)
           })
+
+#' posexplode
+#'
+#' Creates a new row for each element with position in the given array or map column.
+#'
+#' @rdname posexplode
+#' @name posexplode
+#' @family collection_funcs
+#' @export
+#' @examples \dontrun{posexplode(df$c)}
+#' @note posexplode since 2.1.0
+setMethod("posexplode",
+          signature(x = "Column"),
+          function(x) {
+            jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
+            column(jc)
+          })

http://git-wip-us.apache.org/repos/asf/spark/blob/a0497545/R/pkg/R/generics.R
--
diff --git a/R/pkg/R/generics.R b/R/pkg/R/generics.R
index b0f25de..e4ec508 100644
--- a/R/pkg/R/generics.R
+++ b/R/pkg/R/generics.R
@@ -1054,6 +1054,10 @@ setGeneric("percent_rank", function(x) { standardGeneric("percent_rank") })
 #' @export
 setGeneric("pmod", function(y, x) { standardGeneric("pmod") })
 
+#' @rdname posexplode
+#' @export
+setGeneric("posexplode", function(x) { standardGeneric("posexplode") })
+
 #' @rdname quarter
 #' @export
 setGeneric("quarter", function(x) { standardGeneric("quarter") })

http://git-wip-us.apache.org/repos/asf/spark/blob/a0497545/R/pkg/inst/tests/testthat/test_sparkSQL.R

spark git commit: [SPARK-16289][SQL] Implement posexplode table generating function

2016-06-30 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master fdf9f94f8 -> 46395db80


[SPARK-16289][SQL] Implement posexplode table generating function

## What changes were proposed in this pull request?

This PR implements the `posexplode` table-generating function. Currently, the 
master branch raises the following exception for a `map` argument, which differs 
from Hive's behavior.

**Before**
```scala
scala> sql("select posexplode(map('a', 1, 'b', 2))").show
org.apache.spark.sql.AnalysisException: No handler for Hive UDF ... 
posexplode() takes an array as a parameter; line 1 pos 7
```

**After**
```scala
scala> sql("select posexplode(map('a', 1, 'b', 2))").show
+---+---+-----+
|pos|key|value|
+---+---+-----+
|  0|  a|    1|
|  1|  b|    2|
+---+---+-----+
```

For an `array` argument, the result after this change is the same as before.
```
scala> sql("select posexplode(array(1, 2, 3))").show
+---+---+
|pos|col|
+---+---+
|  0|  1|
|  1|  2|
|  2|  3|
+---+---+
```

## How was this patch tested?

Passes the Jenkins tests, including the newly added test cases.

Author: Dongjoon Hyun 

Closes #13971 from dongjoon-hyun/SPARK-16289.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/46395db8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/46395db8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/46395db8

Branch: refs/heads/master
Commit: 46395db80e3304e3f3a1ebdc8aadb8f2819b48b4
Parents: fdf9f94
Author: Dongjoon Hyun 
Authored: Thu Jun 30 12:03:54 2016 -0700
Committer: Reynold Xin 
Committed: Thu Jun 30 12:03:54 2016 -0700

--
 R/pkg/NAMESPACE |  1 +
 R/pkg/R/functions.R | 17 
 R/pkg/R/generics.R  |  4 +
 R/pkg/inst/tests/testthat/test_sparkSQL.R   |  2 +-
 python/pyspark/sql/functions.py | 21 +
 .../catalyst/analysis/FunctionRegistry.scala|  1 +
 .../sql/catalyst/expressions/generators.scala   | 66 +++---
 .../analysis/ExpressionTypeCheckingSuite.scala  |  2 +
 .../expressions/GeneratorExpressionSuite.scala  | 71 +++
 .../scala/org/apache/spark/sql/Column.scala |  1 +
 .../scala/org/apache/spark/sql/functions.scala  |  8 ++
 .../spark/sql/ColumnExpressionSuite.scala   | 60 -
 .../spark/sql/GeneratorFunctionSuite.scala  | 92 
 .../spark/sql/hive/HiveSessionCatalog.scala |  2 +-
 14 files changed, 276 insertions(+), 72 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/46395db8/R/pkg/NAMESPACE
--
diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index e0ffde9..abc6588 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -234,6 +234,7 @@ exportMethods("%in%",
   "over",
   "percent_rank",
   "pmod",
+  "posexplode",
   "quarter",
   "rand",
   "randn",

http://git-wip-us.apache.org/repos/asf/spark/blob/46395db8/R/pkg/R/functions.R
--
diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index 09e5afa..52d46f9 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -2934,3 +2934,20 @@ setMethod("sort_array",
             jc <- callJStatic("org.apache.spark.sql.functions", "sort_array", x@jc, asc)
             column(jc)
           })
+
+#' posexplode
+#'
+#' Creates a new row for each element with position in the given array or map column.
+#'
+#' @rdname posexplode
+#' @name posexplode
+#' @family collection_funcs
+#' @export
+#' @examples \dontrun{posexplode(df$c)}
+#' @note posexplode since 2.1.0
+setMethod("posexplode",
+          signature(x = "Column"),
+          function(x) {
+            jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
+            column(jc)
+          })

http://git-wip-us.apache.org/repos/asf/spark/blob/46395db8/R/pkg/R/generics.R
--
diff --git a/R/pkg/R/generics.R b/R/pkg/R/generics.R
index 0e4350f..d9080b6 100644
--- a/R/pkg/R/generics.R
+++ b/R/pkg/R/generics.R
@@ -1050,6 +1050,10 @@ setGeneric("percent_rank", function(x) { standardGeneric("percent_rank") })
 #' @export
 setGeneric("pmod", function(y, x) { standardGeneric("pmod") })
 
+#' @rdname posexplode
+#' @export
+setGeneric("posexplode", function(x) { standardGeneric("posexplode") })
+
 #' @rdname quarter
 #' @export
 setGeneric("quarter", function(x) { standardGeneric("quarter") })

http://git-wip-us.apache.org/repos/asf/spark/blob/46395db8/R/pkg/inst/tests/testthat/test_sparkSQL.R
--
diff --git