[2/2] git commit: [SPARK-2097][SQL] UDF Support

2014-08-02 Thread marmbrus
[SPARK-2097][SQL] UDF Support

This patch adds the ability to register lambda functions written in Python, 
Java or Scala as UDFs for use in SQL or HiveQL.

Scala:
```scala
registerFunction("strLenScala", (_: String).length)
sql("SELECT strLenScala('test')")
```
Python:
```python
sqlCtx.registerFunction("strLenPython", lambda x: len(x), IntegerType())
sqlCtx.sql("SELECT strLenPython('test')")
```
Java:
```java
sqlContext.registerFunction("stringLengthJava", new UDF1<String, Integer>() {
  @Override
  public Integer call(String str) throws Exception {
    return str.length();
  }
}, DataType.IntegerType);

sqlContext.sql("SELECT stringLengthJava('test')");
```
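
To make the idea of "registering a lambda under a name" concrete, here is a minimal sketch in plain Python. It is a toy illustration only: `ToyFunctionRegistry`, `register_function`, and `call` are hypothetical names invented for this example, and the code does not reflect how Spark's `FunctionRegistry` or the Python UDF execution path are actually implemented.

```python
# Toy illustration only: a hypothetical in-memory registry, NOT Spark's implementation.
from typing import Any, Callable, Dict, Tuple


class ToyFunctionRegistry:
    """Maps a function name to a (callable, declared return type) pair."""

    def __init__(self) -> None:
        self._functions: Dict[str, Tuple[Callable[..., Any], type]] = {}

    def register_function(self, name: str, func: Callable[..., Any],
                          return_type: type = str) -> None:
        # Store the callable together with the type its result is coerced to.
        self._functions[name] = (func, return_type)

    def call(self, name: str, *args: Any) -> Any:
        # Look the function up by name, as a query engine would when it
        # encounters an unresolved function call in a query.
        if name not in self._functions:
            raise LookupError("undefined function: " + name)
        func, return_type = self._functions[name]
        return return_type(func(*args))


registry = ToyFunctionRegistry()
registry.register_function("strLenPython", lambda x: len(x), int)
result = registry.call("strLenPython", "test")
```

The sketch only shows the name-to-function lookup; parsing SQL, serializing the lambda, and running it on executors are outside its scope.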

Author: Michael Armbrust mich...@databricks.com

Closes #1063 from marmbrus/udfs and squashes the following commits:

9eda0fe [Michael Armbrust] newline
747c05e [Michael Armbrust] Add some scala UDF tests.
d92727d [Michael Armbrust] Merge remote-tracking branch 'apache/master' into udfs
005d684 [Michael Armbrust] Fix naming and formatting.
d14dac8 [Michael Armbrust] Fix last line of autogened java files.
8135c48 [Michael Armbrust] Move UDF unit tests to pyspark.
40b0ffd [Michael Armbrust] Merge remote-tracking branch 'apache/master' into udfs
6a36890 [Michael Armbrust] Switch logging so that SQLContext can be serializable.
7a83101 [Michael Armbrust] Drop toString
795fd15 [Michael Armbrust] Try to avoid capturing SQLContext.
e54fb45 [Michael Armbrust] Docs and tests.
437cbe3 [Michael Armbrust] Update use of dataTypes, fix some python tests, address review comments.
01517d6 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into udfs
8e6c932 [Michael Armbrust] WIP
3f96a52 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into udfs
6237c8d [Michael Armbrust] WIP
2766f0b [Michael Armbrust] Move udfs support to SQL from hive. Add support for Java UDFs.
0f7d50c [Michael Armbrust] Draft of native Spark SQL UDFs for Scala and Python.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/158ad0bb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/158ad0bb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/158ad0bb

Branch: refs/heads/master
Commit: 158ad0bba9382fd494b4789b5628a9cec00cfa19
Parents: 4c47711
Author: Michael Armbrust mich...@databricks.com
Authored: Sat Aug 2 16:33:48 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sat Aug 2 16:33:48 2014 -0700

--
 python/pyspark/sql.py   |  39 ++-
 .../catalyst/analysis/FunctionRegistry.scala|  32 ++
 .../sql/catalyst/expressions/ScalaUdf.scala | 307 +++
 .../org/apache/spark/sql/api/java/UDF1.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF10.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF11.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF12.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF13.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF14.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF15.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF16.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF17.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF18.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF19.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF2.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF20.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF21.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF22.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF3.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF4.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF5.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF6.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF7.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF8.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF9.java |  32 ++
 .../scala/org/apache/spark/sql/SQLContext.scala |  11 +-
 .../org/apache/spark/sql/UdfRegistration.scala  | 196 
 .../spark/sql/api/java/JavaSQLContext.scala |   5 +-
 .../spark/sql/api/java/UDFRegistration.scala| 252 +++
 .../spark/sql/execution/SparkStrategies.scala   |   2 +
 .../apache/spark/sql/execution/pythonUdfs.scala | 177 +++
 .../apache/spark/sql/api/java/JavaAPISuite.java |  90 ++
 .../org/apache/spark/sql/InsertIntoSuite.scala  |   2 +-
 .../scala/org/apache/spark/sql/UDFSuite.scala   |  36 +++
 .../org/apache/spark/sql/hive/HiveContext.scala |  13 +-
 .../org/apache/spark/sql/hive/TestHive.scala|   4 +-
 .../org/apache/spark/sql/hive/hiveUdfs.scala|   6 +-
 .../scala/org/apache/spark/sql/QueryTest.scala  |   4 +-
 38 files changed, 1861 insertions(+), 19 deletions(-)
--



[2/2] git commit: [SPARK-2097][SQL] UDF Support

2014-08-02 Thread marmbrus
[SPARK-2097][SQL] UDF Support

This patch adds the ability to register lambda functions written in Python, 
Java or Scala as UDFs for use in SQL or HiveQL.

Scala:
```scala
registerFunction("strLenScala", (_: String).length)
sql("SELECT strLenScala('test')")
```
Python:
```python
sqlCtx.registerFunction("strLenPython", lambda x: len(x), IntegerType())
sqlCtx.sql("SELECT strLenPython('test')")
```
Java:
```java
sqlContext.registerFunction("stringLengthJava", new UDF1<String, Integer>() {
  @Override
  public Integer call(String str) throws Exception {
    return str.length();
  }
}, DataType.IntegerType);

sqlContext.sql("SELECT stringLengthJava('test')");
```

Author: Michael Armbrust mich...@databricks.com

Closes #1063 from marmbrus/udfs and squashes the following commits:

9eda0fe [Michael Armbrust] newline
747c05e [Michael Armbrust] Add some scala UDF tests.
d92727d [Michael Armbrust] Merge remote-tracking branch 'apache/master' into udfs
005d684 [Michael Armbrust] Fix naming and formatting.
d14dac8 [Michael Armbrust] Fix last line of autogened java files.
8135c48 [Michael Armbrust] Move UDF unit tests to pyspark.
40b0ffd [Michael Armbrust] Merge remote-tracking branch 'apache/master' into udfs
6a36890 [Michael Armbrust] Switch logging so that SQLContext can be serializable.
7a83101 [Michael Armbrust] Drop toString
795fd15 [Michael Armbrust] Try to avoid capturing SQLContext.
e54fb45 [Michael Armbrust] Docs and tests.
437cbe3 [Michael Armbrust] Update use of dataTypes, fix some python tests, address review comments.
01517d6 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into udfs
8e6c932 [Michael Armbrust] WIP
3f96a52 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into udfs
6237c8d [Michael Armbrust] WIP
2766f0b [Michael Armbrust] Move udfs support to SQL from hive. Add support for Java UDFs.
0f7d50c [Michael Armbrust] Draft of native Spark SQL UDFs for Scala and Python.

(cherry picked from commit 158ad0bba9382fd494b4789b5628a9cec00cfa19)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3b9f25f4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3b9f25f4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3b9f25f4

Branch: refs/heads/branch-1.1
Commit: 3b9f25f4259b254f3faa2a7d61e547089a69c259
Parents: 7924d72
Author: Michael Armbrust mich...@databricks.com
Authored: Sat Aug 2 16:33:48 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sat Aug 2 16:34:00 2014 -0700

--
 python/pyspark/sql.py   |  39 ++-
 .../catalyst/analysis/FunctionRegistry.scala|  32 ++
 .../sql/catalyst/expressions/ScalaUdf.scala | 307 +++
 .../org/apache/spark/sql/api/java/UDF1.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF10.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF11.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF12.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF13.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF14.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF15.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF16.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF17.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF18.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF19.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF2.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF20.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF21.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF22.java|  32 ++
 .../org/apache/spark/sql/api/java/UDF3.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF4.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF5.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF6.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF7.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF8.java |  32 ++
 .../org/apache/spark/sql/api/java/UDF9.java |  32 ++
 .../scala/org/apache/spark/sql/SQLContext.scala |  11 +-
 .../org/apache/spark/sql/UdfRegistration.scala  | 196 
 .../spark/sql/api/java/JavaSQLContext.scala |   5 +-
 .../spark/sql/api/java/UDFRegistration.scala| 252 +++
 .../spark/sql/execution/SparkStrategies.scala   |   2 +
 .../apache/spark/sql/execution/pythonUdfs.scala | 177 +++
 .../apache/spark/sql/api/java/JavaAPISuite.java |  90 ++
 .../org/apache/spark/sql/InsertIntoSuite.scala  |   2 +-
 .../scala/org/apache/spark/sql/UDFSuite.scala   |  36 +++
 .../org/apache/spark/sql/hive/HiveContext.scala |  13 +-
 .../org/apache/spark/sql/hive/TestHive.scala|   4 +-
 .../org/apache/spark/sql/hive/hiveUdfs.scala|   6 +-
 .../scala/org/apache/spark/sql/QueryTest.scala  |   4 +-
 38 files changed, 1861 insertions(+), 19 deletions(-)