[2/2] git commit: [SPARK-2097][SQL] UDF Support
[SPARK-2097][SQL] UDF Support

This patch adds the ability to register lambda functions written in Python, Java or Scala as UDFs for use in SQL or HiveQL.

Scala:
```scala
registerFunction("strLenScala", (_: String).length)
sql("SELECT strLenScala('test')")
```

Python:
```python
sqlCtx.registerFunction("strLenPython", lambda x: len(x), IntegerType())
sqlCtx.sql("SELECT strLenPython('test')")
```

Java:
```java
sqlContext.registerFunction("stringLengthJava", new UDF1<String, Integer>() {
  @Override
  public Integer call(String str) throws Exception {
    return str.length();
  }
}, DataType.IntegerType);

sqlContext.sql("SELECT stringLengthJava('test')");
```

Author: Michael Armbrust <mich...@databricks.com>

Closes #1063 from marmbrus/udfs and squashes the following commits:

9eda0fe [Michael Armbrust] newline
747c05e [Michael Armbrust] Add some scala UDF tests.
d92727d [Michael Armbrust] Merge remote-tracking branch 'apache/master' into udfs
005d684 [Michael Armbrust] Fix naming and formatting.
d14dac8 [Michael Armbrust] Fix last line of autogened java files.
8135c48 [Michael Armbrust] Move UDF unit tests to pyspark.
40b0ffd [Michael Armbrust] Merge remote-tracking branch 'apache/master' into udfs
6a36890 [Michael Armbrust] Switch logging so that SQLContext can be serializable.
7a83101 [Michael Armbrust] Drop toString
795fd15 [Michael Armbrust] Try to avoid capturing SQLContext.
e54fb45 [Michael Armbrust] Docs and tests.
437cbe3 [Michael Armbrust] Update use of dataTypes, fix some python tests, address review comments.
01517d6 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into udfs
8e6c932 [Michael Armbrust] WIP
3f96a52 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into udfs
6237c8d [Michael Armbrust] WIP
2766f0b [Michael Armbrust] Move udfs support to SQL from hive. Add support for Java UDFs.
0f7d50c [Michael Armbrust] Draft of native Spark SQL UDFs for Scala and Python.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/158ad0bb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/158ad0bb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/158ad0bb

Branch: refs/heads/master
Commit: 158ad0bba9382fd494b4789b5628a9cec00cfa19
Parents: 4c47711
Author: Michael Armbrust <mich...@databricks.com>
Authored: Sat Aug 2 16:33:48 2014 -0700
Committer: Michael Armbrust <mich...@databricks.com>
Committed: Sat Aug 2 16:33:48 2014 -0700

----------------------------------------------------------------------
 python/pyspark/sql.py                           |  39 ++-
 .../catalyst/analysis/FunctionRegistry.scala    |  32 ++
 .../sql/catalyst/expressions/ScalaUdf.scala     | 307 +++
 .../org/apache/spark/sql/api/java/UDF1.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF10.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF11.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF12.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF13.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF14.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF15.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF16.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF17.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF18.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF19.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF2.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF20.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF21.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF22.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF3.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF4.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF5.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF6.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF7.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF8.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF9.java     |  32 ++
 .../scala/org/apache/spark/sql/SQLContext.scala |  11 +-
 .../org/apache/spark/sql/UdfRegistration.scala  | 196
 .../spark/sql/api/java/JavaSQLContext.scala     |   5 +-
 .../spark/sql/api/java/UDFRegistration.scala    | 252 +++
 .../spark/sql/execution/SparkStrategies.scala   |   2 +
 .../apache/spark/sql/execution/pythonUdfs.scala | 177 +++
 .../apache/spark/sql/api/java/JavaAPISuite.java |  90 ++
 .../org/apache/spark/sql/InsertIntoSuite.scala  |   2 +-
 .../scala/org/apache/spark/sql/UDFSuite.scala   |  36 +++
 .../org/apache/spark/sql/hive/HiveContext.scala |  13 +-
 .../org/apache/spark/sql/hive/TestHive.scala    |   4 +-
 .../org/apache/spark/sql/hive/hiveUdfs.scala    |   6 +-
 .../scala/org/apache/spark/sql/QueryTest.scala  |   4 +-
 38 files changed, 1861 insertions(+), 19 deletions(-)
----------------------------------------------------------------------
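
Editor's illustration (not part of the commit): the commit message describes binding a name to a lambda so that a later query can call it by that name. The pure-Python sketch below shows only that general name-to-function registry idea. The class `SimpleFunctionRegistry` and its methods `register_function`, `lookup_function`, and `call` are hypothetical names chosen for this example; they are not Spark's `FunctionRegistry`, and nothing here reflects how the patch actually implements registration or type handling.

```python
from typing import Any, Callable, Dict


class SimpleFunctionRegistry:
    """Toy name -> function registry (hypothetical; not Spark's implementation)."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}

    def register_function(self, name: str, func: Callable[..., Any]) -> None:
        # Assumption for this sketch: a later registration under the
        # same name simply replaces the earlier one.
        self._functions[name] = func

    def lookup_function(self, name: str) -> Callable[..., Any]:
        # Resolve a name to its function, failing loudly if it is unknown.
        try:
            return self._functions[name]
        except KeyError:
            raise KeyError(f"undefined function: {name}") from None

    def call(self, name: str, *args: Any) -> Any:
        # Look the function up by name and apply it to the arguments.
        return self.lookup_function(name)(*args)


registry = SimpleFunctionRegistry()
registry.register_function("strLenPython", lambda x: len(x))
result = registry.call("strLenPython", "test")
print(result)  # prints 4
```

A real query engine would resolve the name while analyzing the query rather than at call time, and would also track the declared return type (the `IntegerType()` argument in the Python example above); this sketch omits both.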
[2/2] git commit: [SPARK-2097][SQL] UDF Support
(cherry picked from commit 158ad0bba9382fd494b4789b5628a9cec00cfa19)
Signed-off-by: Michael Armbrust <mich...@databricks.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3b9f25f4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3b9f25f4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3b9f25f4

Branch: refs/heads/branch-1.1
Commit: 3b9f25f4259b254f3faa2a7d61e547089a69c259
Parents: 7924d72
Author: Michael Armbrust <mich...@databricks.com>
Authored: Sat Aug 2 16:33:48 2014 -0700
Committer: Michael Armbrust <mich...@databricks.com>
Committed: Sat Aug 2 16:34:00 2014 -0700

----------------------------------------------------------------------
 python/pyspark/sql.py                           |  39 ++-
 .../catalyst/analysis/FunctionRegistry.scala    |  32 ++
 .../sql/catalyst/expressions/ScalaUdf.scala     | 307 +++
 .../org/apache/spark/sql/api/java/UDF1.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF10.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF11.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF12.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF13.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF14.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF15.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF16.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF17.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF18.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF19.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF2.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF20.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF21.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF22.java    |  32 ++
 .../org/apache/spark/sql/api/java/UDF3.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF4.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF5.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF6.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF7.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF8.java     |  32 ++
 .../org/apache/spark/sql/api/java/UDF9.java     |  32 ++
 .../scala/org/apache/spark/sql/SQLContext.scala |  11 +-
 .../org/apache/spark/sql/UdfRegistration.scala  | 196
 .../spark/sql/api/java/JavaSQLContext.scala     |   5 +-
 .../spark/sql/api/java/UDFRegistration.scala    | 252 +++
 .../spark/sql/execution/SparkStrategies.scala   |   2 +
 .../apache/spark/sql/execution/pythonUdfs.scala | 177 +++
 .../apache/spark/sql/api/java/JavaAPISuite.java |  90 ++
 .../org/apache/spark/sql/InsertIntoSuite.scala  |   2 +-
 .../scala/org/apache/spark/sql/UDFSuite.scala   |  36 +++
 .../org/apache/spark/sql/hive/HiveContext.scala |  13 +-
 .../org/apache/spark/sql/hive/TestHive.scala    |   4 +-
 .../org/apache/spark/sql/hive/hiveUdfs.scala    |   6 +-
 .../scala/org/apache/spark/sql/QueryTest.scala  |   4 +-
 38 files changed, 1861 insertions(+), 19 deletions(-)