[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12765





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-29 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215796793
  
@davies I'm going to merge this first and address your comments in a future 
patch. Thanks for the review.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12765#discussion_r61605515
  
--- Diff: python/pyspark/sql/conf.py ---
@@ -0,0 +1,114 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import since
+from pyspark.rdd import ignore_unicode_prefix
+
+
+class RuntimeConfig(object):
+    """User-facing configuration API, accessible through `SparkSession.conf`.
+
+    Options set here are automatically propagated to the Hadoop configuration during I/O.
+    This a thin wrapper around its Scala implementation org.apache.spark.sql.RuntimeConfig.
+    """
+
+    def __init__(self, jconf):
+        """Create a new RuntimeConfig that wraps the underlying JVM object."""
+        self._jconf = jconf
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def set(self, key, value):
+        """Sets the given Spark runtime configuration property.
+
+        >>> spark.conf.set("garble", "marble")
+        >>> spark.getConf("garble")
+        u'marble'
+        """
+        self._jconf.set(key, value)
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def get(self, key):
+        """Returns the value of Spark runtime configuration property for the given key,
+        assuming it is set.
+
+        >>> spark.setConf("bogo", "sipeo")
+        >>> spark.conf.get("bogo")
+        u'sipeo'
+        >>> spark.conf.get("definitely.not.set") # doctest: +IGNORE_EXCEPTION_DETAIL
+        Traceback (most recent call last):
+            ...
+        Py4JJavaError: ...
+        """
+        return self._jconf.get(key)
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def getOption(self, key):
--- End diff --

ok





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12765#discussion_r61605369
  
--- Diff: python/pyspark/sql/session.py ---
@@ -121,6 +121,19 @@ def newSession(self):
         """
         return self.__class__(self._sc, self._jsparkSession.newSession())
 
+    @property
+    @since(2.0)
+    def conf(self):
+        """Runtime configuration interface for Spark.
+
+        This is the interface through which the user can get and set all Spark and Hadoop
+        configurations that are relevant to Spark SQL. When getting the value of a config,
+        this defaults to the value set in the underlying :class:`SparkContext`, if any.
+        """
+        if not hasattr(self, "_conf"):
+            self._conf = RuntimeConfig(self._jsparkSession.conf())
+        return self._conf
+
     @since(2.0)
     def setConf(self, key, value):
--- End diff --

ok, let's decide later





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-29 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12765#discussion_r61545881
  
--- Diff: python/pyspark/sql/session.py ---
@@ -121,6 +121,19 @@ def newSession(self):
         """
         return self.__class__(self._sc, self._jsparkSession.newSession())
 
+    @property
+    @since(2.0)
+    def conf(self):
+        """Runtime configuration interface for Spark.
+
+        This is the interface through which the user can get and set all Spark and Hadoop
+        configurations that are relevant to Spark SQL. When getting the value of a config,
+        this defaults to the value set in the underlying :class:`SparkContext`, if any.
+        """
+        if not hasattr(self, "_conf"):
+            self._conf = RuntimeConfig(self._jsparkSession.conf())
+        return self._conf
+
     @since(2.0)
     def setConf(self, key, value):
--- End diff --

Actually, I was looking at this. Initially RuntimeConfig had a lot of functions, but now it is 
pretty simple and no longer has many. I'm wondering if we should just remove it ... and have 
only SparkSession.setConf/getConf
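
For concreteness, the two call styles being weighed here would look roughly like this (a
sketch assuming an active SparkSession bound to `spark`, not the final API):

    # Option A: keep the RuntimeConfig wrapper, exposed as SparkSession.conf.
    spark.conf.set("spark.sql.shuffle.partitions", "10")
    value = spark.conf.get("spark.sql.shuffle.partitions")

    # Option B: drop RuntimeConfig and go through SparkSession directly.
    spark.setConf("spark.sql.shuffle.partitions", "10")
    value = spark.getConf("spark.sql.shuffle.partitions")

Both styles end up setting the same session-level SQL configuration; the question is only
which Python surface to keep.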





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-29 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12765#discussion_r61544135
  
--- Diff: python/pyspark/sql/session.py ---
@@ -121,6 +121,19 @@ def newSession(self):
         """
         return self.__class__(self._sc, self._jsparkSession.newSession())
 
+    @property
+    @since(2.0)
+    def conf(self):
+        """Runtime configuration interface for Spark.
+
+        This is the interface through which the user can get and set all Spark and Hadoop
+        configurations that are relevant to Spark SQL. When getting the value of a config,
+        this defaults to the value set in the underlying :class:`SparkContext`, if any.
+        """
+        if not hasattr(self, "_conf"):
+            self._conf = RuntimeConfig(self._jsparkSession.conf())
+        return self._conf
+
     @since(2.0)
     def setConf(self, key, value):
--- End diff --

we should probably remove it in Scala actually.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215632276
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57306/
Test PASSed.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215632274
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215632186
  
**[Test build #57306 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57306/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215620953
  
**[Test build #57306 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57306/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215620780
  
retest this please





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12765#discussion_r61529274
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -0,0 +1,426 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from collections import namedtuple
+
+from pyspark import since
+from pyspark.rdd import ignore_unicode_prefix
+from pyspark.sql.dataframe import DataFrame
+from pyspark.sql.functions import UserDefinedFunction
+from pyspark.sql.types import IntegerType, StringType, StructType
+
+
+Database = namedtuple("Database", "name description locationUri")
+Table = namedtuple("Table", "name database description tableType isTemporary")
+Column = namedtuple("Column", "name description dataType nullable isPartition isBucket")
+Function = namedtuple("Function", "name description className isTemporary")
+
+
+class Catalog(object):
+    """User-facing catalog API, accessible through `SparkSession.catalog`.
+
+    This is a thin wrapper around its Scala implementation org.apache.spark.sql.catalog.Catalog.
+    """
+
+    def __init__(self, sparkSession):
+        """Create a new Catalog that wraps the underlying JVM object."""
+        self._sparkSession = sparkSession
+        self._jsparkSession = sparkSession._jsparkSession
+        self._jcatalog = sparkSession._jsparkSession.catalog()
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def currentDatabase(self):
+        """Returns the current default database in this session.
+
+        >>> spark.catalog._reset()
+        >>> spark.catalog.currentDatabase()
+        u'default'
+        """
+        return self._jcatalog.currentDatabase()
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def setCurrentDatabase(self, dbName):
+        """Sets the current default database in this session.
+
+        >>> spark.catalog._reset()
+        >>> spark.sql("CREATE DATABASE some_db")
+        DataFrame[]
+        >>> spark.catalog.setCurrentDatabase("some_db")
+        >>> spark.catalog.currentDatabase()
+        u'some_db'
+        >>> spark.catalog.setCurrentDatabase("does_not_exist") # doctest: +IGNORE_EXCEPTION_DETAIL
+        Traceback (most recent call last):
+            ...
+        AnalysisException: ...
+        """
+        return self._jcatalog.setCurrentDatabase(dbName)
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def listDatabases(self):
+        """Returns a list of databases available across all sessions.
+
+        >>> spark.catalog._reset()
+        >>> [db.name for db in spark.catalog.listDatabases()]
+        [u'default']
+        >>> spark.sql("CREATE DATABASE some_db")
+        DataFrame[]
+        >>> [db.name for db in spark.catalog.listDatabases()]
+        [u'default', u'some_db']
+        """
+        iter = self._jcatalog.listDatabases().toLocalIterator()
+        databases = []
+        while iter.hasNext():
+            jdb = iter.next()
+            databases.append(Database(
+                name=jdb.name(),
+                description=jdb.description(),
+                locationUri=jdb.locationUri()))
+        return databases
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def listTables(self, dbName=None):
+        """Returns a list of tables in the specified database.
+
+        If no database is specified, the current database is used.
+        This includes all temporary tables.
+
+        >>> spark.catalog._reset()
+        >>> spark.sql("CREATE DATABASE some_db")
+        DataFrame[]
+        >>> spark.catalog.listTables()
+        []
+        >>> spark.catalog.listTables("some_db")
+        []
+        >>> spark.createDataFrame([(1, 1)]).registerTempTable("my_temp_tab")
+        >>> spark.sql("CREATE TABLE my_tab1 (name STRING, age INT)")
+        DataFrame[]
+        >>> spark.s

[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12765#discussion_r61529260
  
--- Diff: python/pyspark/sql/session.py ---
@@ -121,6 +121,19 @@ def newSession(self):
         """
         return self.__class__(self._sc, self._jsparkSession.newSession())
 
+    @property
+    @since(2.0)
+    def conf(self):
+        """Runtime configuration interface for Spark.
+
+        This is the interface through which the user can get and set all Spark and Hadoop
+        configurations that are relevant to Spark SQL. When getting the value of a config,
+        this defaults to the value set in the underlying :class:`SparkContext`, if any.
+        """
+        if not hasattr(self, "_conf"):
+            self._conf = RuntimeConfig(self._jsparkSession.conf())
+        return self._conf
+
     @since(2.0)
     def setConf(self, key, value):
--- End diff --

this one is also in scala, so I kept it





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215609430
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57289/
Test FAILed.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215609428
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215609352
  
**[Test build #57289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57289/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215609337
  
**[Test build #2916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2916/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215608783
  
**[Test build #2918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2918/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12765#discussion_r61523711
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -0,0 +1,426 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from collections import namedtuple
+
+from pyspark import since
+from pyspark.rdd import ignore_unicode_prefix
+from pyspark.sql.dataframe import DataFrame
+from pyspark.sql.functions import UserDefinedFunction
+from pyspark.sql.types import IntegerType, StringType, StructType
+
+
+Database = namedtuple("Database", "name description locationUri")
+Table = namedtuple("Table", "name database description tableType isTemporary")
+Column = namedtuple("Column", "name description dataType nullable isPartition isBucket")
+Function = namedtuple("Function", "name description className isTemporary")
+
+
+class Catalog(object):
+    """User-facing catalog API, accessible through `SparkSession.catalog`.
+
+    This is a thin wrapper around its Scala implementation org.apache.spark.sql.catalog.Catalog.
+    """
+
+    def __init__(self, sparkSession):
+        """Create a new Catalog that wraps the underlying JVM object."""
+        self._sparkSession = sparkSession
+        self._jsparkSession = sparkSession._jsparkSession
+        self._jcatalog = sparkSession._jsparkSession.catalog()
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def currentDatabase(self):
+        """Returns the current default database in this session.
+
+        >>> spark.catalog._reset()
+        >>> spark.catalog.currentDatabase()
+        u'default'
+        """
+        return self._jcatalog.currentDatabase()
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def setCurrentDatabase(self, dbName):
+        """Sets the current default database in this session.
+
+        >>> spark.catalog._reset()
+        >>> spark.sql("CREATE DATABASE some_db")
+        DataFrame[]
+        >>> spark.catalog.setCurrentDatabase("some_db")
+        >>> spark.catalog.currentDatabase()
+        u'some_db'
+        >>> spark.catalog.setCurrentDatabase("does_not_exist") # doctest: +IGNORE_EXCEPTION_DETAIL
+        Traceback (most recent call last):
+            ...
+        AnalysisException: ...
+        """
+        return self._jcatalog.setCurrentDatabase(dbName)
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def listDatabases(self):
+        """Returns a list of databases available across all sessions.
+
+        >>> spark.catalog._reset()
+        >>> [db.name for db in spark.catalog.listDatabases()]
+        [u'default']
+        >>> spark.sql("CREATE DATABASE some_db")
+        DataFrame[]
+        >>> [db.name for db in spark.catalog.listDatabases()]
+        [u'default', u'some_db']
+        """
+        iter = self._jcatalog.listDatabases().toLocalIterator()
+        databases = []
+        while iter.hasNext():
+            jdb = iter.next()
+            databases.append(Database(
+                name=jdb.name(),
+                description=jdb.description(),
+                locationUri=jdb.locationUri()))
+        return databases
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def listTables(self, dbName=None):
+        """Returns a list of tables in the specified database.
+
+        If no database is specified, the current database is used.
+        This includes all temporary tables.
+
+        >>> spark.catalog._reset()
+        >>> spark.sql("CREATE DATABASE some_db")
+        DataFrame[]
+        >>> spark.catalog.listTables()
+        []
+        >>> spark.catalog.listTables("some_db")
+        []
+        >>> spark.createDataFrame([(1, 1)]).registerTempTable("my_temp_tab")
+        >>> spark.sql("CREATE TABLE my_tab1 (name STRING, age INT)")
+        DataFrame[]
+        >>> spark.sql("

[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12765#discussion_r61523686
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -0,0 +1,426 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from collections import namedtuple
+
+from pyspark import since
+from pyspark.rdd import ignore_unicode_prefix
+from pyspark.sql.dataframe import DataFrame
+from pyspark.sql.functions import UserDefinedFunction
+from pyspark.sql.types import IntegerType, StringType, StructType
+
+
+Database = namedtuple("Database", "name description locationUri")
+Table = namedtuple("Table", "name database description tableType isTemporary")
+Column = namedtuple("Column", "name description dataType nullable isPartition isBucket")
+Function = namedtuple("Function", "name description className isTemporary")
+
+
+class Catalog(object):
+    """User-facing catalog API, accessible through `SparkSession.catalog`.
+
+    This is a thin wrapper around its Scala implementation org.apache.spark.sql.catalog.Catalog.
+    """
+
+    def __init__(self, sparkSession):
+        """Create a new Catalog that wraps the underlying JVM object."""
+        self._sparkSession = sparkSession
+        self._jsparkSession = sparkSession._jsparkSession
+        self._jcatalog = sparkSession._jsparkSession.catalog()
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def currentDatabase(self):
+        """Returns the current default database in this session.
+
+        >>> spark.catalog._reset()
+        >>> spark.catalog.currentDatabase()
+        u'default'
+        """
+        return self._jcatalog.currentDatabase()
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def setCurrentDatabase(self, dbName):
+        """Sets the current default database in this session.
+
+        >>> spark.catalog._reset()
+        >>> spark.sql("CREATE DATABASE some_db")
+        DataFrame[]
+        >>> spark.catalog.setCurrentDatabase("some_db")
+        >>> spark.catalog.currentDatabase()
+        u'some_db'
+        >>> spark.catalog.setCurrentDatabase("does_not_exist") # doctest: +IGNORE_EXCEPTION_DETAIL
+        Traceback (most recent call last):
+            ...
+        AnalysisException: ...
+        """
+        return self._jcatalog.setCurrentDatabase(dbName)
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def listDatabases(self):
+        """Returns a list of databases available across all sessions.
+
+        >>> spark.catalog._reset()
+        >>> [db.name for db in spark.catalog.listDatabases()]
+        [u'default']
+        >>> spark.sql("CREATE DATABASE some_db")
+        DataFrame[]
+        >>> [db.name for db in spark.catalog.listDatabases()]
+        [u'default', u'some_db']
+        """
+        iter = self._jcatalog.listDatabases().toLocalIterator()
+        databases = []
+        while iter.hasNext():
+            jdb = iter.next()
+            databases.append(Database(
+                name=jdb.name(),
+                description=jdb.description(),
+                locationUri=jdb.locationUri()))
+        return databases
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def listTables(self, dbName=None):
+        """Returns a list of tables in the specified database.
+
+        If no database is specified, the current database is used.
+        This includes all temporary tables.
+
+        >>> spark.catalog._reset()
+        >>> spark.sql("CREATE DATABASE some_db")
+        DataFrame[]
+        >>> spark.catalog.listTables()
+        []
+        >>> spark.catalog.listTables("some_db")
+        []
+        >>> spark.createDataFrame([(1, 1)]).registerTempTable("my_temp_tab")
+        >>> spark.sql("CREATE TABLE my_tab1 (name STRING, age INT)")
+        DataFrame[]
+        >>> spark.sql("

[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12765#discussion_r61523630
  
--- Diff: python/pyspark/sql/session.py ---
@@ -121,6 +121,19 @@ def newSession(self):
         """
         return self.__class__(self._sc, self._jsparkSession.newSession())
 
+    @property
+    @since(2.0)
+    def conf(self):
+        """Runtime configuration interface for Spark.
+
+        This is the interface through which the user can get and set all Spark and Hadoop
+        configurations that are relevant to Spark SQL. When getting the value of a config,
+        this defaults to the value set in the underlying :class:`SparkContext`, if any.
+        """
+        if not hasattr(self, "_conf"):
+            self._conf = RuntimeConfig(self._jsparkSession.conf())
+        return self._conf
+
     @since(2.0)
     def setConf(self, key, value):
--- End diff --

Since we removed the registerFunction, should we also remove this one?





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12765#discussion_r61523579
  
--- Diff: python/pyspark/sql/conf.py ---
@@ -0,0 +1,114 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import since
+from pyspark.rdd import ignore_unicode_prefix
+
+
+class RuntimeConfig(object):
+    """User-facing configuration API, accessible through `SparkSession.conf`.
+
+    Options set here are automatically propagated to the Hadoop configuration during I/O.
+    This a thin wrapper around its Scala implementation org.apache.spark.sql.RuntimeConfig.
+    """
+
+    def __init__(self, jconf):
+        """Create a new RuntimeConfig that wraps the underlying JVM object."""
+        self._jconf = jconf
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def set(self, key, value):
+        """Sets the given Spark runtime configuration property.
+
+        >>> spark.conf.set("garble", "marble")
+        >>> spark.getConf("garble")
+        u'marble'
+        """
+        self._jconf.set(key, value)
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def get(self, key):
+        """Returns the value of Spark runtime configuration property for the given key,
+        assuming it is set.
+
+        >>> spark.setConf("bogo", "sipeo")
+        >>> spark.conf.get("bogo")
+        u'sipeo'
+        >>> spark.conf.get("definitely.not.set") # doctest: +IGNORE_EXCEPTION_DETAIL
+        Traceback (most recent call last):
+            ...
+        Py4JJavaError: ...
+        """
+        return self._jconf.get(key)
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def getOption(self, key):
--- End diff --

`get(self, key, default=None)` ?
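
A minimal sketch of what folding `getOption` into `get` could look like (hypothetical: the
two-argument `get` on the JVM-side RuntimeConfig is assumed, and a real implementation would
need a sentinel to tell "no default given" apart from an explicit `default=None`):

    @ignore_unicode_prefix
    @since(2.0)
    def get(self, key, default=None):
        """Returns the value of the Spark runtime configuration property for the given key,
        or `default` when the key is not set and a default is supplied.
        """
        if default is None:
            # Same behaviour as today: the JVM side raises if the key is missing.
            return self._jconf.get(key)
        # Assumes a (key, default) overload on the JVM RuntimeConfig.
        return self._jconf.get(key, default)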





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215600883
  
cc @davies





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215599870
  
**[Test build #2917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2917/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215598507
  
**[Test build #2918 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2918/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215598468
  
**[Test build #2917 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2917/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215598432
  
**[Test build #2916 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2916/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215598307
  
**[Test build #57289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57289/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215584504
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57280/
Test PASSed.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215584501
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215584321
  
**[Test build #57280 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57280/consoleFull)** for PR 12765 at commit [`d32ee8c`](https://github.com/apache/spark/commit/d32ee8cdfab57ec61eab8a84bcac13462f40ffd4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215583865
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215583867
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57278/
Test PASSed.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215583680
  
**[Test build #57278 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57278/consoleFull)** for PR 12765 at commit [`a263f74`](https://github.com/apache/spark/commit/a263f7403fde1ff79655efbafee11daed5e81bac).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215565286
  
**[Test build #57280 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57280/consoleFull)** for PR 12765 at commit [`d32ee8c`](https://github.com/apache/spark/commit/d32ee8cdfab57ec61eab8a84bcac13462f40ffd4).





[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12765#issuecomment-215564024
  
**[Test build #57278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57278/consoleFull)** for PR 12765 at commit [`a263f74`](https://github.com/apache/spark/commit/a263f7403fde1ff79655efbafee11daed5e81bac).

