[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12765 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215796793 @davies I'm going to merge this first and address your comments in a future patch. Thanks for the review.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12765#discussion_r61605515

--- Diff: python/pyspark/sql/conf.py ---
@@ -0,0 +1,114 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import since
+from pyspark.rdd import ignore_unicode_prefix
+
+
+class RuntimeConfig(object):
+    """User-facing configuration API, accessible through `SparkSession.conf`.
+
+    Options set here are automatically propagated to the Hadoop configuration during I/O.
+    This a thin wrapper around its Scala implementation org.apache.spark.sql.RuntimeConfig.
+    """
+
+    def __init__(self, jconf):
+        """Create a new RuntimeConfig that wraps the underlying JVM object."""
+        self._jconf = jconf
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def set(self, key, value):
+        """Sets the given Spark runtime configuration property.
+
+        >>> spark.conf.set("garble", "marble")
+        >>> spark.getConf("garble")
+        u'marble'
+        """
+        self._jconf.set(key, value)
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def get(self, key):
+        """Returns the value of Spark runtime configuration property for the given key,
+        assuming it is set.
+
+        >>> spark.setConf("bogo", "sipeo")
+        >>> spark.conf.get("bogo")
+        u'sipeo'
+        >>> spark.conf.get("definitely.not.set") # doctest: +IGNORE_EXCEPTION_DETAIL
+        Traceback (most recent call last):
+            ...
+        Py4JJavaError: ...
+        """
+        return self._jconf.get(key)
+
+    @ignore_unicode_prefix
+    @since(2.0)
+    def getOption(self, key):
--- End diff --

ok
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12765#discussion_r61605369

--- Diff: python/pyspark/sql/session.py ---
@@ -121,6 +121,19 @@ def newSession(self):
         """
         return self.__class__(self._sc, self._jsparkSession.newSession())
+    @property
+    @since(2.0)
+    def conf(self):
+        """Runtime configuration interface for Spark.
+
+        This is the interface through which the user can get and set all Spark and Hadoop
+        configurations that are relevant to Spark SQL. When getting the value of a config,
+        this defaults to the value set in the underlying :class:`SparkContext`, if any.
+        """
+        if not hasattr(self, "_conf"):
+            self._conf = RuntimeConfig(self._jsparkSession.conf())
+        return self._conf
+
     @since(2.0)
     def setConf(self, key, value):
--- End diff --

ok, let's decide later
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/12765#discussion_r61545881

--- Diff: python/pyspark/sql/session.py ---
@@ -121,6 +121,19 @@ def newSession(self):
         """
         return self.__class__(self._sc, self._jsparkSession.newSession())
+    @property
+    @since(2.0)
+    def conf(self):
+        """Runtime configuration interface for Spark.
+
+        This is the interface through which the user can get and set all Spark and Hadoop
+        configurations that are relevant to Spark SQL. When getting the value of a config,
+        this defaults to the value set in the underlying :class:`SparkContext`, if any.
+        """
+        if not hasattr(self, "_conf"):
+            self._conf = RuntimeConfig(self._jsparkSession.conf())
+        return self._conf
+
     @since(2.0)
     def setConf(self, key, value):
--- End diff --

actually i was looking at this. initially runtimeconf has a lot of functions, but now it is pretty simple and no longer have a lot. I'm wondering if we should just remove that ... and have only SparkSession.setConf/getConf
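The overlap under discussion here (a `conf` object next to `setConf`/`getConf` on the session) can be illustrated with a minimal, self-contained sketch. This is not the PR's code: `DictBackedConfig` and `Session` are hypothetical stand-ins that use a plain dict instead of the JVM object, and having `setConf`/`getConf` delegate to `conf` is an assumption made only for illustration. The `hasattr` caching mirrors the `conf` property shown in the diff.

```python
class DictBackedConfig(object):
    """Hypothetical stand-in for RuntimeConfig, backed by a dict instead of a JVM object."""

    def __init__(self, store):
        self._store = store

    def set(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store[key]


class Session(object):
    """Hypothetical stand-in for SparkSession; only the configuration surface is modeled."""

    def __init__(self):
        self._store = {}

    @property
    def conf(self):
        # Build the wrapper on first access and cache it, as the hasattr check in the diff does.
        if not hasattr(self, "_conf"):
            self._conf = DictBackedConfig(self._store)
        return self._conf

    def setConf(self, key, value):
        # Assumption: the session-level setter simply forwards to the conf object.
        self.conf.set(key, value)

    def getConf(self, key):
        # Assumption: the session-level getter simply forwards to the conf object.
        return self.conf.get(key)


session = Session()
session.setConf("garble", "marble")
session.conf.set("bogo", "sipeo")
```

Under these assumptions both entry points read and write the same state, which is why keeping only one of them is a reasonable question to raise.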
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/12765#discussion_r61544135

--- Diff: python/pyspark/sql/session.py ---
@@ -121,6 +121,19 @@ def newSession(self):
         """
         return self.__class__(self._sc, self._jsparkSession.newSession())
+    @property
+    @since(2.0)
+    def conf(self):
+        """Runtime configuration interface for Spark.
+
+        This is the interface through which the user can get and set all Spark and Hadoop
+        configurations that are relevant to Spark SQL. When getting the value of a config,
+        this defaults to the value set in the underlying :class:`SparkContext`, if any.
+        """
+        if not hasattr(self, "_conf"):
+            self._conf = RuntimeConfig(self._jsparkSession.conf())
+        return self._conf
+
     @since(2.0)
     def setConf(self, key, value):
--- End diff --

we should probably remove it in Scala actually.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215632276 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57306/ Test PASSed.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215632274 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215632186 **[Test build #57306 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57306/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215620953 **[Test build #57306 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57306/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215620780 retest this please
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/12765#discussion_r61529274 --- Diff: python/pyspark/sql/catalog.py --- @@ -0,0 +1,426 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from collections import namedtuple + +from pyspark import since +from pyspark.rdd import ignore_unicode_prefix +from pyspark.sql.dataframe import DataFrame +from pyspark.sql.functions import UserDefinedFunction +from pyspark.sql.types import IntegerType, StringType, StructType + + +Database = namedtuple("Database", "name description locationUri") +Table = namedtuple("Table", "name database description tableType isTemporary") +Column = namedtuple("Column", "name description dataType nullable isPartition isBucket") +Function = namedtuple("Function", "name description className isTemporary") + + +class Catalog(object): +"""User-facing catalog API, accessible through `SparkSession.catalog`. + +This is a thin wrapper around its Scala implementation org.apache.spark.sql.catalog.Catalog. 
+""" + +def __init__(self, sparkSession): +"""Create a new Catalog that wraps the underlying JVM object.""" +self._sparkSession = sparkSession +self._jsparkSession = sparkSession._jsparkSession +self._jcatalog = sparkSession._jsparkSession.catalog() + +@ignore_unicode_prefix +@since(2.0) +def currentDatabase(self): +"""Returns the current default database in this session. + +>>> spark.catalog._reset() +>>> spark.catalog.currentDatabase() +u'default' +""" +return self._jcatalog.currentDatabase() + +@ignore_unicode_prefix +@since(2.0) +def setCurrentDatabase(self, dbName): +"""Sets the current default database in this session. + +>>> spark.catalog._reset() +>>> spark.sql("CREATE DATABASE some_db") +DataFrame[] +>>> spark.catalog.setCurrentDatabase("some_db") +>>> spark.catalog.currentDatabase() +u'some_db' +>>> spark.catalog.setCurrentDatabase("does_not_exist") # doctest: +IGNORE_EXCEPTION_DETAIL +Traceback (most recent call last): +... +AnalysisException: ... +""" +return self._jcatalog.setCurrentDatabase(dbName) + +@ignore_unicode_prefix +@since(2.0) +def listDatabases(self): +"""Returns a list of databases available across all sessions. + +>>> spark.catalog._reset() +>>> [db.name for db in spark.catalog.listDatabases()] +[u'default'] +>>> spark.sql("CREATE DATABASE some_db") +DataFrame[] +>>> [db.name for db in spark.catalog.listDatabases()] +[u'default', u'some_db'] +""" +iter = self._jcatalog.listDatabases().toLocalIterator() +databases = [] +while iter.hasNext(): +jdb = iter.next() +databases.append(Database( +name=jdb.name(), +description=jdb.description(), +locationUri=jdb.locationUri())) +return databases + +@ignore_unicode_prefix +@since(2.0) +def listTables(self, dbName=None): +"""Returns a list of tables in the specified database. + +If no database is specified, the current database is used. +This includes all temporary tables. 
+ +>>> spark.catalog._reset() +>>> spark.sql("CREATE DATABASE some_db") +DataFrame[] +>>> spark.catalog.listTables() +[] +>>> spark.catalog.listTables("some_db") +[] +>>> spark.createDataFrame([(1, 1)]).registerTempTable("my_temp_tab") +>>> spark.sql("CREATE TABLE my_tab1 (name STRING, age INT)") +DataFrame[] +>>> spark.s
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12765#discussion_r61529260

--- Diff: python/pyspark/sql/session.py ---
@@ -121,6 +121,19 @@ def newSession(self):
         """
         return self.__class__(self._sc, self._jsparkSession.newSession())
+    @property
+    @since(2.0)
+    def conf(self):
+        """Runtime configuration interface for Spark.
+
+        This is the interface through which the user can get and set all Spark and Hadoop
+        configurations that are relevant to Spark SQL. When getting the value of a config,
+        this defaults to the value set in the underlying :class:`SparkContext`, if any.
+        """
+        if not hasattr(self, "_conf"):
+            self._conf = RuntimeConfig(self._jsparkSession.conf())
+        return self._conf
+
     @since(2.0)
     def setConf(self, key, value):
--- End diff --

this one is also in scala, so I kept it
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215609430 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57289/ Test FAILed.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215609428 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215609352 **[Test build #57289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57289/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215609337 **[Test build #2916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2916/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215608783 **[Test build #2918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2918/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12765#discussion_r61523711 --- Diff: python/pyspark/sql/catalog.py --- @@ -0,0 +1,426 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from collections import namedtuple + +from pyspark import since +from pyspark.rdd import ignore_unicode_prefix +from pyspark.sql.dataframe import DataFrame +from pyspark.sql.functions import UserDefinedFunction +from pyspark.sql.types import IntegerType, StringType, StructType + + +Database = namedtuple("Database", "name description locationUri") +Table = namedtuple("Table", "name database description tableType isTemporary") +Column = namedtuple("Column", "name description dataType nullable isPartition isBucket") +Function = namedtuple("Function", "name description className isTemporary") + + +class Catalog(object): +"""User-facing catalog API, accessible through `SparkSession.catalog`. + +This is a thin wrapper around its Scala implementation org.apache.spark.sql.catalog.Catalog. 
+""" + +def __init__(self, sparkSession): +"""Create a new Catalog that wraps the underlying JVM object.""" +self._sparkSession = sparkSession +self._jsparkSession = sparkSession._jsparkSession +self._jcatalog = sparkSession._jsparkSession.catalog() + +@ignore_unicode_prefix +@since(2.0) +def currentDatabase(self): +"""Returns the current default database in this session. + +>>> spark.catalog._reset() +>>> spark.catalog.currentDatabase() +u'default' +""" +return self._jcatalog.currentDatabase() + +@ignore_unicode_prefix +@since(2.0) +def setCurrentDatabase(self, dbName): +"""Sets the current default database in this session. + +>>> spark.catalog._reset() +>>> spark.sql("CREATE DATABASE some_db") +DataFrame[] +>>> spark.catalog.setCurrentDatabase("some_db") +>>> spark.catalog.currentDatabase() +u'some_db' +>>> spark.catalog.setCurrentDatabase("does_not_exist") # doctest: +IGNORE_EXCEPTION_DETAIL +Traceback (most recent call last): +... +AnalysisException: ... +""" +return self._jcatalog.setCurrentDatabase(dbName) + +@ignore_unicode_prefix +@since(2.0) +def listDatabases(self): +"""Returns a list of databases available across all sessions. + +>>> spark.catalog._reset() +>>> [db.name for db in spark.catalog.listDatabases()] +[u'default'] +>>> spark.sql("CREATE DATABASE some_db") +DataFrame[] +>>> [db.name for db in spark.catalog.listDatabases()] +[u'default', u'some_db'] +""" +iter = self._jcatalog.listDatabases().toLocalIterator() +databases = [] +while iter.hasNext(): +jdb = iter.next() +databases.append(Database( +name=jdb.name(), +description=jdb.description(), +locationUri=jdb.locationUri())) +return databases + +@ignore_unicode_prefix +@since(2.0) +def listTables(self, dbName=None): +"""Returns a list of tables in the specified database. + +If no database is specified, the current database is used. +This includes all temporary tables. 
+ +>>> spark.catalog._reset() +>>> spark.sql("CREATE DATABASE some_db") +DataFrame[] +>>> spark.catalog.listTables() +[] +>>> spark.catalog.listTables("some_db") +[] +>>> spark.createDataFrame([(1, 1)]).registerTempTable("my_temp_tab") +>>> spark.sql("CREATE TABLE my_tab1 (name STRING, age INT)") +DataFrame[] +>>> spark.sql("
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12765#discussion_r61523686 --- Diff: python/pyspark/sql/catalog.py --- @@ -0,0 +1,426 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from collections import namedtuple + +from pyspark import since +from pyspark.rdd import ignore_unicode_prefix +from pyspark.sql.dataframe import DataFrame +from pyspark.sql.functions import UserDefinedFunction +from pyspark.sql.types import IntegerType, StringType, StructType + + +Database = namedtuple("Database", "name description locationUri") +Table = namedtuple("Table", "name database description tableType isTemporary") +Column = namedtuple("Column", "name description dataType nullable isPartition isBucket") +Function = namedtuple("Function", "name description className isTemporary") + + +class Catalog(object): +"""User-facing catalog API, accessible through `SparkSession.catalog`. + +This is a thin wrapper around its Scala implementation org.apache.spark.sql.catalog.Catalog. 
+""" + +def __init__(self, sparkSession): +"""Create a new Catalog that wraps the underlying JVM object.""" +self._sparkSession = sparkSession +self._jsparkSession = sparkSession._jsparkSession +self._jcatalog = sparkSession._jsparkSession.catalog() + +@ignore_unicode_prefix +@since(2.0) +def currentDatabase(self): +"""Returns the current default database in this session. + +>>> spark.catalog._reset() +>>> spark.catalog.currentDatabase() +u'default' +""" +return self._jcatalog.currentDatabase() + +@ignore_unicode_prefix +@since(2.0) +def setCurrentDatabase(self, dbName): +"""Sets the current default database in this session. + +>>> spark.catalog._reset() +>>> spark.sql("CREATE DATABASE some_db") +DataFrame[] +>>> spark.catalog.setCurrentDatabase("some_db") +>>> spark.catalog.currentDatabase() +u'some_db' +>>> spark.catalog.setCurrentDatabase("does_not_exist") # doctest: +IGNORE_EXCEPTION_DETAIL +Traceback (most recent call last): +... +AnalysisException: ... +""" +return self._jcatalog.setCurrentDatabase(dbName) + +@ignore_unicode_prefix +@since(2.0) +def listDatabases(self): +"""Returns a list of databases available across all sessions. + +>>> spark.catalog._reset() +>>> [db.name for db in spark.catalog.listDatabases()] +[u'default'] +>>> spark.sql("CREATE DATABASE some_db") +DataFrame[] +>>> [db.name for db in spark.catalog.listDatabases()] +[u'default', u'some_db'] +""" +iter = self._jcatalog.listDatabases().toLocalIterator() +databases = [] +while iter.hasNext(): +jdb = iter.next() +databases.append(Database( +name=jdb.name(), +description=jdb.description(), +locationUri=jdb.locationUri())) +return databases + +@ignore_unicode_prefix +@since(2.0) +def listTables(self, dbName=None): +"""Returns a list of tables in the specified database. + +If no database is specified, the current database is used. +This includes all temporary tables. 
+ +>>> spark.catalog._reset() +>>> spark.sql("CREATE DATABASE some_db") +DataFrame[] +>>> spark.catalog.listTables() +[] +>>> spark.catalog.listTables("some_db") +[] +>>> spark.createDataFrame([(1, 1)]).registerTempTable("my_temp_tab") +>>> spark.sql("CREATE TABLE my_tab1 (name STRING, age INT)") +DataFrame[] +>>> spark.sql("
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/12765#discussion_r61523630

--- Diff: python/pyspark/sql/session.py ---
@@ -121,6 +121,19 @@ def newSession(self):
         """
         return self.__class__(self._sc, self._jsparkSession.newSession())
+    @property
+    @since(2.0)
+    def conf(self):
+        """Runtime configuration interface for Spark.
+
+        This is the interface through which the user can get and set all Spark and Hadoop
+        configurations that are relevant to Spark SQL. When getting the value of a config,
+        this defaults to the value set in the underlying :class:`SparkContext`, if any.
+        """
+        if not hasattr(self, "_conf"):
+            self._conf = RuntimeConfig(self._jsparkSession.conf())
+        return self._conf
+
     @since(2.0)
     def setConf(self, key, value):
--- End diff --

Since we removed the registerFunction, should we also remove this one?
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12765#discussion_r61523579

    --- Diff: python/pyspark/sql/conf.py ---
    @@ -0,0 +1,114 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import since
    +from pyspark.rdd import ignore_unicode_prefix
    +
    +
    +class RuntimeConfig(object):
    +    """User-facing configuration API, accessible through `SparkSession.conf`.
    +
    +    Options set here are automatically propagated to the Hadoop configuration during I/O.
    +    This a thin wrapper around its Scala implementation org.apache.spark.sql.RuntimeConfig.
    +    """
    +
    +    def __init__(self, jconf):
    +        """Create a new RuntimeConfig that wraps the underlying JVM object."""
    +        self._jconf = jconf
    +
    +    @ignore_unicode_prefix
    +    @since(2.0)
    +    def set(self, key, value):
    +        """Sets the given Spark runtime configuration property.
    +
    +        >>> spark.conf.set("garble", "marble")
    +        >>> spark.getConf("garble")
    +        u'marble'
    +        """
    +        self._jconf.set(key, value)
    +
    +    @ignore_unicode_prefix
    +    @since(2.0)
    +    def get(self, key):
    +        """Returns the value of Spark runtime configuration property for the given key,
    +        assuming it is set.
    +
    +        >>> spark.setConf("bogo", "sipeo")
    +        >>> spark.conf.get("bogo")
    +        u'sipeo'
    +        >>> spark.conf.get("definitely.not.set")  # doctest: +IGNORE_EXCEPTION_DETAIL
    +        Traceback (most recent call last):
    +            ...
    +        Py4JJavaError: ...
    +        """
    +        return self._jconf.get(key)
    +
    +    @ignore_unicode_prefix
    +    @since(2.0)
    +    def getOption(self, key):
    --- End diff --

`get(self, key, default=None)` ?
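One way to read the suggestion above is to fold `getOption` into `get` by giving it an optional default. The sketch below illustrates that idea only; it is not the code that was merged. `FakeJConf` is a hypothetical stub standing in for the JVM object, it raises `KeyError` where the real call would surface a `Py4JJavaError`, and a two-argument `get` on the JVM side is an assumption.

```python
class FakeJConf(object):
    """Hypothetical stand-in for the JVM RuntimeConfig (assumed API)."""

    def __init__(self):
        self._values = {}

    def set(self, key, value):
        self._values[key] = value

    def get(self, key, *default):
        # Emulates overloads get(key) and get(key, default) on the JVM side.
        if key in self._values:
            return self._values[key]
        if default:
            return default[0]
        # The real wrapper would see a Py4JJavaError here; KeyError is a stand-in.
        raise KeyError(key)


class RuntimeConfigSketch(object):
    """Sketch of a wrapper whose get() accepts an optional default."""

    def __init__(self, jconf):
        self._jconf = jconf

    def set(self, key, value):
        self._jconf.set(key, value)

    def get(self, key, default=None):
        # No default supplied: keep the strict behaviour and let a missing key raise.
        if default is None:
            return self._jconf.get(key)
        # Default supplied: fall back to it when the key is not set.
        return self._jconf.get(key, default)


conf = RuntimeConfigSketch(FakeJConf())
conf.set("bogo", "sipeo")
print(conf.get("bogo"))
print(conf.get("definitely.not.set", "fallback"))
```

A trade-off of using `None` as the marker for "no default" is that a caller cannot ask for `None` itself as the fallback; a dedicated sentinel object would avoid that at the cost of a slightly less obvious signature.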
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215600883 cc @davies
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215599870

**[Test build #2917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2917/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).

* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215598507 **[Test build #2918 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2918/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215598468 **[Test build #2917 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2917/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215598432 **[Test build #2916 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2916/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215598307 **[Test build #57289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57289/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215584504 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57280/ Test PASSed.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215584501 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215584321

**[Test build #57280 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57280/consoleFull)** for PR 12765 at commit [`d32ee8c`](https://github.com/apache/spark/commit/d32ee8cdfab57ec61eab8a84bcac13462f40ffd4).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215583865 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215583867 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57278/ Test PASSed.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215583680

**[Test build #57278 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57278/consoleFull)** for PR 12765 at commit [`a263f74`](https://github.com/apache/spark/commit/a263f7403fde1ff79655efbafee11daed5e81bac).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215565286 **[Test build #57280 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57280/consoleFull)** for PR 12765 at commit [`d32ee8c`](https://github.com/apache/spark/commit/d32ee8cdfab57ec61eab8a84bcac13462f40ffd4).
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215564024 **[Test build #57278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57278/consoleFull)** for PR 12765 at commit [`a263f74`](https://github.com/apache/spark/commit/a263f7403fde1ff79655efbafee11daed5e81bac).