[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r95846995

--- Diff: python/pyspark/sql/session.py ---
```
@@ -161,8 +161,8 @@ def getOrCreate(self):
             with self._lock:
                 from pyspark.context import SparkContext
                 from pyspark.conf import SparkConf
-                session = SparkSession._instantiatedContext
-                if session is None:
+                session = SparkSession._instantiatedSession
+                if session is None or session._sc._jsc is None:
```
--- End diff --

Yeah, I had to change to just checking whether `_jsc` was defined in our branch for the same reason. I'd prefer to add `isStopped` to Java and Python eventually, but that doesn't need to block this commit.
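For reference, a hypothetical sketch of what an explicit stopped-check could look like on the Python side, assuming the wrapped Scala context exposes an `isStopped()` accessor reachable through py4j (the `_jsc.sc().isStopped()` call is an assumption about the JVM side, not something this PR adds):

```
# Hypothetical sketch, not code from this PR.
def _context_is_usable(sc):
    """Best-effort check that a pyspark SparkContext is still usable."""
    if sc is None or sc._jsc is None:
        # SparkContext.stop() clears the py4j handle, so a missing _jsc
        # means the context was stopped from the Python side.
        return False
    try:
        # Assumption: the underlying Scala SparkContext exposes isStopped().
        return not sc._jsc.sc().isStopped()
    except Exception:
        # Fall back to the _jsc-is-defined heuristic this PR relies on.
        return True
```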
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r95846835

--- Diff: python/pyspark/sql/session.py ---
```
@@ -214,8 +214,12 @@ def __init__(self, sparkContext, jsparkSession=None):
         self._wrapped = SQLContext(self._sc, self, self._jwrapped)
         _monkey_patch_RDD(self)
         install_exception_handler()
-        if SparkSession._instantiatedContext is None:
-            SparkSession._instantiatedContext = self
+        # If we had an instantiated SparkSession attached with a SparkContext
+        # which is stopped now, we need to renew the instantiated SparkSession.
+        # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+        if SparkSession._instantiatedSession is None \
+                or SparkSession._instantiatedSession._sc._jsc is None:
```
--- End diff --

If this can't tell whether the Java context is stopped, then it should be fine to use this logic. +1.
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16454
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r95729251

--- Diff: python/pyspark/sql/session.py ---
```
@@ -214,8 +214,12 @@ def __init__(self, sparkContext, jsparkSession=None):
         self._wrapped = SQLContext(self._sc, self, self._jwrapped)
         _monkey_patch_RDD(self)
         install_exception_handler()
-        if SparkSession._instantiatedContext is None:
-            SparkSession._instantiatedContext = self
+        # If we had an instantiated SparkSession attached with a SparkContext
+        # which is stopped now, we need to renew the instantiated SparkSession.
+        # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+        if SparkSession._instantiatedSession is None \
+                or SparkSession._instantiatedSession._sc._jsc is None:
```
--- End diff --

Well, as I tried it, this would cause a test failure. `__init__` and `getOrCreate` are both legal paths. If one caller uses `__init__` to get a `SparkSession` and another calls `getOrCreate` later, the second caller will get a different session than the first if the logic that updates `_instantiatedSession` differs between the two paths.
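To illustrate the two paths (an illustrative sketch assuming a local PySpark install, not a test from the PR):

```
from pyspark import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext('local[2]', 'two-paths')
sc.stop()                         # sc._jsc is now None

sc2 = SparkContext('local[2]', 'two-paths')
direct = SparkSession(sc2)        # path 1: __init__ must renew the cached session
via_builder = SparkSession.builder.getOrCreate()   # path 2: reads the cache

# The two callers only agree when both paths apply the same renewal logic:
assert via_builder is direct
```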
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r95725013

--- Diff: python/pyspark/sql/session.py ---
```
@@ -214,8 +214,12 @@ def __init__(self, sparkContext, jsparkSession=None):
         self._wrapped = SQLContext(self._sc, self, self._jwrapped)
         _monkey_patch_RDD(self)
         install_exception_handler()
-        if SparkSession._instantiatedContext is None:
-            SparkSession._instantiatedContext = self
+        # If we had an instantiated SparkSession attached with a SparkContext
+        # which is stopped now, we need to renew the instantiated SparkSession.
+        # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+        if SparkSession._instantiatedSession is None \
+                or SparkSession._instantiatedSession._sc._jsc is None:
```
--- End diff --

Sounds good. I will make the change, thanks.
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r95724518

--- Diff: python/pyspark/sql/session.py ---
```
@@ -161,8 +161,8 @@ def getOrCreate(self):
             with self._lock:
                 from pyspark.context import SparkContext
                 from pyspark.conf import SparkConf
-                session = SparkSession._instantiatedContext
-                if session is None:
+                session = SparkSession._instantiatedSession
+                if session is None or session._sc._jsc is None:
```
--- End diff --

I can change it to the `isStopped` approach if others also prefer that.
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r95724228

--- Diff: python/pyspark/sql/session.py ---
```
@@ -161,8 +161,8 @@ def getOrCreate(self):
             with self._lock:
                 from pyspark.context import SparkContext
                 from pyspark.conf import SparkConf
-                session = SparkSession._instantiatedContext
-                if session is None:
+                session = SparkSession._instantiatedSession
+                if session is None or session._sc._jsc is None:
```
--- End diff --

We can add `isStopped` to `JavaSparkContext`, but I would prefer to solve it on the Python side if possible.
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r95724161

--- Diff: python/pyspark/sql/session.py ---
```
@@ -161,8 +161,8 @@ def getOrCreate(self):
             with self._lock:
                 from pyspark.context import SparkContext
                 from pyspark.conf import SparkConf
-                session = SparkSession._instantiatedContext
-                if session is None:
+                session = SparkSession._instantiatedSession
+                if session is None or session._sc._jsc is None:
```
--- End diff --

Actually, `_jsc` is a `JavaSparkContext`, not a `SparkContext`, and `JavaSparkContext` doesn't have `isStopped`.
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r95259073

--- Diff: python/pyspark/sql/session.py ---
```
@@ -214,8 +214,12 @@ def __init__(self, sparkContext, jsparkSession=None):
         self._wrapped = SQLContext(self._sc, self, self._jwrapped)
         _monkey_patch_RDD(self)
         install_exception_handler()
-        if SparkSession._instantiatedContext is None:
-            SparkSession._instantiatedContext = self
+        # If we had an instantiated SparkSession attached with a SparkContext
+        # which is stopped now, we need to renew the instantiated SparkSession.
+        # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+        if SparkSession._instantiatedSession is None \
+                or SparkSession._instantiatedSession._sc._jsc is None:
```
--- End diff --

What about setting `SparkSession._instantiatedSession` to `None` in `getOrCreate` instead of duplicating the logic?
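A minimal sketch of that alternative, assuming the surrounding `Builder` class (this is the suggestion as stated, not what the PR ended up merging; the real `getOrCreate` also applies the builder's config options, which are omitted here):

```
from pyspark import SparkConf, SparkContext

# Sketch: clear the stale cache in getOrCreate() so that __init__ can keep
# its simple None check. Hypothetical; viirya's reply elsewhere in this
# thread explains why the direct __init__ path still needs the same check.
def getOrCreate(self):
    with self._lock:
        session = SparkSession._instantiatedSession
        if session is not None and session._sc._jsc is None:
            # The cached session wraps a stopped context; forget it so both
            # this method and a later __init__ start from a clean slate.
            SparkSession._instantiatedSession = None
            session = None
        if session is None:
            sc = SparkContext.getOrCreate(SparkConf())
            session = SparkSession(sc)   # __init__ caches self when cache is empty
        return session
```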
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r95258453

--- Diff: python/pyspark/sql/session.py ---
```
@@ -161,8 +161,8 @@ def getOrCreate(self):
             with self._lock:
                 from pyspark.context import SparkContext
                 from pyspark.conf import SparkConf
-                session = SparkSession._instantiatedContext
-                if session is None:
+                session = SparkSession._instantiatedSession
+                if session is None or session._sc._jsc is None:
```
--- End diff --

#16519 adds `isStopped`, which I think is a better way of checking whether the Spark context is valid than just checking whether `_jsc` is defined, because a defined `_jsc` can still point to a stopped context.
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r95055690

--- Diff: python/pyspark/sql/tests.py ---
```
@@ -1886,6 +1887,28 @@ def test_hivecontext(self):
         self.assertTrue(os.path.exists(metastore_path))


+class SQLTests2(ReusedPySparkTestCase):
```
--- End diff --

`ReusedPySparkTestCase` launches a class-level SparkContext, so I use that as the default SparkContext.
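For context, a sketch of the shape such a test takes on the shared class-level context (the method name, app name, and assertions are illustrative, not necessarily the exact test added in the PR; the `ReusedPySparkTestCase` import path is the Spark 2.x one):

```
from pyspark import SparkContext
from pyspark.sql import Row, SparkSession
from pyspark.tests import ReusedPySparkTestCase


class SQLTests2(ReusedPySparkTestCase):

    def test_sparksession_with_stopped_sparkcontext(self):
        # Stop the class-level context, start a fresh one, and verify that
        # getOrCreate() hands back a session bound to the new context.
        self.sc.stop()
        sc = SparkContext('local[4]', 'SQLTests2')
        spark = SparkSession.builder.getOrCreate()
        df = spark.createDataFrame([(1, 2)], ['c1', 'c2'])
        self.assertEqual(df.collect(), [Row(c1=1, c2=2)])
        sc.stop()
```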
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r95050517

--- Diff: python/pyspark/sql/tests.py ---
```
@@ -1886,6 +1887,28 @@ def test_hivecontext(self):
         self.assertTrue(os.path.exists(metastore_path))


+class SQLTests2(ReusedPySparkTestCase):
```
--- End diff --

Is there any particular reason this is built on a `ReusedPySparkTestCase`?
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r94905140

--- Diff: python/pyspark/sql/session.py ---
```
@@ -214,8 +214,14 @@ def __init__(self, sparkContext, jsparkSession=None):
         self._wrapped = SQLContext(self._sc, self, self._jwrapped)
         _monkey_patch_RDD(self)
         install_exception_handler()
-        if SparkSession._instantiatedContext is None:
-            SparkSession._instantiatedContext = self
+        if SparkSession._instantiatedSession is None:
+            SparkSession._instantiatedSession = self
+        else:
+            # If we had an instantiated SparkSession attached with a SparkContext
+            # which is stopped now, we need to renew the instantiated SparkSession.
+            # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+            if SparkSession._instantiatedSession._sc._jsc is None:
+                SparkSession._instantiatedSession = self
```
--- End diff --

`SparkSession._instantiatedSession._sc` is from the `sparkContext` passed into `__init__`. I think it should not be None.
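To spell out the distinction (a paraphrased sketch of the relevant constructor lines, not the verbatim source):

```
class SparkSession(object):
    def __init__(self, sparkContext, jsparkSession=None):
        self._sc = sparkContext      # always the SparkContext passed in, so
                                     # _instantiatedSession._sc cannot be None
        self._jsc = self._sc._jsc    # py4j handle; SparkContext.stop() later
                                     # sets sc._jsc = None, which is why the
                                     # check reads session._sc._jsc rather
                                     # than the stale copy in session._jsc
```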
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r94904472

--- Diff: python/pyspark/sql/session.py ---
```
@@ -214,8 +214,14 @@ def __init__(self, sparkContext, jsparkSession=None):
         self._wrapped = SQLContext(self._sc, self, self._jwrapped)
         _monkey_patch_RDD(self)
         install_exception_handler()
-        if SparkSession._instantiatedContext is None:
-            SparkSession._instantiatedContext = self
+        if SparkSession._instantiatedSession is None:
+            SparkSession._instantiatedSession = self
+        else:
+            # If we had an instantiated SparkSession attached with a SparkContext
+            # which is stopped now, we need to renew the instantiated SparkSession.
+            # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+            if SparkSession._instantiatedSession._sc._jsc is None:
+                SparkSession._instantiatedSession = self
```
--- End diff --

Can `SparkSession._instantiatedSession._sc` be None?
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r94903491

--- Diff: python/pyspark/sql/session.py ---
```
@@ -214,8 +214,14 @@ def __init__(self, sparkContext, jsparkSession=None):
         self._wrapped = SQLContext(self._sc, self, self._jwrapped)
         _monkey_patch_RDD(self)
         install_exception_handler()
-        if SparkSession._instantiatedContext is None:
-            SparkSession._instantiatedContext = self
+        if SparkSession._instantiatedSession is None:
+            SparkSession._instantiatedSession = self
+        else:
+            # If we had an instantiated SparkSession attached with a SparkContext
+            # which is stopped now, we need to renew the instantiated SparkSession.
+            # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+            if SparkSession._instantiatedSession._sc._jsc is None:
+                SparkSession._instantiatedSession = self
```
--- End diff --

Yeah, sure.
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user vijoshi commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r94903274

--- Diff: python/pyspark/sql/session.py ---
```
@@ -214,8 +214,14 @@ def __init__(self, sparkContext, jsparkSession=None):
         self._wrapped = SQLContext(self._sc, self, self._jwrapped)
         _monkey_patch_RDD(self)
         install_exception_handler()
-        if SparkSession._instantiatedContext is None:
-            SparkSession._instantiatedContext = self
+        if SparkSession._instantiatedSession is None:
+            SparkSession._instantiatedSession = self
+        else:
+            # If we had an instantiated SparkSession attached with a SparkContext
+            # which is stopped now, we need to renew the instantiated SparkSession.
+            # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+            if SparkSession._instantiatedSession._sc._jsc is None:
+                SparkSession._instantiatedSession = self
```
--- End diff --

Would this be simpler?

```
if SparkSession._instantiatedSession is None \
        or SparkSession._instantiatedSession._sc._jsc is None:
    SparkSession._instantiatedSession = self
```
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/16454

## What changes were proposed in this pull request?

During SparkSession initialization, we store the created SparkSession instance in a class variable, `_instantiatedContext`, so that a later `SparkSession.builder.getOrCreate()` can retrieve the existing instance. However, when the active SparkContext is stopped and a new SparkContext is created, the existing SparkSession is still associated with the stopped SparkContext, so operations on that SparkSession will fail. We need to detect this case in SparkSession and renew the class variable `_instantiatedContext` when needed.

## How was this patch tested?

New test added in PySpark.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 fix-pyspark-sparksession

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16454.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16454

commit 80bba5ead0601f3ef4b05fff5391d07a61e06341
Author: Liang-Chi Hsieh
Date: 2017-01-03T03:06:21Z

    Fix SparkSession initialization when previous SparkContext is stopped and new SparkContext is created.
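To make the failure mode concrete, a minimal reproduction sketch of the pre-patch behavior (assumes a local PySpark install; the exact error raised can vary):

```
from pyspark import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext('local[2]', 'repro')
spark1 = SparkSession.builder.getOrCreate()   # cached in _instantiatedContext

sc.stop()                                     # spark1 now wraps a dead context
sc2 = SparkContext('local[2]', 'repro')       # fresh, working context

spark2 = SparkSession.builder.getOrCreate()
# Pre-patch: spark2 is spark1 and is still bound to the stopped context, so
# e.g. spark2.createDataFrame([(1,)], ['a']) fails. Post-patch: the stale
# session is detected via _sc._jsc being None and a new session is created.
```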