[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-12 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/16454#discussion_r95846995
  
--- Diff: python/pyspark/sql/session.py ---
@@ -161,8 +161,8 @@ def getOrCreate(self):
 with self._lock:
 from pyspark.context import SparkContext
 from pyspark.conf import SparkConf
-session = SparkSession._instantiatedContext
-if session is None:
+session = SparkSession._instantiatedSession
+if session is None or session._sc._jsc is None:
--- End diff --

Yeah, I had to change our branch to just check whether `_jsc` was defined, for 
the same reason. I'd prefer to add `isStopped` to both Java and Python 
eventually, but that doesn't need to block this commit.
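For context, the check under discussion can be sketched with stand-in classes 
(FakeContext/FakeSession are illustrative mocks, not the real pyspark API; in 
pyspark, stopping the context clears the JVM-side `_jsc` handle, which is what 
the patch tests for):

```python
class FakeContext:
    def __init__(self):
        self._jsc = object()  # stand-in for the JavaSparkContext handle

    def stop(self):
        self._jsc = None      # stopping clears the JVM handle


class FakeSession:
    _instantiatedSession = None

    def __init__(self, sc):
        self._sc = sc

    @classmethod
    def getOrCreate(cls):
        session = cls._instantiatedSession
        # Reuse the cached session only if it exists AND its context is alive;
        # "_jsc is None" is the proxy for "the context was stopped".
        if session is None or session._sc._jsc is None:
            session = cls(FakeContext())
            cls._instantiatedSession = session
        return session


first = FakeSession.getOrCreate()
assert FakeSession.getOrCreate() is first   # live context: session is reused
first._sc.stop()
second = FakeSession.getOrCreate()
assert second is not first                  # stopped context: session renewed
```

A `JavaSparkContext.isStopped` check would express the same intent more 
directly, but the `_jsc is None` test works purely on the Python side.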


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-12 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/16454#discussion_r95846835
  
--- Diff: python/pyspark/sql/session.py ---
@@ -214,8 +214,12 @@ def __init__(self, sparkContext, jsparkSession=None):
 self._wrapped = SQLContext(self._sc, self, self._jwrapped)
 _monkey_patch_RDD(self)
 install_exception_handler()
-if SparkSession._instantiatedContext is None:
-SparkSession._instantiatedContext = self
+# If we had an instantiated SparkSession attached with a SparkContext
+# which is stopped now, we need to renew the instantiated SparkSession.
+# Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+if SparkSession._instantiatedSession is None \
+or SparkSession._instantiatedSession._sc._jsc is None:
--- End diff --

If this can't tell whether the java context is stopped, then it should be 
fine to use this logic. +1.





[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16454





[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16454#discussion_r95729251
  
--- Diff: python/pyspark/sql/session.py ---
@@ -214,8 +214,12 @@ def __init__(self, sparkContext, jsparkSession=None):
 self._wrapped = SQLContext(self._sc, self, self._jwrapped)
 _monkey_patch_RDD(self)
 install_exception_handler()
-if SparkSession._instantiatedContext is None:
-SparkSession._instantiatedContext = self
+# If we had an instantiated SparkSession attached with a SparkContext
+# which is stopped now, we need to renew the instantiated SparkSession.
+# Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+if SparkSession._instantiatedSession is None \
+or SparkSession._instantiatedSession._sc._jsc is None:
--- End diff --

Well, as I tried, this would cause tests to fail.

`__init__` and `getOrCreate` are both legal paths. Once one caller uses 
`__init__` to get a `SparkSession` and another calls `getOrCreate` later, the 
second caller will get a different session than the first if the logic to 
update `_instantiatedSession` differs between the two paths.
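The concern can be illustrated with a stand-in sketch (mock classes, not the 
real pyspark API): keeping the same renewal condition in `__init__` and 
`getOrCreate` ensures a directly constructed session and a later 
`getOrCreate` agree on which session is current.

```python
class Ctx:
    def __init__(self):
        self._jsc = object()   # stand-in for the JVM-side handle

    def stop(self):
        self._jsc = None


class Session:
    _instantiatedSession = None

    def __init__(self, sc):
        self._sc = sc
        cached = Session._instantiatedSession
        # Same condition as getOrCreate: cache self when there is no live
        # cached session, including when the old context was stopped.
        if cached is None or cached._sc._jsc is None:
            Session._instantiatedSession = self

    @classmethod
    def getOrCreate(cls):
        session = cls._instantiatedSession
        if session is None or session._sc._jsc is None:
            session = cls(Ctx())
        return session


# A stale cached session is renewed by a direct __init__ call, so a later
# getOrCreate hands back that same directly constructed session.
old = Session(Ctx())
old._sc.stop()
new = Session(Ctx())
assert Session.getOrCreate() is new
```

If `__init__` kept the original "only cache when None" logic while 
`getOrCreate` used the stronger check, the two paths would disagree after a 
context restart.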





[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16454#discussion_r95725013
  
--- Diff: python/pyspark/sql/session.py ---
@@ -214,8 +214,12 @@ def __init__(self, sparkContext, jsparkSession=None):
 self._wrapped = SQLContext(self._sc, self, self._jwrapped)
 _monkey_patch_RDD(self)
 install_exception_handler()
-if SparkSession._instantiatedContext is None:
-SparkSession._instantiatedContext = self
+# If we had an instantiated SparkSession attached with a SparkContext
+# which is stopped now, we need to renew the instantiated SparkSession.
+# Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+if SparkSession._instantiatedSession is None \
+or SparkSession._instantiatedSession._sc._jsc is None:
--- End diff --

Sounds good. I will do the change, thanks.





[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16454#discussion_r95724518
  
--- Diff: python/pyspark/sql/session.py ---
@@ -161,8 +161,8 @@ def getOrCreate(self):
 with self._lock:
 from pyspark.context import SparkContext
 from pyspark.conf import SparkConf
-session = SparkSession._instantiatedContext
-if session is None:
+session = SparkSession._instantiatedSession
+if session is None or session._sc._jsc is None:
--- End diff --

I can change it to `isStopped` way if others also prefer that.





[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16454#discussion_r95724228
  
--- Diff: python/pyspark/sql/session.py ---
@@ -161,8 +161,8 @@ def getOrCreate(self):
 with self._lock:
 from pyspark.context import SparkContext
 from pyspark.conf import SparkConf
-session = SparkSession._instantiatedContext
-if session is None:
+session = SparkSession._instantiatedSession
+if session is None or session._sc._jsc is None:
--- End diff --

We can add `isStopped` to `JavaSparkContext`, but I'd prefer to solve it on 
the Python side if possible.






[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16454#discussion_r95724161
  
--- Diff: python/pyspark/sql/session.py ---
@@ -161,8 +161,8 @@ def getOrCreate(self):
 with self._lock:
 from pyspark.context import SparkContext
 from pyspark.conf import SparkConf
-session = SparkSession._instantiatedContext
-if session is None:
+session = SparkSession._instantiatedSession
+if session is None or session._sc._jsc is None:
--- End diff --

Actually, `_jsc` is a `JavaSparkContext` rather than a `SparkContext`, and it 
doesn't have `isStopped`.





[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-09 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/16454#discussion_r95259073
  
--- Diff: python/pyspark/sql/session.py ---
@@ -214,8 +214,12 @@ def __init__(self, sparkContext, jsparkSession=None):
 self._wrapped = SQLContext(self._sc, self, self._jwrapped)
 _monkey_patch_RDD(self)
 install_exception_handler()
-if SparkSession._instantiatedContext is None:
-SparkSession._instantiatedContext = self
+# If we had an instantiated SparkSession attached with a SparkContext
+# which is stopped now, we need to renew the instantiated SparkSession.
+# Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+if SparkSession._instantiatedSession is None \
+or SparkSession._instantiatedSession._sc._jsc is None:
--- End diff --

What about setting `SparkSession._instantiatedSession` to `None` in 
`getOrCreate` instead of duplicating the logic?





[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-09 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/16454#discussion_r95258453
  
--- Diff: python/pyspark/sql/session.py ---
@@ -161,8 +161,8 @@ def getOrCreate(self):
 with self._lock:
 from pyspark.context import SparkContext
 from pyspark.conf import SparkConf
-session = SparkSession._instantiatedContext
-if session is None:
+session = SparkSession._instantiatedSession
+if session is None or session._sc._jsc is None:
--- End diff --

#16519 adds `isStopped`, which I think is a better way of checking whether the 
Spark context is valid than just checking whether `_jsc` is defined, because 
`_jsc` could be defined but already stopped.





[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16454#discussion_r95055690
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -1886,6 +1887,28 @@ def test_hivecontext(self):
 self.assertTrue(os.path.exists(metastore_path))
 
 
+class SQLTests2(ReusedPySparkTestCase):
--- End diff --

`ReusedPySparkTestCase` launches a class-level SparkContext, so I use it as 
the default SparkContext.





[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-06 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16454#discussion_r95050517
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -1886,6 +1887,28 @@ def test_hivecontext(self):
 self.assertTrue(os.path.exists(metastore_path))
 
 
+class SQLTests2(ReusedPySparkTestCase):
--- End diff --

Is there any particular reason this is built on a `ReusedPySparkTestCase`?





[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16454#discussion_r94905140
  
--- Diff: python/pyspark/sql/session.py ---
@@ -214,8 +214,14 @@ def __init__(self, sparkContext, jsparkSession=None):
 self._wrapped = SQLContext(self._sc, self, self._jwrapped)
 _monkey_patch_RDD(self)
 install_exception_handler()
-if SparkSession._instantiatedContext is None:
-SparkSession._instantiatedContext = self
+if SparkSession._instantiatedSession is None:
+SparkSession._instantiatedSession = self
+else:
+# If we had an instantiated SparkSession attached with a SparkContext
+# which is stopped now, we need to renew the instantiated SparkSession.
+# Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+if SparkSession._instantiatedSession._sc._jsc is None:
+SparkSession._instantiatedSession = self
--- End diff --

`SparkSession._instantiatedSession._sc` comes from the `sparkContext` passed 
into `__init__`, so I think it should not be None.





[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-05 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16454#discussion_r94904472
  
--- Diff: python/pyspark/sql/session.py ---
@@ -214,8 +214,14 @@ def __init__(self, sparkContext, jsparkSession=None):
 self._wrapped = SQLContext(self._sc, self, self._jwrapped)
 _monkey_patch_RDD(self)
 install_exception_handler()
-if SparkSession._instantiatedContext is None:
-SparkSession._instantiatedContext = self
+if SparkSession._instantiatedSession is None:
+SparkSession._instantiatedSession = self
+else:
+# If we had an instantiated SparkSession attached with a SparkContext
+# which is stopped now, we need to renew the instantiated SparkSession.
+# Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+if SparkSession._instantiatedSession._sc._jsc is None:
+SparkSession._instantiatedSession = self
--- End diff --

Can `SparkSession._instantiatedSession._sc` be `None`?





[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16454#discussion_r94903491
  
--- Diff: python/pyspark/sql/session.py ---
@@ -214,8 +214,14 @@ def __init__(self, sparkContext, jsparkSession=None):
 self._wrapped = SQLContext(self._sc, self, self._jwrapped)
 _monkey_patch_RDD(self)
 install_exception_handler()
-if SparkSession._instantiatedContext is None:
-SparkSession._instantiatedContext = self
+if SparkSession._instantiatedSession is None:
+SparkSession._instantiatedSession = self
+else:
+# If we had an instantiated SparkSession attached with a SparkContext
+# which is stopped now, we need to renew the instantiated SparkSession.
+# Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+if SparkSession._instantiatedSession._sc._jsc is None:
+SparkSession._instantiatedSession = self
--- End diff --

Yeah, sure.





[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-05 Thread vijoshi
Github user vijoshi commented on a diff in the pull request:

https://github.com/apache/spark/pull/16454#discussion_r94903274
  
--- Diff: python/pyspark/sql/session.py ---
@@ -214,8 +214,14 @@ def __init__(self, sparkContext, jsparkSession=None):
 self._wrapped = SQLContext(self._sc, self, self._jwrapped)
 _monkey_patch_RDD(self)
 install_exception_handler()
-if SparkSession._instantiatedContext is None:
-SparkSession._instantiatedContext = self
+if SparkSession._instantiatedSession is None:
+SparkSession._instantiatedSession = self
+else:
+# If we had an instantiated SparkSession attached with a SparkContext
+# which is stopped now, we need to renew the instantiated SparkSession.
+# Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
+if SparkSession._instantiatedSession._sc._jsc is None:
+SparkSession._instantiatedSession = self
--- End diff --

Would this be simpler?
```
if SparkSession._instantiatedSession is None \
        or SparkSession._instantiatedSession._sc._jsc is None:
    SparkSession._instantiatedSession = self
```





[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

2017-01-02 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/16454

[SPARK-19055][SQL][PySpark] Fix SparkSession initialization when 
SparkContext is stopped

## What changes were proposed in this pull request?



In SparkSession initialization, we store the created SparkSession instance in 
the class variable `_instantiatedContext`. Next time, we can use 
`SparkSession.builder.getOrCreate()` to retrieve that existing SparkSession 
instance.

However, when the active SparkContext is stopped and we create another new 
SparkContext to use, the existing SparkSession is still associated with the 
stopped SparkContext, so operations on this existing SparkSession will fail.

We need to detect such a case in SparkSession and renew the class variable 
`_instantiatedContext` if needed.
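The pre-fix behavior described above can be sketched with stand-in classes 
(mocks, not the real pyspark API): because `__init__` only cached the first 
session ever created, the cache kept pointing at a session whose context had 
been stopped.

```python
class Ctx:
    def __init__(self):
        self._jsc = object()   # stand-in for the JVM-side handle

    def stop(self):
        self._jsc = None


class BuggySession:
    _instantiatedContext = None   # pre-fix attribute name

    def __init__(self, sc):
        self._sc = sc
        # Pre-fix logic: cache only the first session, never renew it.
        if BuggySession._instantiatedContext is None:
            BuggySession._instantiatedContext = self


old = BuggySession(Ctx())
old._sc.stop()                 # the active context is stopped...
new = BuggySession(Ctx())      # ...and a new session is created on a new context

# The cache still points at the session bound to the dead context, so a
# getOrCreate-style lookup would hand back an unusable session.
assert BuggySession._instantiatedContext is old
assert BuggySession._instantiatedContext._sc._jsc is None
```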

## How was this patch tested?

New test added in PySpark.

Please review http://spark.apache.org/contributing.html before opening a pull 
request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 fix-pyspark-sparksession

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16454.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16454


commit 80bba5ead0601f3ef4b05fff5391d07a61e06341
Author: Liang-Chi Hsieh 
Date:   2017-01-03T03:06:21Z

Fix SparkSession initialization when previous SparkContext is stopped and 
new SparkContext is created.



