[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21990


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-17 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r225992993
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

Updated with replacement then :)


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r225819012
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

Actually either way looks okay.


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r225818067
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

Eh .. I think it's okay to have a function that returns the updated extensions.


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-17 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r225811960
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

I see, but in that case we need to ensure that no extension injection happens in the default constructor, to avoid initializing without the extensions from the conf.


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-17 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r225805019
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

I am always a little nervous about functions that return the objects they take in as parameters and then modify; it gives the impression that they are stateless when they actually mutate their input. If you think this is clearer, I can make the change.
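
To sketch the two styles under discussion (a minimal illustration, not code from the PR; the method names are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSessionExtensions

trait ExtensionLoadingStyles {
  // Style 1: a Unit return makes the in-place mutation explicit to the caller.
  def applyInPlace(conf: SparkConf, extensions: SparkSessionExtensions): Unit

  // Style 2: returning the (mutated) argument allows chaining, e.g. inside a
  // constructor call, but can read like a pure function producing a new value.
  def applyAndReturn(conf: SparkConf, extensions: SparkSessionExtensions): SparkSessionExtensions
}
```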


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r225762148
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

Oh, I see, moving it to the default constructor was not a good idea. How about the first suggestion?


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-16 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r225613517
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

It's difficult here since I'm attempting to change the behavior of the old code paths as little as possible :(


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-16 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r225612270
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

I thought about this, but was worried about multiple invocations of the extensions: once every time the SparkSession is cloned.


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-16 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r225610975
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

The default constructor of SparkSession?


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-16 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r225608605
  
--- Diff: python/pyspark/sql/session.py ---
@@ -219,6 +219,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                 jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
             else:
                 jsparkSession = self._jvm.SparkSession(self._jsc.sc())
+
--- End diff --

I'm addicted to whitespace apparently


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r225439067
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

On second thought, could we move the method call to the top of the default constructor?


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r225077270
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

How about returning `SparkSessionExtensions` from this method, and modifying the secondary constructor of `SparkSession` as:

```scala
private[sql] def this(sc: SparkContext) {
  this(sc, None, None,
    SparkSession.applyExtensionsFromConf(sc.getConf, new SparkSessionExtensions))
}
```

I'm a little worried about whether the order in which we apply extensions might matter.
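
A minimal sketch of what such a returning variant could look like (error handling elided, and assuming `spark.sql.extensions` is an optional config naming a class that implements `SparkSessionExtensions => Unit`):

```scala
private[sql] def applyExtensionsFromConf(
    conf: SparkConf,
    extensions: SparkSessionExtensions): SparkSessionExtensions = {
  // Reflectively load the user's configurator, if one is set, and let it
  // register its injections (rules, strategies, parsers, ...) on `extensions`.
  conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS).foreach { className =>
    val configurator = Class.forName(className)
      .getConstructor().newInstance()
      .asInstanceOf[SparkSessionExtensions => Unit]
    configurator(extensions)
  }
  extensions  // the same instance, returned so it can be used in a constructor call chain
}
```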


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r224985166
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
+    val extensionConfOption = conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS)
--- End diff --

I think we can even pass only `conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS)` as its argument instead of the whole `SparkConf`, and name it `applyExtensions`.
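
A sketch of the narrowed signature under that suggestion (assuming the config entry yields an `Option[String]`; the body is abbreviated):

```scala
// Takes only the optional extension class name rather than the whole SparkConf,
// which narrows the dependency and makes the helper easier to test.
private def applyExtensions(
    extensionOption: Option[String],
    extensions: SparkSessionExtensions): SparkSessionExtensions = {
  extensionOption.foreach { className =>
    // instantiate the configurator named by spark.sql.extensions and apply it,
    // as sketched earlier in this thread
  }
  extensions
}

// Hypothetical call site:
// applyExtensions(conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS), extensions)
```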


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r224985023
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,48 @@ def test_query_execution_listener_on_collect_with_arrow(self):
         "The callback from the query execution listener should be called after 'toPandas'")
 
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
--- End diff --

I think `SQLTestUtils` is not needed.


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r224984813
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -86,6 +86,7 @@ class SparkSession private(
 
   private[sql] def this(sc: SparkContext) {
     this(sc, None, None, new SparkSessionExtensions)
+    SparkSession.applyExtensionsFromConf(sc.getConf, this.extensions)
--- End diff --

Let's add a comment explaining why this is only done in this constructor. It might look weird that this constructor alone needs to run `applyExtensionsFromConf`.
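
A sketch of what such a comment could look like (the wording is illustrative, not from the PR):

```scala
private[sql] def this(sc: SparkContext) {
  this(sc, None, None, new SparkSessionExtensions)
  // This constructor bypasses the Builder, which is where spark.sql.extensions
  // is normally applied. It is used only by PySpark and by Scala-side tests,
  // so the extensions from the conf must be applied here explicitly.
  SparkSession.applyExtensionsFromConf(sc.getConf, this.extensions)
}
```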


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r224984761
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

Let's make it `private`


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r224984751
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -86,6 +86,7 @@ class SparkSession private(
 
   private[sql] def this(sc: SparkContext) {
--- End diff --

@RussellSpitzer, mind if I ask you to add a note that this constructor is currently not used in the Scala code base, only by PySpark and by Scala-side tests?


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r224984675
  
--- Diff: python/pyspark/sql/session.py ---
@@ -219,6 +219,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                 jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
             else:
                 jsparkSession = self._jvm.SparkSession(self._jsc.sc())
+
--- End diff --

Oh haha, let's get rid of this change


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-09-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r221436367
  
--- Diff: python/pyspark/sql/session.py ---
@@ -212,13 +212,17 @@ def __init__(self, sparkContext, jsparkSession=None):
         self._sc = sparkContext
         self._jsc = self._sc._jsc
         self._jvm = self._sc._jvm
+
         if jsparkSession is None:
             if self._jvm.SparkSession.getDefaultSession().isDefined() \
                     and not self._jvm.SparkSession.getDefaultSession().get() \
                         .sparkContext().isStopped():
                 jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
             else:
-                jsparkSession = self._jvm.SparkSession(self._jsc.sc())
+                extensions = self._sc._jvm.org.apache.spark.sql\
--- End diff --

tiny nit: just for consistency `extensions = self._sc._jvm.org.apache.spark.sql\` -> `extensions = self._sc._jvm.org.apache.spark.sql \`


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-09-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r221436324
  
--- Diff: python/pyspark/sql/session.py ---
@@ -212,13 +212,17 @@ def __init__(self, sparkContext, jsparkSession=None):
 self._sc = sparkContext
 self._jsc = self._sc._jsc
 self._jvm = self._sc._jvm
+
--- End diff --

really not a big deal, but let's revert this newline to leave only related changes.


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-09-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r221436261
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala ---
@@ -169,3 +178,29 @@ class SparkSessionExtensions {
     parserBuilders += builder
   }
 }
+
+object SparkSessionExtensions extends Logging {
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

Can you just put this in the `SparkSession` object and make it `private[spark]`?


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-09-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r214572917
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,48 @@ def test_query_execution_listener_on_collect_with_arrow(self):
         "The callback from the query execution listener should be called after 'toPandas'")
 
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
+    # These tests are separate because it uses 'spark.sql.extensions' which is
+    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
+
+    @classmethod
+    def setUpClass(cls):
+        import glob
+        from pyspark.find_spark_home import _find_spark_home
+
+        SPARK_HOME = _find_spark_home()
+        filename_pattern = (
+            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
+            "SparkSessionExtensionSuite.class")
+        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
+            raise unittest.SkipTest(
+                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
+                "available. Will skip the related tests.")
--- End diff --

nit: `SparkSessionExtensionSuite.` -> `SparkSessionExtensionSuite`


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r213010408
  
--- Diff: python/pyspark/sql/session.py ---
@@ -218,7 +218,9 @@ def __init__(self, sparkContext, jsparkSession=None):
                     .sparkContext().isStopped():
                 jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
             else:
-                jsparkSession = self._jvm.SparkSession(self._jsc.sc())
+                jsparkSession = self._jvm.SparkSession.builder() \
--- End diff --

@RussellSpitzer, have you maybe had a chance to take a look and see if we can deduplicate some logic compared to Scala's `getOrCreate`? I am suggesting this since the code path now duplicates some of the logic there.


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-17 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r210941883
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
         "The callback from the query execution listener should be called after 'toPandas'")
 
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
+    # These tests are separate because it uses 'spark.sql.extensions' which is
+    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
+
+    @classmethod
+    def setUpClass(cls):
+        import glob
+        from pyspark.find_spark_home import _find_spark_home
+
+        SPARK_HOME = _find_spark_home()
+        filename_pattern = (
+            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
+            "SparkSessionExtensionSuite.class")
+        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
+            raise unittest.SkipTest(
+                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
+                "available. Will skip the related tests.")
+
+        # Note that 'spark.sql.extensions' is a static immutable configuration.
+        cls.spark = SparkSession.builder \
+            .master("local[4]") \
+            .appName(cls.__name__) \
+            .config(
+                "spark.sql.extensions",
+                "org.apache.spark.sql.MyExtensions") \
--- End diff --

Reduce, reuse, recycle :)


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r210940572
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
         "The callback from the query execution listener should be called after 'toPandas'")
 
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
+    # These tests are separate because it uses 'spark.sql.extensions' which is
+    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
+
+    @classmethod
+    def setUpClass(cls):
+        import glob
+        from pyspark.find_spark_home import _find_spark_home
+
+        SPARK_HOME = _find_spark_home()
+        filename_pattern = (
+            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
+            "SparkSessionExtensionSuite.class")
+        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
+            raise unittest.SkipTest(
+                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
+                "available. Will skip the related tests.")
+
+        # Note that 'spark.sql.extensions' is a static immutable configuration.
+        cls.spark = SparkSession.builder \
+            .master("local[4]") \
+            .appName(cls.__name__) \
+            .config(
+                "spark.sql.extensions",
+                "org.apache.spark.sql.MyExtensions") \
--- End diff --

Oh sorry, I thought you wrote that class for this test.


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-17 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r210909196
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
         "The callback from the query execution listener should be called after 'toPandas'")
 
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
+    # These tests are separate because it uses 'spark.sql.extensions' which is
+    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
+
+    @classmethod
+    def setUpClass(cls):
+        import glob
+        from pyspark.find_spark_home import _find_spark_home
+
+        SPARK_HOME = _find_spark_home()
+        filename_pattern = (
+            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
+            "SparkSessionExtensionSuite.class")
+        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
+            raise unittest.SkipTest(
+                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
+                "available. Will skip the related tests.")
+
+        # Note that 'spark.sql.extensions' is a static immutable configuration.
+        cls.spark = SparkSession.builder \
+            .master("local[4]") \
+            .appName(cls.__name__) \
+            .config(
+                "spark.sql.extensions",
+                "org.apache.spark.sql.MyExtensions") \
+            .getOrCreate()
+
+    @classmethod
+    def tearDownClass(cls):
+        cls.spark.stop()
+
+    def tearDown(self):
+        self.spark._jvm.OnSuccessCall.clear()
--- End diff --

Sounds good to me, I'll take that out.


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-17 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r210909083
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
         "The callback from the query execution listener should be called after 'toPandas'")
 
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
+    # These tests are separate because it uses 'spark.sql.extensions' which is
+    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
+
+    @classmethod
+    def setUpClass(cls):
+        import glob
+        from pyspark.find_spark_home import _find_spark_home
+
+        SPARK_HOME = _find_spark_home()
+        filename_pattern = (
+            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
+            "SparkSessionExtensionSuite.class")
+        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
+            raise unittest.SkipTest(
+                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
+                "available. Will skip the related tests.")
+
+        # Note that 'spark.sql.extensions' is a static immutable configuration.
+        cls.spark = SparkSession.builder \
+            .master("local[4]") \
+            .appName(cls.__name__) \
+            .config(
+                "spark.sql.extensions",
+                "org.apache.spark.sql.MyExtensions") \
--- End diff --

I'm not sure what you mean here? This class already exists as part of `SparkSessionExtensionSuite`, and I'm just reusing it with no changes.


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r210790581
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
         "The callback from the query execution listener should be called after 'toPandas'")
 
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
+    # These tests are separate because it uses 'spark.sql.extensions' which is
+    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
+
+    @classmethod
+    def setUpClass(cls):
+        import glob
+        from pyspark.find_spark_home import _find_spark_home
+
+        SPARK_HOME = _find_spark_home()
+        filename_pattern = (
+            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
+            "SparkSessionExtensionSuite.class")
+        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
+            raise unittest.SkipTest(
+                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
+                "available. Will skip the related tests.")
+
+        # Note that 'spark.sql.extensions' is a static immutable configuration.
+        cls.spark = SparkSession.builder \
+            .master("local[4]") \
+            .appName(cls.__name__) \
+            .config(
+                "spark.sql.extensions",
+                "org.apache.spark.sql.MyExtensions") \
+            .getOrCreate()
+
+    @classmethod
+    def tearDownClass(cls):
+        cls.spark.stop()
+
+    def tearDown(self):
+        self.spark._jvm.OnSuccessCall.clear()
--- End diff --

This wouldn't be needed, since I added it in the PR pointed out only for testing whether the callback is called.


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r210790531
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
         "The callback from the query execution listener should be called after 'toPandas'")
 
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
+    # These tests are separate because it uses 'spark.sql.extensions' which is
+    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
+
+    @classmethod
+    def setUpClass(cls):
+        import glob
+        from pyspark.find_spark_home import _find_spark_home
+
+        SPARK_HOME = _find_spark_home()
+        filename_pattern = (
+            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
+            "SparkSessionExtensionSuite.class")
+        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
+            raise unittest.SkipTest(
+                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
+                "available. Will skip the related tests.")
+
+        # Note that 'spark.sql.extensions' is a static immutable configuration.
+        cls.spark = SparkSession.builder \
+            .master("local[4]") \
+            .appName(cls.__name__) \
+            .config(
+                "spark.sql.extensions",
+                "org.apache.spark.sql.MyExtensions") \
--- End diff --

@RussellSpitzer, I think you should push the `MyExtensions` Scala-side code too.
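
For reference, a minimal sketch of what such a configurator can look like (the class named by `spark.sql.extensions` must implement `SparkSessionExtensions => Unit`; `MySparkStrategy` is a placeholder, not code from the PR):

```scala
import org.apache.spark.sql.SparkSessionExtensions

// Spark instantiates the class named by spark.sql.extensions and applies it
// to the session's extensions while the session is being built.
class MyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    // Register a custom planner strategy on the session being built.
    extensions.injectPlannerStrategy(session => MySparkStrategy(session))
  }
}
```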


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-15 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r210428804
  
--- Diff: python/pyspark/sql/session.py ---
@@ -218,7 +218,9 @@ def __init__(self, sparkContext, jsparkSession=None):
                     .sparkContext().isStopped():
                 jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
             else:
-                jsparkSession = self._jvm.SparkSession(self._jsc.sc())
+                jsparkSession = self._jvm.SparkSession.builder() \
+                    .sparkContext(self._jsc.sc()) \
+                    .getOrCreate()
--- End diff --

Yeah, let me add in the test, and then I'll clear out all the Python duplication of Scala code. I can make it more of a wrapper and less of a reimplementation.


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r209507286
  
--- Diff: python/pyspark/sql/session.py ---
@@ -218,7 +218,9 @@ def __init__(self, sparkContext, jsparkSession=None):
                     .sparkContext().isStopped():
                 jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
             else:
-                jsparkSession = self._jvm.SparkSession(self._jsc.sc())
+                jsparkSession = self._jvm.SparkSession.builder() \
+                    .sparkContext(self._jsc.sc()) \
+                    .getOrCreate()
--- End diff --

@RussellSpitzer, mind checking the `getOrCreate` logic on the Scala side and deduplicating it here while we are at it? Some logic, for instance setting the default session, is duplicated here on the Python side and there on the Scala side.

It would be nicer if we had some tests as well. `spark.sql.extensions` is a static configuration, right? In that case we could add a test; for example, please refer to https://github.com/apache/spark/pull/21007, where I added a test with a static configuration before.


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-03 Thread RussellSpitzer
GitHub user RussellSpitzer opened a pull request:

https://github.com/apache/spark/pull/21990

[SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Master

## What changes were proposed in this pull request?

Previously PySpark used the private constructor for SparkSession when
building that object. This resulted in a SparkSession that never checked
the spark.sql.extensions parameter for additional session extensions. To fix
this, we instead use the SparkSession.builder() path as SparkR does; this
loads the extensions and allows their use in PySpark.

## How was this patch tested?

This was manually tested by passing a class to spark.sql.extensions and making sure its included strategies appeared in the spark._jsparkSession.sessionState.planner.strategies list. We could add an automatic test, but I'm not very familiar with the PySpark testing framework; I would be glad to implement that if requested.
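
For context, the equivalent check on the Scala side (inside the org.apache.spark.sql package, where sessionState is accessible) might look roughly like this sketch, assuming a MyExtensions configurator that injects a placeholder MySparkStrategy:

```scala
// Build a session with the extensions config set, then verify that the
// injected strategy shows up in the planner's strategy list.
val session = SparkSession.builder()
  .master("local[1]")
  .config("spark.sql.extensions", "org.apache.spark.sql.MyExtensions")
  .getOrCreate()
assert(session.sessionState.planner.strategies.contains(MySparkStrategy(session)))
session.stop()
```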

Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/RussellSpitzer/spark SPARK-25003-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21990.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21990


commit f790ae000ae1c3f4030162028186448d345e2984
Author: Russell Spitzer 
Date:   2018-08-03T16:04:00Z

[SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Previously PySpark used the private constructor for SparkSession when
building that object. This resulted in a SparkSession that never checked
the spark.sql.extensions parameter for additional session extensions. To fix
this, we instead use the SparkSession.builder() path as SparkR does; this
loads the extensions and allows their use in PySpark.




---
