[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21990
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225992993

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

Updated with replacement then :)
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225819012

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

Actually either way looks okay.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225818067

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

Eh .. I think it's okay to have a function that returns the updated extensions.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225811960

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

I see, but in that case we need to ensure that no extensions are injected in the default constructor, to avoid initializing without the extensions from the conf.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225805019

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

I am always a little nervous about functions that return the objects they take in as parameters and then modify; it gives the impression that they are stateless. If you think this is clearer, I can make the change.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225762148

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

Oh, I see, moving to the default constructor was not a good idea. How about the first suggestion?
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225613517

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

It's difficult here since I'm attempting to cause the least change in behavior for the old code paths :(
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225612270

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

I thought about this, but was worried about multiple invocations of the extensions: once every time the SparkSession is cloned.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225610975

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

The default constructor of SparkSession?
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225608605

--- Diff: python/pyspark/sql/session.py ---
@@ -219,6 +219,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                 jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
             else:
                 jsparkSession = self._jvm.SparkSession(self._jsc.sc())
+
--- End diff --

I'm addicted to whitespace apparently
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225439067

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

On second thoughts, we could move the method call to the top of the default constructor?
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225077270

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

How about returning `SparkSessionExtensions` from this method, and modifying the secondary constructor of `SparkSession` as:

```scala
private[sql] def this(sc: SparkContext) {
  this(sc, None, None, SparkSession.applyExtensionsFromConf(sc.getConf, new SparkSessionExtensions))
}
```

I'm a little worried about whether the order in which we apply extensions might matter.
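For reference, a minimal sketch of what that return-style helper could look like. The reflection details (loading the configurator with `Utils.classForName` and treating it as a `SparkSessionExtensions => Unit`) are assumptions about how a `spark.sql.extensions` configurator would typically be loaded, not code from this PR, and the method is assumed to live in the `SparkSession` companion object so that `logWarning` is available:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.internal.StaticSQLConf
import org.apache.spark.util.Utils

private[sql] def applyExtensionsFromConf(
    conf: SparkConf,
    extensions: SparkSessionExtensions): SparkSessionExtensions = {
  // 'spark.sql.extensions' names a class implementing SparkSessionExtensions => Unit.
  conf.getOption(StaticSQLConf.SPARK_SESSION_EXTENSIONS.key).foreach { className =>
    try {
      val configurator = Utils.classForName(className).newInstance()
        .asInstanceOf[SparkSessionExtensions => Unit]
      configurator(extensions)
    } catch {
      // A misconfigured class should not fail session creation.
      case e @ (_: ClassCastException | _: ClassNotFoundException | _: NoClassDefFoundError) =>
        logWarning(s"Cannot use $className to configure session extensions.", e)
    }
  }
  // Returning the (mutated) input allows it to be used inline in a constructor call,
  // which is exactly what makes the ordering question above worth checking.
  extensions
}
```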
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r224985166

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
+    val extensionConfOption = conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS)
--- End diff --

I think we can even only pass `conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS)` as its argument instead of `SparkConf`, and name it `applyExtensions`.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r224985023

--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,48 @@ def test_query_execution_listener_on_collect_with_arrow(self):
             "The callback from the query execution listener should be called after 'toPandas'")
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
--- End diff --

I think `SQLTestUtils` is not needed.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r224984813

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -86,6 +86,7 @@ class SparkSession private(
   private[sql] def this(sc: SparkContext) {
     this(sc, None, None, new SparkSessionExtensions)
+    SparkSession.applyExtensionsFromConf(sc.getConf, this.extensions)
--- End diff --

Let's add a comment explaining why this is only here in this constructor. It might look weird that this constructor alone needs to run `applyExtensionsFromConf`.
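A hedged sketch of the kind of comment being requested, based on the note later in this thread that this constructor is only used by PySpark and by tests on the Scala side (the wording is illustrative, not from the PR):

```scala
private[sql] def this(sc: SparkContext) {
  this(sc, None, None, new SparkSessionExtensions)
  // This constructor is only exercised by PySpark and by Scala-side tests.
  // The usual Builder.getOrCreate() path applies 'spark.sql.extensions' itself,
  // so we must apply the configured extensions here as well to stay consistent.
  SparkSession.applyExtensionsFromConf(sc.getConf, this.extensions)
}
```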
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r224984761

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

Let's make it `private`
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r224984751

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -86,6 +86,7 @@ class SparkSession private(
   private[sql] def this(sc: SparkContext) {
--- End diff --

@RussellSpitzer, mind if I ask you to note that this is currently not used in the Scala code base, only by PySpark and by tests on the Scala side?
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r224984675

--- Diff: python/pyspark/sql/session.py ---
@@ -219,6 +219,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                 jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
             else:
                 jsparkSession = self._jvm.SparkSession(self._jsc.sc())
+
--- End diff --

Oh haha, let's get rid of this change
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r221436367

--- Diff: python/pyspark/sql/session.py ---
@@ -212,13 +212,17 @@ def __init__(self, sparkContext, jsparkSession=None):
         self._sc = sparkContext
         self._jsc = self._sc._jsc
         self._jvm = self._sc._jvm
+
         if jsparkSession is None:
             if self._jvm.SparkSession.getDefaultSession().isDefined() \
                     and not self._jvm.SparkSession.getDefaultSession().get() \
                     .sparkContext().isStopped():
                 jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
             else:
-                jsparkSession = self._jvm.SparkSession(self._jsc.sc())
+                extensions = self._sc._jvm.org.apache.spark.sql\
--- End diff --

tiny nit: just for consistency `extensions = self._sc._jvm.org.apache.spark.sql\` -> `extensions = self._sc._jvm.org.apache.spark.sql \`
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r221436324

--- Diff: python/pyspark/sql/session.py ---
@@ -212,13 +212,17 @@ def __init__(self, sparkContext, jsparkSession=None):
         self._sc = sparkContext
         self._jsc = self._sc._jsc
         self._jvm = self._sc._jvm
+
--- End diff --

really not a big deal but let's revert this newline to leave related changes only.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r221436261

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala ---
@@ -169,3 +178,29 @@ class SparkSessionExtensions {
     parserBuilders += builder
   }
 }
+
+object SparkSessionExtensions extends Logging {
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

Can you just put this in `SparkSession` object and make it `private[spark]`?
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r214572917

--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,48 @@ def test_query_execution_listener_on_collect_with_arrow(self):
             "The callback from the query execution listener should be called after 'toPandas'")
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
+    # These tests are separate because it uses 'spark.sql.extensions' which is
+    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
+
+    @classmethod
+    def setUpClass(cls):
+        import glob
+        from pyspark.find_spark_home import _find_spark_home
+
+        SPARK_HOME = _find_spark_home()
+        filename_pattern = (
+            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
+            "SparkSessionExtensionSuite.class")
+        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
+            raise unittest.SkipTest(
+                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
--- End diff --

nit: `SparkSessionExtensionSuite.` -> `SparkSessionExtensionSuite`
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r213010408

--- Diff: python/pyspark/sql/session.py ---
@@ -218,7 +218,9 @@ def __init__(self, sparkContext, jsparkSession=None):
                     .sparkContext().isStopped():
                 jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
             else:
-                jsparkSession = self._jvm.SparkSession(self._jsc.sc())
+                jsparkSession = self._jvm.SparkSession.builder() \
--- End diff --

@RussellSpitzer, have you maybe had a chance to take a look and see if we can deduplicate some of the logic compared with Scala's `getOrCreate`? I am suggesting this since the code path now duplicates some of the logic there.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r210941883

--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
             "The callback from the query execution listener should be called after 'toPandas'")
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
+    # These tests are separate because it uses 'spark.sql.extensions' which is
+    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
+
+    @classmethod
+    def setUpClass(cls):
+        import glob
+        from pyspark.find_spark_home import _find_spark_home
+
+        SPARK_HOME = _find_spark_home()
+        filename_pattern = (
+            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
+            "SparkSessionExtensionSuite.class")
+        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
+            raise unittest.SkipTest(
+                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
+                "available. Will skip the related tests.")
+
+        # Note that 'spark.sql.extensions' is a static immutable configuration.
+        cls.spark = SparkSession.builder \
+            .master("local[4]") \
+            .appName(cls.__name__) \
+            .config(
+                "spark.sql.extensions",
+                "org.apache.spark.sql.MyExtensions") \
--- End diff --

Reduce, reuse, recycle :)
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r210940572

--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
             "The callback from the query execution listener should be called after 'toPandas'")
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
+    # These tests are separate because it uses 'spark.sql.extensions' which is
+    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
+
+    @classmethod
+    def setUpClass(cls):
+        import glob
+        from pyspark.find_spark_home import _find_spark_home
+
+        SPARK_HOME = _find_spark_home()
+        filename_pattern = (
+            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
+            "SparkSessionExtensionSuite.class")
+        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
+            raise unittest.SkipTest(
+                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
+                "available. Will skip the related tests.")
+
+        # Note that 'spark.sql.extensions' is a static immutable configuration.
+        cls.spark = SparkSession.builder \
+            .master("local[4]") \
+            .appName(cls.__name__) \
+            .config(
+                "spark.sql.extensions",
+                "org.apache.spark.sql.MyExtensions") \
--- End diff --

Oh sorry, I thought you wrote that class for this test.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r210909196

--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
             "The callback from the query execution listener should be called after 'toPandas'")
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
+    # These tests are separate because it uses 'spark.sql.extensions' which is
+    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
+
+    @classmethod
+    def setUpClass(cls):
+        import glob
+        from pyspark.find_spark_home import _find_spark_home
+
+        SPARK_HOME = _find_spark_home()
+        filename_pattern = (
+            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
+            "SparkSessionExtensionSuite.class")
+        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
+            raise unittest.SkipTest(
+                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
+                "available. Will skip the related tests.")
+
+        # Note that 'spark.sql.extensions' is a static immutable configuration.
+        cls.spark = SparkSession.builder \
+            .master("local[4]") \
+            .appName(cls.__name__) \
+            .config(
+                "spark.sql.extensions",
+                "org.apache.spark.sql.MyExtensions") \
+            .getOrCreate()
+
+    @classmethod
+    def tearDownClass(cls):
+        cls.spark.stop()
+
+    def tearDown(self):
+        self.spark._jvm.OnSuccessCall.clear()
--- End diff --

Sounds good to me, I'll take that out.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r210909083

--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
             "The callback from the query execution listener should be called after 'toPandas'")
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
+    # These tests are separate because it uses 'spark.sql.extensions' which is
+    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
+
+    @classmethod
+    def setUpClass(cls):
+        import glob
+        from pyspark.find_spark_home import _find_spark_home
+
+        SPARK_HOME = _find_spark_home()
+        filename_pattern = (
+            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
+            "SparkSessionExtensionSuite.class")
+        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
+            raise unittest.SkipTest(
+                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
+                "available. Will skip the related tests.")
+
+        # Note that 'spark.sql.extensions' is a static immutable configuration.
+        cls.spark = SparkSession.builder \
+            .master("local[4]") \
+            .appName(cls.__name__) \
+            .config(
+                "spark.sql.extensions",
+                "org.apache.spark.sql.MyExtensions") \
--- End diff --

I'm not sure what you mean here. This class already exists as part of the SparkSessionExtensionSuite, and I'm just reusing it with no changes.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r210790581

--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
             "The callback from the query execution listener should be called after 'toPandas'")
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
+    # These tests are separate because it uses 'spark.sql.extensions' which is
+    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
+
+    @classmethod
+    def setUpClass(cls):
+        import glob
+        from pyspark.find_spark_home import _find_spark_home
+
+        SPARK_HOME = _find_spark_home()
+        filename_pattern = (
+            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
+            "SparkSessionExtensionSuite.class")
+        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
+            raise unittest.SkipTest(
+                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
+                "available. Will skip the related tests.")
+
+        # Note that 'spark.sql.extensions' is a static immutable configuration.
+        cls.spark = SparkSession.builder \
+            .master("local[4]") \
+            .appName(cls.__name__) \
+            .config(
+                "spark.sql.extensions",
+                "org.apache.spark.sql.MyExtensions") \
+            .getOrCreate()
+
+    @classmethod
+    def tearDownClass(cls):
+        cls.spark.stop()
+
+    def tearDown(self):
+        self.spark._jvm.OnSuccessCall.clear()
--- End diff --

This wouldn't be needed; I added it for testing whether the callback is called in the PR pointed out above.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r210790531

--- Diff: python/pyspark/sql/tests.py ---
@@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
             "The callback from the query execution listener should be called after 'toPandas'")
 
+class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
+    # These tests are separate because it uses 'spark.sql.extensions' which is
+    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
+
+    @classmethod
+    def setUpClass(cls):
+        import glob
+        from pyspark.find_spark_home import _find_spark_home
+
+        SPARK_HOME = _find_spark_home()
+        filename_pattern = (
+            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
+            "SparkSessionExtensionSuite.class")
+        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
+            raise unittest.SkipTest(
+                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
+                "available. Will skip the related tests.")
+
+        # Note that 'spark.sql.extensions' is a static immutable configuration.
+        cls.spark = SparkSession.builder \
+            .master("local[4]") \
+            .appName(cls.__name__) \
+            .config(
+                "spark.sql.extensions",
+                "org.apache.spark.sql.MyExtensions") \
--- End diff --

@RussellSpitzer, I think you should push the `MyExtensions` Scala-side code too.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r210428804

--- Diff: python/pyspark/sql/session.py ---
@@ -218,7 +218,9 @@ def __init__(self, sparkContext, jsparkSession=None):
                     .sparkContext().isStopped():
                 jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
             else:
-                jsparkSession = self._jvm.SparkSession(self._jsc.sc())
+                jsparkSession = self._jvm.SparkSession.builder() \
+                    .sparkContext(self._jsc.sc()) \
+                    .getOrCreate()
--- End diff --

Yeah, let me add in the test, and then I'll clear out all the Python duplication of Scala code. I can make it more of a wrapper and less of a reimplementer.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r209507286

--- Diff: python/pyspark/sql/session.py ---
@@ -218,7 +218,9 @@ def __init__(self, sparkContext, jsparkSession=None):
                     .sparkContext().isStopped():
                 jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
             else:
-                jsparkSession = self._jvm.SparkSession(self._jsc.sc())
+                jsparkSession = self._jvm.SparkSession.builder() \
+                    .sparkContext(self._jsc.sc()) \
+                    .getOrCreate()
--- End diff --

@RussellSpitzer, mind checking the `getOrCreate` logic on the Scala side and deduplicating it here while we are at it? Some logic, for instance setting the default session, is duplicated here on the Python side and there on the Scala side. It would be nicer if we had some tests as well. `spark.sql.extensions` is a static configuration, right? In that case we could add a test; for example, please refer to https://github.com/apache/spark/pull/21007, where I added a test with a static configuration before.
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
GitHub user RussellSpitzer opened a pull request:

https://github.com/apache/spark/pull/21990

[SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark Master

## What changes were proposed in this pull request?

Previously PySpark used the private constructor for SparkSession when building that object. This resulted in a SparkSession created without checking the sql.extensions parameter for additional session extensions. To fix this we instead use the Session.builder() path, as SparkR does; this loads the extensions and allows their use in PySpark.

## How was this patch tested?

This was manually tested by passing a class to spark.sql.extensions and making sure its included strategies appeared in the spark._jsparkSession.sessionState.planner.strategies list. We could add an automatic test, but I'm not very familiar with the PySpark testing framework; I would be glad to implement that if requested.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/RussellSpitzer/spark SPARK-25003-master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21990.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21990

commit f790ae000ae1c3f4030162028186448d345e2984
Author: Russell Spitzer
Date: 2018-08-03T16:04:00Z

    [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

    Previously PySpark used the private constructor for SparkSession when building that object. This resulted in a SparkSession without checking the sql.extensions parameter for additional session extensions. To fix this we instead use the Session.builder() path as SparkR uses; this loads the extensions and allows their use in PySpark.
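For context, the class named by spark.sql.extensions is a configurator implementing `SparkSessionExtensions => Unit`. Below is a minimal sketch of such a class; `MyStrategy` is a hypothetical no-op stand-in, not the `MyExtensions` class from `SparkSessionExtensionSuite` that the tests reuse:

```scala
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// Hypothetical no-op planner strategy, just so something recognizable shows up
// in the session's planner.strategies list.
case class MyStrategy(session: SparkSession) extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
}

// Configurator class to pass as spark.sql.extensions; Spark instantiates it and
// applies it to the session's SparkSessionExtensions during session creation.
class MyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectPlannerStrategy(MyStrategy)
  }
}
```

With such a class on the classpath, the manual check described above amounts to looking for the injected strategy in the planner's strategy list of the PySpark-created session.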