[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23034: [SPARK-26035][PYTHON] Break large streaming/tests.py fil...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23034 Thank you @BryanCutler.
[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of primitive typ...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/23054

[SPARK-26085][SQL] Key attribute of primitive type under typed aggregation should be named as "key" too

## What changes were proposed in this pull request?

When doing typed aggregation on a Dataset, the key attribute is named "key" for a complex key type, but "value" for a primitive key type. The key attribute should be named "key" for primitive types as well.

## How was this patch tested?

Added test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-26085

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23054.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23054

commit c7bbe91519aec116ae2c2f449f518f59cc49c7c0
Author: Liang-Chi Hsieh
Date: 2018-11-16T01:52:12Z

    Named key attribute for primitive type as "key".
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23054 cc @cloud-fan
[GitHub] spark issue #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics table doesn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23038 **[Test build #98894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98894/testReport)** for PR 23038 at commit [`805ebb8`](https://github.com/apache/spark/commit/805ebb8e6b103cbc0688da64ec27841a1491039f).
[GitHub] spark issue #23031: [SPARK-26060][SQL] Track SparkConf entries and make SET ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5066/ Test PASSed.
[GitHub] spark issue #23031: [SPARK-26060][SQL] Track SparkConf entries and make SET ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23031 Merged build finished. Test PASSed.
[GitHub] spark issue #23026: [SPARK-25960][k8s] Support subpath mounting with Kuberne...
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/23026

> > if such a list exists it should be the same list that triggers regular tests.
>
> I defer that to @shaneknapp

no, @vanzin is right. i'll update that tomorrow.

@vanzin for historical knowledge: once i get spark ported to ubuntu (literally down to one or two troublesome builds! such closeness!), the k8s prb will be merged in to the regular spark prb.
[GitHub] spark pull request #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/te...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23056#discussion_r234080468

--- Diff: python/pyspark/mllib/tests/test_linalg.py ---
@@ -0,0 +1,642 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import sys
+import array as pyarray
+
+from numpy import array, array_equal, zeros, arange, tile, ones, inf
+from numpy import sum as array_sum
+
+if sys.version_info[:2] <= (2, 6):
+    try:
+        import unittest2 as unittest
+    except ImportError:
+        sys.stderr.write('Please install unittest2 to test with Python 2.6 or earlier')
+        sys.exit(1)
+else:
+    import unittest
+
+import pyspark.ml.linalg as newlinalg
+from pyspark.mllib.linalg import Vector, SparseVector, DenseVector, VectorUDT, _convert_to_vector, \
+    DenseMatrix, SparseMatrix, Vectors, Matrices, MatrixUDT
+from pyspark.mllib.regression import LabeledPoint
+from pyspark.testing.mllibutils import make_serializer, MLlibTestCase
+
+_have_scipy = False
+try:
+    import scipy.sparse
+    _have_scipy = True
+except:
+    # No SciPy, but that's okay, we'll skip those tests
+    pass
+
+
+ser = make_serializer()
+
+
+def _squared_distance(a, b):
+    if isinstance(a, Vector):
+        return a.squared_distance(b)
+    else:
+        return b.squared_distance(a)
+
+
+class VectorTests(MLlibTestCase):
+
+    def _test_serialize(self, v):
+        self.assertEqual(v, ser.loads(ser.dumps(v)))
+        jvec = self.sc._jvm.org.apache.spark.mllib.api.python.SerDe.loads(bytearray(ser.dumps(v)))
+        nv = ser.loads(bytes(self.sc._jvm.org.apache.spark.mllib.api.python.SerDe.dumps(jvec)))
+        self.assertEqual(v, nv)
+        vs = [v] * 100
+        jvecs = self.sc._jvm.org.apache.spark.mllib.api.python.SerDe.loads(bytearray(ser.dumps(vs)))
+        nvs = ser.loads(bytes(self.sc._jvm.org.apache.spark.mllib.api.python.SerDe.dumps(jvecs)))
+        self.assertEqual(vs, nvs)
+
+    def test_serialize(self):
+        self._test_serialize(DenseVector(range(10)))
+        self._test_serialize(DenseVector(array([1., 2., 3., 4.])))
+        self._test_serialize(DenseVector(pyarray.array('d', range(10))))
+        self._test_serialize(SparseVector(4, {1: 1, 3: 2}))
+        self._test_serialize(SparseVector(3, {}))
+        self._test_serialize(DenseMatrix(2, 3, range(6)))
+        sm1 = SparseMatrix(
+            3, 4, [0, 2, 2, 4, 4], [1, 2, 1, 2], [1.0, 2.0, 4.0, 5.0])
+        self._test_serialize(sm1)
+
+    def test_dot(self):
+        sv = SparseVector(4, {1: 1, 3: 2})
+        dv = DenseVector(array([1., 2., 3., 4.]))
+        lst = DenseVector([1, 2, 3, 4])
+        mat = array([[1., 2., 3., 4.],
+                     [1., 2., 3., 4.],
+                     [1., 2., 3., 4.],
+                     [1., 2., 3., 4.]])
+        arr = pyarray.array('d', [0, 1, 2, 3])
+        self.assertEqual(10.0, sv.dot(dv))
+        self.assertTrue(array_equal(array([3., 6., 9., 12.]), sv.dot(mat)))
+        self.assertEqual(30.0, dv.dot(dv))
+        self.assertTrue(array_equal(array([10., 20., 30., 40.]), dv.dot(mat)))
+        self.assertEqual(30.0, lst.dot(dv))
+        self.assertTrue(array_equal(array([10., 20., 30., 40.]), lst.dot(mat)))
+        self.assertEqual(7.0, sv.dot(arr))
+
+    def test_squared_distance(self):
+        sv = SparseVector(4, {1: 1, 3: 2})
+        dv = DenseVector(array([1., 2., 3., 4.]))
+        lst = DenseVector([4, 3, 2, 1])
+        lst1 = [4, 3, 2, 1]
+        arr = pyarray.array('d', [0, 2, 1, 3])
+        narr = array([0, 2, 1, 3])
+        self.assertEqual(15.0, _squared_distance(sv, dv))
+        self.assertEqual(25.0, _squared_distance(sv, lst))
+        self.assertEqual(20.0, _squared_distance(dv, lst))
+        self.assertEqual(15.0, _squared_distance(dv, sv))
+
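The expected constants in the diff above (10.0, 15.0, 20.0, and so on) can be re-derived with a few lines of plain Python. This is an illustrative sketch, not the MLlib implementation; `sparse_to_dense`, `dot`, and `squared_distance` are hypothetical helper names mirroring the vector semantics of `SparseVector(4, {1: 1, 3: 2})` and `DenseVector([1., 2., 3., 4.])`:

```python
def sparse_to_dense(size, entries):
    # Expand a {index: value} sparse map, e.g. {1: 1, 3: 2} with size 4
    # becomes [0.0, 1.0, 0.0, 2.0].
    v = [0.0] * size
    for i, x in entries.items():
        v[i] = float(x)
    return v

def dot(a, b):
    # Elementwise product summed over matching positions.
    return sum(x * y for x, y in zip(a, b))

def squared_distance(a, b):
    # Sum of squared per-coordinate differences.
    return sum((x - y) ** 2 for x, y in zip(a, b))

sv = sparse_to_dense(4, {1: 1, 3: 2})   # [0.0, 1.0, 0.0, 2.0]
dv = [1.0, 2.0, 3.0, 4.0]
print(dot(sv, dv))                      # 10.0 (1*2 + 2*4)
print(squared_distance(sv, dv))         # 15.0 (1 + 1 + 9 + 4)
```

Checking the constants this way makes it easy to see the tests above assert real arithmetic facts rather than implementation accidents.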
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23056 retest this please
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234080578

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

I don't think this is necessary. If `resource` can't be imported for any reason, then memory will not be limited in python. But the JVM side shouldn't be what determines whether that happens. The JVM should do everything the same way -- even requesting memory from schedulers like YARN, because that space should still be allocated as python memory, even if python can't self-limit.
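The failure mode being discussed -- `resource` not importing, so the Python side cannot self-limit -- can be sketched in a few lines. This is an illustrative best-effort helper, not Spark's actual worker code; the function name `try_set_python_memory_limit` is hypothetical:

```python
def try_set_python_memory_limit(limit_mb):
    """Try to cap this process's address space at limit_mb megabytes.

    Returns False when self-limiting is impossible, e.g. on Windows where
    the stdlib `resource` module does not exist -- the case the review
    comments above are debating.
    """
    try:
        import resource
    except ImportError:
        # No `resource` module (Windows): memory is simply not limited
        # in Python, as rdblue notes.
        return False
    limit = limit_mb * 1024 * 1024
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        # Never try to raise the limit above the existing hard cap.
        limit = min(limit, hard)
    try:
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
    except (ValueError, OSError):
        return False
    return True
```

Under this sketch, the JVM could allocate the Python memory overhead unconditionally (so schedulers like YARN account for it) while the worker merely reports whether it managed to self-limit.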
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5068/ Test PASSed.
[GitHub] spark pull request #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/te...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23056#discussion_r234080249

--- Diff: python/pyspark/testing/mllibutils.py ---
@@ -0,0 +1,44 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import sys
+
+if sys.version_info[:2] <= (2, 6):
+    try:
+        import unittest2 as unittest
+    except ImportError:
+        sys.stderr.write('Please install unittest2 to test with Python 2.6 or earlier')
+        sys.exit(1)
+else:
+    import unittest
--- End diff --

@BryanCutler, actually we can remove this because we dropped 2.6 support while we are here. I'm pretty sure we can just import unittest.
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234081475

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

I see. I think the point of view is a bit different. What I was trying to do is declare this configuration as unsupported on Windows, meaning we disable it on Windows from the start, on the JVM side - because it's the JVM that launches the Python workers. So I was trying to leave the control to the JVM.

> It seems brittle to disable this on the JVM side and rely on it here. Can we also set a flag in the ImportError case and also check that here?

However, in a way, it's a bit odd to call it brittle because we're already relying on that.
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/23049 Hi @vanzin , thanks for pointing it out! I have updated the script and PR description.
[GitHub] spark pull request #23046: [SPARK-23207][SQL][FOLLOW-UP] Use `SQLConf.get.en...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23046#discussion_r234088968

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala ---
@@ -280,7 +280,7 @@ object ShuffleExchangeExec {
     }
     // The comparator for comparing row hashcode, which should always be Integer.
     val prefixComparator = PrefixComparators.LONG
-    val canUseRadixSort = SparkEnv.get.conf.get(SQLConf.RADIX_SORT_ENABLED)
+    val canUseRadixSort = SQLConf.get.enableRadixSort
--- End diff --

It's a small bug fix, so no need to backport to all the branches. I think 2.4 is good enough.
[GitHub] spark issue #23045: [SPARK-26071][SQL] disallow map as map key
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23045 **[Test build #98901 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98901/testReport)** for PR 23045 at commit [`574308e`](https://github.com/apache/spark/commit/574308e8f4c23f9549c647178709c7c85d4d2fc7).
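The rationale behind SPARK-26071 ("disallow map as map key") has a close analogue in plain Python, where a dict cannot be used as a dict key because map equality and hashing are ill-defined for that purpose. This is only an analogy to motivate the restriction, not the Spark change itself:

```python
inner_map = {"a": 1}
try:
    nested = {inner_map: "value"}   # using a map as a map key
except TypeError as exc:
    # CPython rejects this outright: dicts are unhashable.
    print(exc)                      # unhashable type: 'dict'
```

Spark's analysis-time error for `map` keys of `MapType` serves the same end: refusing a construct whose key comparison semantics would be ambiguous.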
[GitHub] spark issue #23034: [SPARK-26035][PYTHON] Break large streaming/tests.py fil...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/23034

> Also, @BryanCutler, I think we can talk about locations of testing/...util.py later when we finished to split the tests. Moving utils would probably cause less conflicts and should be good enough to separately discuss if that's a worry, and should be changed.

Sounds good, working on MLlib right now. Hopefully have a PR up soon.
[GitHub] spark pull request #23052: [SPARK-26081][SQL] Prevent empty files for empty ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23052#discussion_r234062564

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala ---
@@ -174,13 +174,18 @@ private[csv] class CsvOutputWriter(
     context: TaskAttemptContext,
     params: CSVOptions) extends OutputWriter with Logging {

-  private val charset = Charset.forName(params.charset)
+  private var univocityGenerator: Option[UnivocityGenerator] = None

-  private val writer = CodecStreams.createOutputStreamWriter(context, new Path(path), charset)
-
-  private val gen = new UnivocityGenerator(dataSchema, writer, params)
+  override def write(row: InternalRow): Unit = {
+    val gen = univocityGenerator.getOrElse {
--- End diff --

Also, one thing we should not forget about is that CSV _could_ have headers even if the records are empty.
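The lazy-initialization idea in the diff above -- only create the output file when the first row arrives, so empty partitions produce no file -- can be sketched in plain Python. `LazyCsvWriter` is a hypothetical illustration, not the Spark `CsvOutputWriter`; note how it also exhibits the caveat raised in the comment: with no rows, even the header is never written:

```python
import csv

class LazyCsvWriter:
    """Create the underlying file only when the first row is written."""

    def __init__(self, path, header=None):
        self._path = path
        self._header = header
        self._file = None
        self._writer = None

    def write(self, row):
        if self._writer is None:
            # First row: open the file lazily, emitting the header first.
            self._file = open(self._path, "w", newline="")
            self._writer = csv.writer(self._file)
            if self._header:
                self._writer.writerow(self._header)
        self._writer.writerow(row)

    def close(self):
        if self._file is not None:
            self._file.close()
```

Closing a writer that never received a row leaves no file behind, which is the desired behavior for empty partitions -- at the cost of dropping the header for empty output, exactly the trade-off flagged above.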
[GitHub] spark issue #20303: [SPARK-23128][SQL] A new approach to do adaptive executi...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/20303 @cloud-fan @gatorsmile , are you ready to start reviewing this? I can bring this up to date.
[GitHub] spark pull request #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics tabl...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23038#discussion_r234072070

--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -63,6 +63,7 @@ case class ApplicationAttemptInfo private[spark](
 class ExecutorStageSummary private[spark](
     val taskTime : Long,
+    val activeTasks: Int,
--- End diff --

Hi @vanzin, I have modified it based on your comment. Kindly review.
[GitHub] spark issue #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics table doesn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23038 **[Test build #98893 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98893/testReport)** for PR 23038 at commit [`0d92185`](https://github.com/apache/spark/commit/0d921852045fdca3a528fa807fbd229076b52746).
[GitHub] spark pull request #23046: [SPARK-23207][SQL][FOLLOW-UP] Use `SQLConf.get.en...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23046#discussion_r234073703

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala ---
@@ -280,7 +280,7 @@ object ShuffleExchangeExec {
     }
     // The comparator for comparing row hashcode, which should always be Integer.
     val prefixComparator = PrefixComparators.LONG
-    val canUseRadixSort = SparkEnv.get.conf.get(SQLConf.RADIX_SORT_ENABLED)
+    val canUseRadixSort = SQLConf.get.enableRadixSort
--- End diff --

Yes .. I don't mind it, but I was just thinking that we don't necessarily backport to all the branches if there's any concern. I will leave it to you guys as well.
[GitHub] spark issue #23026: [SPARK-25960][k8s] Support subpath mounting with Kuberne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23026 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5062/ Test FAILed.
[GitHub] spark issue #23026: [SPARK-25960][k8s] Support subpath mounting with Kuberne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23026 Merged build finished. Test FAILed.
[GitHub] spark pull request #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/te...
GitHub user BryanCutler opened a pull request: https://github.com/apache/spark/pull/23056

[SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py file into smaller files

## What changes were proposed in this pull request?

This PR breaks down the large mllib/tests.py file that contains all Python MLlib unit tests into several smaller test files to be easier to read and maintain. The tests are broken down as follows:

```
pyspark
├── __init__.py
...
├── mllib
│   ├── __init__.py
...
│   └── tests
│       ├── __init__.py
│       ├── test_algorithms.py
│       ├── test_feature.py
│       ├── test_linalg.py
│       ├── test_stat.py
│       ├── test_streaming_algorithms.py
│       └── test_util.py
...
├── testing
...
│   └── mllibutils.py
...
```

## How was this patch tested?

Ran tests manually by module to ensure test count was the same, and ran `python/run-tests --modules=pyspark-mllib` to verify all passing with Python 2.7 and Python 3.6. Also installed scipy to include optional tests in test_linalg.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark python-test-breakup-mllib-SPARK-26034

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23056.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23056

commit 2759521df7f2dffc9ddb9379e0b1dac6721da366
Author: Bryan Cutler
Date: 2018-11-16T03:01:22Z

    separated mllib tests
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/23056

Dist by line count:

```
 313 ./test_algorithms.py
 201 ./test_feature.py
 642 ./test_linalg.py
 197 ./test_stat.py
 523 ./test_streaming_algorithms.py
 115 ./test_util.py
```
[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23037 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5063/ Test FAILed.
[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23037 Merged build finished. Test FAILed.
[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23055 Thanks for fixing this so quickly, @HyukjinKwon! I'd like a couple of changes, but overall it is going in the right direction. We should also plan on porting this to the 2.4 branch when it is committed since it is a regression.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Merged build finished. Test PASSed.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23056 **[Test build #98898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98898/testReport)** for PR 23056 at commit [`2759521`](https://github.com/apache/spark/commit/2759521df7f2dffc9ddb9379e0b1dac6721da366).
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22309 adding @liancheng BTW. IIRC, he took a look at this one before and abandoned the change (correct me if I'm remembering this wrongly).
[GitHub] spark issue #23044: [SPARK-26073][SQL][FOLLOW-UP] remove invalid comment as ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23044 thanks, merging to master!
[GitHub] spark pull request #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/te...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/23056#discussion_r234093063

--- Diff: python/pyspark/testing/mllibutils.py ---
@@ -0,0 +1,44 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import sys
+
+if sys.version_info[:2] <= (2, 6):
+    try:
+        import unittest2 as unittest
+    except ImportError:
+        sys.stderr.write('Please install unittest2 to test with Python 2.6 or earlier')
+        sys.exit(1)
+else:
+    import unittest
--- End diff --

Yeah, I wondered about that but thought it might be better to do in a follow-up.
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23054 **[Test build #98891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98891/testReport)** for PR 23054 at commit [`c7bbe91`](https://github.com/apache/spark/commit/c7bbe91519aec116ae2c2f449f518f59cc49c7c0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23055: [SPARK-26080][SQL] Disable 'spark.executor.pyspark.memor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23055 Merged build finished. Test PASSed.
[GitHub] spark issue #23055: [SPARK-26080][SQL] Disable 'spark.executor.pyspark.memor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23055 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5065/ Test PASSed.
[GitHub] spark pull request #23025: [SPARK-26024][SQL]: Update documentation for repa...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/23025#discussion_r234071565

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2813,6 +2819,11 @@ class Dataset[T] private[sql](
    * When no explicit sort order is specified, "ascending nulls first" is assumed.
    * Note, the rows are not sorted in each partition of the resulting Dataset.
    *
+   * [SPARK-26024] Note that due to performance reasons this method uses sampling to
+   * estimate the ranges. Hence, the output may not be consistent, since sampling can return
+   * different values. The sample size can be controlled by setting the value of the parameter
+   * {{spark.sql.execution.rangeExchange.sampleSizePerPartition}}.
--- End diff --

``` `spark.sql.execution.rangeExchange.sampleSizePerPartition` ```
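The documentation note above can be made concrete with a toy sketch of sampling-based range-bound estimation. This is illustrative only, not Spark's `RangePartitioner`; the parameter name `sample_size_per_partition` merely echoes the `spark.sql.execution.rangeExchange.sampleSizePerPartition` configuration:

```python
import random

def estimate_range_bounds(values, num_partitions,
                          sample_size_per_partition=100, seed=0):
    """Draw a bounded random sample and pick evenly spaced split points.

    Because the sample is random, different runs (different seeds) can
    produce different bounds -- the output inconsistency the doc note
    above warns about.
    """
    rng = random.Random(seed)
    n = min(len(values), sample_size_per_partition * num_partitions)
    sample = sorted(rng.sample(values, n))
    # One fewer bound than partitions: bounds separate adjacent ranges.
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]
```

A larger sample size makes the estimated bounds more stable (at higher cost), which is why the config exists as a knob rather than a fixed constant.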
[GitHub] spark issue #22598: [SPARK-25501][SS] Add kafka delegation token support.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22598 Merged build finished. Test FAILed.
[GitHub] spark issue #22598: [SPARK-25501][SS] Add kafka delegation token support.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22598 **[Test build #98890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98890/testReport)** for PR 22598 at commit [`2a0cdb7`](https://github.com/apache/spark/commit/2a0cdb7f397abdc8ce411e2f5c08cf8029676e90). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22598: [SPARK-25501][SS] Add kafka delegation token support.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22598 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98890/ Test FAILed.
[GitHub] spark pull request #23046: [SPARK-23207][SQL][FOLLOW-UP] Use `SQLConf.get.en...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/23046#discussion_r234073072

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala ---
@@ -280,7 +280,7 @@ object ShuffleExchangeExec {
     }
     // The comparator for comparing row hashcode, which should always be Integer.
     val prefixComparator = PrefixComparators.LONG
-    val canUseRadixSort = SparkEnv.get.conf.get(SQLConf.RADIX_SORT_ENABLED)
+    val canUseRadixSort = SQLConf.get.enableRadixSort
--- End diff --

Ah, yes, to be exact, if users specified the config in SparkConf before Spark ran, it could be read. I'd leave which branches we should backport to to you and the other reviewers. @jiangxb1987 @cloud-fan
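The bug under discussion is about reading from a configuration snapshot frozen at startup versus a live session configuration. A toy contrast in plain Python (all names hypothetical, not Spark APIs) shows why the snapshot misses runtime `SET` commands:

```python
# A snapshot frozen when the app launched (like SparkEnv.get.conf):
startup_snapshot = {"radixSort.enabled": "true"}

# A live, mutable session config (like what SQLConf.get reads):
session_conf = dict(startup_snapshot)

# The user runs a SET command at runtime:
session_conf["radixSort.enabled"] = "false"

print(startup_snapshot["radixSort.enabled"])  # true  (stale value)
print(session_conf["radixSort.enabled"])      # false (current value)
```

This is why ueshin's caveat holds: a value set in SparkConf before launch would still be visible through the snapshot, but any change made afterwards would not.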
[GitHub] spark issue #23031: [SPARK-26060][SQL] Track SparkConf entries and make SET ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23031 **[Test build #98896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98896/testReport)** for PR 23031 at commit [`336a331`](https://github.com/apache/spark/commit/336a331fdc817566c7fd09e5b36d5de24379c5b6).
[GitHub] spark issue #23041: [SPARK-26069][TESTS]Fix flaky test: RpcIntegrationSuite....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23041 **[Test build #4427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4427/testReport)** for PR 23041 at commit [`6bebcb5`](https://github.com/apache/spark/commit/6bebcb5e004ed4b434c550d26ed1a922d13e0446). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23049 **[Test build #98899 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98899/testReport)** for PR 23049 at commit [`daf5e33`](https://github.com/apache/spark/commit/daf5e33f14f28fa28e85a703fbd3acc08075fd1b).
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23049 Merged build finished. Test PASSed.
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23049 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5070/ Test PASSed.
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r234085471

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -373,6 +383,32 @@ object ScalaReflection extends ScalaReflection {
             dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), path :: Nil)

+      case t if isValueClass(t) =>
+        val (_, underlyingType) = getUnderlyingParameterOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: walkedTypePath
+
+        // Nested value class is treated as its underlying type
+        // because the compiler will convert value class in the schema to
+        // its underlying type.
+        // However, for value class that is top-level or array element,
+        // if it is used as another type (e.g. as its parent trait or generic),
+        // the compiler keeps the class so we must provide an instance of the
+        // class too. In other cases, the compiler will handle wrapping/unwrapping
+        // for us automatically.
+        val arg = deserializerFor(underlyingType, path, newTypePath, Some(t))
+        val isCollectionElement = lastType.exists { lt =>
+          lt <:< localTypeOf[Array[_]] || lt <:< localTypeOf[Seq[_]]
--- End diff --

I added the support for Map
[GitHub] spark pull request #23046: [SPARK-23207][SQL][FOLLOW-UP] Use `SQLConf.get.en...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23046
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23054 makes sense to me. This is a behavior change right? Shall we write a migration guide?
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23054 Merged build finished. Test PASSed.
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23054 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5064/ Test PASSed.
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23054 **[Test build #98891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98891/testReport)** for PR 23054 at commit [`c7bbe91`](https://github.com/apache/spark/commit/c7bbe91519aec116ae2c2f449f518f59cc49c7c0).
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Merged build finished. Test FAILed.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23056 **[Test build #98897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98897/testReport)** for PR 23056 at commit [`2759521`](https://github.com/apache/spark/commit/2759521df7f2dffc9ddb9379e0b1dac6721da366). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98897/ Test FAILed.
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234080290

--- Diff: python/pyspark/worker.py ---
@@ -268,9 +272,11 @@ def main(infile, outfile):
     # set up memory limits
     memory_limit_mb = int(os.environ.get('PYSPARK_EXECUTOR_MEMORY_MB', "-1"))
-    total_memory = resource.RLIMIT_AS
-    try:
-        if memory_limit_mb > 0:
+    # 'PYSPARK_EXECUTOR_MEMORY_MB' should be undefined on Windows because it depends on
+    # resource package which is a Unix specific package.
+    if memory_limit_mb > 0:
--- End diff --

It seems brittle to disable this on the JVM side and rely on it here. Can we also set a flag in the ImportError case and also check that here?
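The guard being discussed can be sketched in plain Python. This is a hypothetical standalone helper, not PySpark's actual `worker.py` code: it survives a missing `resource` module (as on Windows) and treats an unset or non-positive `PYSPARK_EXECUTOR_MEMORY_MB` as "no limit".

```python
import os

try:
    import resource  # Unix-only module; absent on Windows
    _HAS_RESOURCE = True
except ImportError:
    _HAS_RESOURCE = False

def apply_pyspark_memory_limit(env=os.environ):
    """Best-effort address-space cap; returns True only if a limit was set."""
    memory_limit_mb = int(env.get('PYSPARK_EXECUTOR_MEMORY_MB', '-1'))
    # no limit requested, or the platform cannot enforce one: do nothing
    if memory_limit_mb <= 0 or not _HAS_RESOURCE:
        return False
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    new_limit = memory_limit_mb * 1024 * 1024
    # only tighten the limit, never loosen it
    if soft < 0 or new_limit < soft:
        resource.setrlimit(resource.RLIMIT_AS, (new_limit, hard))
        return True
    return False
```

Checking an explicit flag like `_HAS_RESOURCE` on the worker side, in addition to leaving the property unset on the JVM side, is roughly the belt-and-suspenders approach the review comment asks for.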
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23049 **[Test build #98900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98900/testReport)** for PR 23049 at commit [`3269862`](https://github.com/apache/spark/commit/3269862c0b80bb7c546e9d45fd5fd4aa17aa1c7e).
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234086569

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

> JVM could set the request

This is handled in the JVM, so it wouldn't break. The `worker` itself is strongly coupled to the JVM. You mean the case where the client is on a Windows machine and uses a Unix-based cluster, right? I think that is what the fix already does: the `PythonRunner`s are created on the executor side, so the case of a client on Windows isn't affected.
[GitHub] spark pull request #23042: [SPARK-26070][SQL] add rule for implicit type coe...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23042#discussion_r234091858

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -138,6 +138,11 @@ object TypeCoercion {
     case (DateType, TimestampType) =>
       if (conf.compareDateTimestampInTimestamp) Some(TimestampType) else Some(StringType)
+    // to support a popular use case of tables using Decimal(X, 0) for long IDs instead of strings
+    // see SPARK-26070 for more details
+    case (n: DecimalType, s: StringType) if n.scale == 0 => Some(DecimalType(n.precision, n.scale))
--- End diff --

CC @gatorsmile @mgaido91 I think it's time to look at the SQL standard and other mainstream databases, and see how we should update the type coercion rules with safe mode. What do you think?
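The rationale for casting the string side to `DecimalType(p, 0)` rather than casting both sides to double can be illustrated with Python's `decimal` module (a standalone illustration, not Spark code): long numeric IDs collide when rounded to doubles, but compare exactly as decimals.

```python
from decimal import Decimal, InvalidOperation

a = '12345678901234567890'
b = '12345678901234567891'  # differs only in the last digit

# cast-both-to-double semantics: the two distinct IDs compare equal
assert float(a) == float(b)

# cast-string-to-decimal semantics: exact comparison, no collision
assert Decimal(a) != Decimal(b)

def compare_id(dec_value, s):
    """Hypothetical helper mimicking the proposed rule: parse the string
    as a decimal; an un-parseable string yields None (akin to SQL null)."""
    try:
        return dec_value == Decimal(s)
    except InvalidOperation:
        return None
```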
[GitHub] spark pull request #23044: [SPARK-26073][SQL][FOLLOW-UP] remove invalid comm...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23044
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23054 Merged build finished. Test PASSed.
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23054 Ok. Let me update migration guide.
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23054 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98891/ Test PASSed.
[GitHub] spark issue #23045: [SPARK-26071][SQL] disallow map as map key
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23045 Merged build finished. Test PASSed.
[GitHub] spark issue #23045: [SPARK-26071][SQL] disallow map as map key
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23045 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5071/ Test PASSed.
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23054 **[Test build #98902 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98902/testReport)** for PR 23054 at commit [`42e32ad`](https://github.com/apache/spark/commit/42e32adda2da3717161fe5f8aa40febc1f32465e).
[GitHub] spark pull request #23046: [SPARK-23207][SQL][FOLLOW-UP] Use `SQLConf.get.en...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23046#discussion_r234063905

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala ---
@@ -280,7 +280,7 @@ object ShuffleExchangeExec {
     }
     // The comparator for comparing row hashcode, which should always be Integer.
     val prefixComparator = PrefixComparators.LONG
-    val canUseRadixSort = SparkEnv.get.conf.get(SQLConf.RADIX_SORT_ENABLED)
+    val canUseRadixSort = SQLConf.get.enableRadixSort
--- End diff --

@ueshin, BTW, for clarification: it does read the configuration, but does not respect it when the configuration is set on the session, right? I think we don't need to backport this to all the other branches.
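The distinction being clarified here — a value read from the static `SparkConf` fixed at launch versus one read through `SQLConf.get`, which also sees session-level `SET` commands — can be modeled as a layered lookup. This is a toy model of the semantics only, not Spark's implementation:

```python
from collections import ChainMap

# conf baked into SparkConf before the application started
static_conf = {"spark.sql.sort.enableRadixSort": "true"}
# session-scoped overrides made at runtime (e.g. via SET)
session_conf = {}

# SQLConf.get-style read: session layer first, falling back to the static layer
effective = ChainMap(session_conf, static_conf)

assert effective["spark.sql.sort.enableRadixSort"] == "true"

# a runtime SET is visible through the layered read...
session_conf["spark.sql.sort.enableRadixSort"] = "false"
assert effective["spark.sql.sort.enableRadixSort"] == "false"

# ...but a SparkEnv.get.conf-style read of the static conf never sees it
assert static_conf["spark.sql.sort.enableRadixSort"] == "true"
```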
[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...
Github user AzureQ commented on the issue: https://github.com/apache/spark/pull/23037

> > This is fine, but please file a bug.
>
> Okay, as such, @AzureQ could you add an integration test to `ClientModeTestsSuite`

Sure
[GitHub] spark pull request #23025: [SPARK-26024][SQL]: Update documentation for repa...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/23025#discussion_r234071213

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2813,6 +2819,11 @@ class Dataset[T] private[sql](
    * When no explicit sort order is specified, "ascending nulls first" is assumed.
    * Note, the rows are not sorted in each partition of the resulting Dataset.
    *
+   * [SPARK-26024] Note that due to performance reasons this method uses sampling to
--- End diff --

We can drop `[SPARK-26024]` here.
[GitHub] spark pull request #23055: [SPARK-26080][SQL] Disable 'spark.executor.pyspar...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/23055 [SPARK-26080][SQL] Disable 'spark.executor.pyspark.memory' always on Windows

## What changes were proposed in this pull request?

The `resource` package is a Unix-specific package. See https://docs.python.org/2/library/resource.html and https://docs.python.org/3/library/resource.html.

Note that we document Windows support:

> Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS).

This should be backported into branch-2.4 to restore Windows support in Spark 2.4.1.

## How was this patch tested?

Manually mocking the changed logics.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-26080

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23055.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23055

commit 2d3315a7dab429abc4d9ef5ed7f8f5484e8421f1
Author: hyukjinkwon
Date: 2018-11-16T01:46:31Z
Disable 'spark.executor.pyspark.memory' on Windows always
[GitHub] spark issue #23055: [SPARK-26080][SQL] Disable 'spark.executor.pyspark.memor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23055 **[Test build #98892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98892/testReport)** for PR 23055 at commit [`2d3315a`](https://github.com/apache/spark/commit/2d3315a7dab429abc4d9ef5ed7f8f5484e8421f1).
[GitHub] spark issue #23055: [SPARK-26080][SQL] Disable 'spark.executor.pyspark.memor...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23055 cc @rdblue, @vanzin and @haydenjeune
[GitHub] spark issue #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics table doesn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23038 **[Test build #98895 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98895/testReport)** for PR 23038 at commit [`7c3a80b`](https://github.com/apache/spark/commit/7c3a80bce0a45131091ce11e80a939e9de6ebf50).
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Merged build finished. Test PASSed.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5067/ Test PASSed.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/23056 cc @HyukjinKwon @squito
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23056 **[Test build #98897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98897/testReport)** for PR 23056 at commit [`2759521`](https://github.com/apache/spark/commit/2759521df7f2dffc9ddb9379e0b1dac6721da366).
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23049 Merged build finished. Test PASSed.
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23049 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5069/ Test PASSed.
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234084002

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

I mean that it is brittle to try to use `resource` if the JVM has set the property. You handle the `ImportError`, but the JVM could set the request and Python would break again. I think that this should not be entirely disabled on Windows. Resource requests to YARN or other schedulers should include this memory. The only feature that should be disabled is the resource limiting on the Python side.
[GitHub] spark issue #23046: [SPARK-23207][SQL][FOLLOW-UP] Use `SQLConf.get.enableRad...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23046 thanks, merging to master/2.4!
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23054 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5072/ Test PASSed.
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23054 Merged build finished. Test PASSed.
[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23043 IIUC, we discussed handling `+0.0` and `-0.0` before in another PR. @srowen do you remember the previous discussion?
[GitHub] spark issue #23044: [SPARK-26073][SQL][FOLLOW-UP] remove invalid comment as ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23044 LGTM, pending Jenkins
[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23043 @kiszk This spun out of https://issues.apache.org/jira/browse/SPARK-24834 and https://github.com/apache/spark/pull/21794 ; is that what you may be thinking of? I'm not aware of others.
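For context on the SPARK-26021 discussion above: `-0.0` and `0.0` are numerically equal under IEEE 754 but have different bit patterns, so byte-wise hashing or comparison of raw doubles (as `UnsafeRow` effectively does) treats them as distinct grouping keys. A small standalone demonstration:

```python
import struct

def double_bits(d):
    """Raw 64-bit pattern of a Python float (an IEEE 754 double)."""
    return struct.unpack('<Q', struct.pack('<d', d))[0]

assert -0.0 == 0.0                             # numerically equal
assert double_bits(-0.0) != double_bits(0.0)   # only the sign bit differs

def normalize_zero(d):
    """Replace -0.0 with 0.0 so byte-wise grouping is consistent."""
    return 0.0 if d == 0.0 else d

assert double_bits(normalize_zero(-0.0)) == double_bits(0.0)
```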
[GitHub] spark issue #23050: [SPARK-26079][sql] Ensure listener event delivery in Str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23050 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5061/ Test PASSed.
[GitHub] spark issue #23050: [SPARK-26079][sql] Ensure listener event delivery in Str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23050 Merged build finished. Test PASSed.
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23049 Merged build finished. Test FAILed.
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23049 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98879/ Test FAILed.
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23049 **[Test build #98876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98876/testReport)** for PR 23049 at commit [`bf94264`](https://github.com/apache/spark/commit/bf94264a3e037e00a7b7111a677467de980071c9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23040 **[Test build #98878 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98878/testReport)** for PR 23040 at commit [`fa7af44`](https://github.com/apache/spark/commit/fa7af44abcd8ee95c956506c06badd83af067a03). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/22612 @squito @mccheah @dhruve Let me know if there are more comments or this can be merged. I appreciate it.
[GitHub] spark issue #23031: [SPARK-26060][SQL] Track SparkConf entries and make SET ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23031 **[Test build #98896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98896/testReport)** for PR 23031 at commit [`336a331`](https://github.com/apache/spark/commit/336a331fdc817566c7fd09e5b36d5de24379c5b6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23025: [SPARK-26024][SQL]: Update documentation for repa...
Github user JulienPeloton commented on a diff in the pull request: https://github.com/apache/spark/pull/23025#discussion_r234099956

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2813,6 +2819,11 @@ class Dataset[T] private[sql](
    * When no explicit sort order is specified, "ascending nulls first" is assumed.
    * Note, the rows are not sorted in each partition of the resulting Dataset.
    *
+   * [SPARK-26024] Note that due to performance reasons this method uses sampling to
+   * estimate the ranges. Hence, the output may not be consistent, since sampling can return
+   * different values. The sample size can be controlled by setting the value of the parameter
+   * {{spark.sql.execution.rangeExchange.sampleSizePerPartition}}.
--- End diff --

Thanks. Done.
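The doc note under review says `repartitionByRange` estimates partition boundaries by sampling (with `spark.sql.execution.rangeExchange.sampleSizePerPartition` controlling the sample size), which is why the resulting partitioning can differ between runs. A toy, Spark-independent sketch of that idea:

```python
import random

def estimate_boundaries(values, num_partitions, sample_size, seed=None):
    """Sample the data, sort the sample, and pick evenly spaced split
    points. Different samples (seeds) can yield different boundaries,
    which is the source of the non-determinism noted in the docs."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]
```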
[GitHub] spark pull request #23025: [SPARK-26024][SQL]: Update documentation for repa...
Github user JulienPeloton commented on a diff in the pull request: https://github.com/apache/spark/pull/23025#discussion_r234099934

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2813,6 +2819,11 @@ class Dataset[T] private[sql](
    * When no explicit sort order is specified, "ascending nulls first" is assumed.
    * Note, the rows are not sorted in each partition of the resulting Dataset.
    *
+   * [SPARK-26024] Note that due to performance reasons this method uses sampling to
--- End diff --

Thanks. Done.
[GitHub] spark pull request #23047: [BACKPORT][SPARK-25883][SQL][MINOR] Override meth...
Github user gengliangwang closed the pull request at: https://github.com/apache/spark/pull/23047
[GitHub] spark issue #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics table doesn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23038 Merged build finished. Test PASSed.