[spark] branch branch-2.3 updated: [SPARK-28582][PYTHON] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.3 by this push: new e686178 [SPARK-28582][PYTHON] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7 e686178 is described below commit e686178ffe7ceb33ee42558ee0b4fe28c417124d Author: WeichenXu AuthorDate: Sat Aug 3 10:31:15 2019 +0900 [SPARK-28582][PYTHON] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7 This PR picks up https://github.com/apache/spark/pull/25315 back after removing `Popen.wait` usage which exists in Python 3 only. I saw the last test results wrongly and thought it was passed. Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7. I add a sleep after the test connection to daemon. Run test ``` python/run-tests --python-executables=python3.7 --testname "pyspark.tests.test_daemon DaemonTests" ``` **Before** Fail on test "test_termination_sigterm". And we can see daemon process do not exit. **After** Test passed Closes #25343 from HyukjinKwon/SPARK-28582. Authored-by: WeichenXu Signed-off-by: HyukjinKwon (cherry picked from commit b3394db1930b3c9f55438cb27bb2c584bf041f8e) Signed-off-by: HyukjinKwon --- python/pyspark/tests.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py index 5ba121f..e1c780f 100644 --- a/python/pyspark/tests.py +++ b/python/pyspark/tests.py @@ -1862,6 +1862,9 @@ class DaemonTests(unittest.TestCase): # daemon should accept connections self.assertTrue(self.connect(port)) +# wait worker process spawned from daemon exit. +time.sleep(1) + # request shutdown terminator(daemon) time.sleep(1) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-28582][PYTHON] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 6c61321 [SPARK-28582][PYTHON] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7 6c61321 is described below commit 6c613210cd1075f771376e7ecc6dfb08bd620716 Author: WeichenXu AuthorDate: Sat Aug 3 10:31:15 2019 +0900 [SPARK-28582][PYTHON] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7 This PR picks up https://github.com/apache/spark/pull/25315 back after removing `Popen.wait` usage which exists in Python 3 only. I saw the last test results wrongly and thought it was passed. Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7. I add a sleep after the test connection to daemon. Run test ``` python/run-tests --python-executables=python3.7 --testname "pyspark.tests.test_daemon DaemonTests" ``` **Before** Fail on test "test_termination_sigterm". And we can see daemon process do not exit. **After** Test passed Closes #25343 from HyukjinKwon/SPARK-28582. Authored-by: WeichenXu Signed-off-by: HyukjinKwon (cherry picked from commit b3394db1930b3c9f55438cb27bb2c584bf041f8e) Signed-off-by: HyukjinKwon --- python/pyspark/tests.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py index a2d825b..26c9126 100644 --- a/python/pyspark/tests.py +++ b/python/pyspark/tests.py @@ -1939,6 +1939,9 @@ class DaemonTests(unittest.TestCase): # daemon should accept connections self.assertTrue(self.connect(port)) +# wait worker process spawned from daemon exit. +time.sleep(1) + # request shutdown terminator(daemon) time.sleep(1) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-28582][PYTHON] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b3394db [SPARK-28582][PYTHON] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7 b3394db is described below commit b3394db1930b3c9f55438cb27bb2c584bf041f8e Author: WeichenXu AuthorDate: Sat Aug 3 10:31:15 2019 +0900 [SPARK-28582][PYTHON] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7 ## What changes were proposed in this pull request? This PR picks up https://github.com/apache/spark/pull/25315 back after removing `Popen.wait` usage which exists in Python 3 only. I saw the last test results wrongly and thought it was passed. Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7. I add a sleep after the test connection to daemon. ## How was this patch tested? Run test ``` python/run-tests --python-executables=python3.7 --testname "pyspark.tests.test_daemon DaemonTests" ``` **Before** Fail on test "test_termination_sigterm". And we can see daemon process do not exit. **After** Test passed Closes #25343 from HyukjinKwon/SPARK-28582. Authored-by: WeichenXu Signed-off-by: HyukjinKwon --- python/pyspark/tests/test_daemon.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/python/pyspark/tests/test_daemon.py b/python/pyspark/tests/test_daemon.py index 2cdc16c..898fb39 100644 --- a/python/pyspark/tests/test_daemon.py +++ b/python/pyspark/tests/test_daemon.py @@ -47,6 +47,9 @@ class DaemonTests(unittest.TestCase): # daemon should accept connections self.assertTrue(self.connect(port)) +# wait worker process spawned from daemon exit. +time.sleep(1) + # request shutdown terminator(daemon) time.sleep(1) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
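The core of the fix above is easier to see outside Spark. Below is a minimal, self-contained sketch of the same termination-test pattern (editor-added, not Spark's actual test code): spawn a child process standing in for the daemon, sleep one second so any worker processes it spawned can exit, send SIGTERM, then poll for shutdown with a deadline. The polling loop is the portable substitute for `Popen.wait(timeout=5)`, whose `timeout` argument exists only in Python 3 — which is exactly why the earlier attempt using `daemon.wait(5)` was reverted.

```python
import signal
import subprocess
import sys
import time

# Stand-in for the PySpark daemon: a child that would otherwise run for 30s.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])

# wait worker process spawned from daemon exit (the one-second sleep the patch adds)
time.sleep(1)

# request shutdown
proc.send_signal(signal.SIGTERM)

# Portable substitute for proc.wait(timeout=5): Python 2's Popen.wait takes no timeout,
# so poll with a deadline instead of blocking forever.
deadline = time.time() + 5
while time.time() < deadline and proc.poll() is None:
    time.sleep(0.1)
assert proc.poll() is not None, "daemon process did not exit"
```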
[spark] branch branch-2.4 updated: [SPARK-28606][INFRA] Update CRAN key to recover docker image generation
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new be52903 [SPARK-28606][INFRA] Update CRAN key to recover docker image generation be52903 is described below commit be5290387c0c2089863d20f80657a4fb1f7b5e44 Author: Dongjoon Hyun AuthorDate: Fri Aug 2 23:41:00 2019 + [SPARK-28606][INFRA] Update CRAN key to recover docker image generation CRAN repo changed the key and it causes our release script failure. This is a release blocker for Apache Spark 2.4.4 and 3.0.0. - https://cran.r-project.org/bin/linux/ubuntu/README.html ``` Err:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 51716619E084DAB9 ... W: GPG error: https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 51716619E084DAB9 E: The repository 'https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease' is not signed. ``` Note that they are reusing `cran35` for R 3.6 although they changed the key. ``` Even though R has moved to version 3.6, for compatibility the sources.list entry still uses the cran3.5 designation. ``` This PR aims to recover the docker image generation first. We will verify the R doc generation in a separate JIRA and PR. Manual. After `docker-build.log`, it should continue to the next stage, `Building v3.0.0-rc1`. ``` $ dev/create-release/do-release-docker.sh -d /tmp/spark-3.0.0 -n -s docs ... Log file: docker-build.log Building v3.0.0-rc1; output will be at /tmp/spark-3.0.0/output ``` Closes #25339 from dongjoon-hyun/SPARK-28606. 
Authored-by: Dongjoon Hyun Signed-off-by: DB Tsai (cherry picked from commit 0c6874fb37f97c36a5265455066de9e516845df2) Signed-off-by: Dongjoon Hyun --- dev/create-release/spark-rm/Dockerfile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile index f78d01f..929997b 100644 --- a/dev/create-release/spark-rm/Dockerfile +++ b/dev/create-release/spark-rm/Dockerfile @@ -40,8 +40,8 @@ ARG PIP_PKGS="pyopenssl pypandoc numpy pygments sphinx" # the most current package versions (instead of potentially using old versions cached by docker). RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates apt-transport-https && \ echo 'deb https://cloud.r-project.org/bin/linux/ubuntu xenial/' >> /etc/apt/sources.list && \ - gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9 && \ - gpg -a --export E084DAB9 | apt-key add - && \ + gpg --keyserver keyserver.ubuntu.com --recv-key E298A3A825C0D65DFD57CBB651716619E084DAB9 && \ + gpg -a --export E298A3A825C0D65DFD57CBB651716619E084DAB9 | apt-key add - && \ apt-get clean && \ rm -rf /var/lib/apt/lists/* && \ apt-get clean && \ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-28606][INFRA] Update CRAN key to recover docker image generation
This is an automated email from the ASF dual-hosted git repository. dbtsai pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0c6874f [SPARK-28606][INFRA] Update CRAN key to recover docker image generation 0c6874f is described below commit 0c6874fb37f97c36a5265455066de9e516845df2 Author: Dongjoon Hyun AuthorDate: Fri Aug 2 23:41:00 2019 + [SPARK-28606][INFRA] Update CRAN key to recover docker image generation ## What changes were proposed in this pull request? CRAN repo changed the key and it causes our release script failure. This is a release blocker for Apache Spark 2.4.4 and 3.0.0. - https://cran.r-project.org/bin/linux/ubuntu/README.html ``` Err:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 51716619E084DAB9 ... W: GPG error: https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 51716619E084DAB9 E: The repository 'https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease' is not signed. ``` Note that they are reusing `cran35` for R 3.6 although they changed the key. ``` Even though R has moved to version 3.6, for compatibility the sources.list entry still uses the cran3.5 designation. ``` This PR aims to recover the docker image generation first. We will verify the R doc generation in a separate JIRA and PR. ## How was this patch tested? Manual. After `docker-build.log`, it should continue to the next stage, `Building v3.0.0-rc1`. ``` $ dev/create-release/do-release-docker.sh -d /tmp/spark-3.0.0 -n -s docs ... Log file: docker-build.log Building v3.0.0-rc1; output will be at /tmp/spark-3.0.0/output ``` Closes #25339 from dongjoon-hyun/SPARK-28606. 
Authored-by: Dongjoon Hyun Signed-off-by: DB Tsai --- dev/create-release/spark-rm/Dockerfile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile index 45c662d..a1eb598 100644 --- a/dev/create-release/spark-rm/Dockerfile +++ b/dev/create-release/spark-rm/Dockerfile @@ -21,7 +21,7 @@ # * Java 8 # * Ivy # * Python/PyPandoc (2.7.15/3.6.7) -# * R-base/R-base-dev (3.5.0+) +# * R-base/R-base-dev (3.6.1) # * Ruby 2.3 build utilities FROM ubuntu:18.04 @@ -44,7 +44,7 @@ ARG PIP_PKGS="pyopenssl pypandoc numpy pygments sphinx" # the most current package versions (instead of potentially using old versions cached by docker). RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates && \ echo 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/' >> /etc/apt/sources.list && \ - gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9 && \ + gpg --keyserver keyserver.ubuntu.com --recv-key E298A3A825C0D65DFD57CBB651716619E084DAB9 && \ gpg -a --export E084DAB9 | apt-key add - && \ apt-get clean && \ rm -rf /var/lib/apt/lists/* && \ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
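One detail worth spelling out about the key ids in the two commits above (an editor sketch, not from the commit): the short id `E084DAB9` used before is just the last 8 hex digits of the 40-character fingerprint the patch now fetches, and the long id in the `NO_PUBKEY 51716619E084DAB9` error is the last 16. Short ids are not unique across keys, which is why fetching by the full fingerprint is the unambiguous fix.

```python
# Full CRAN signing-key fingerprint from the commit; short and long key ids
# are derived from its trailing hex digits.
full_fingerprint = "E298A3A825C0D65DFD57CBB651716619E084DAB9"
long_id = full_fingerprint[-16:]   # what apt reports in the NO_PUBKEY error
short_id = full_fingerprint[-8:]   # what the old Dockerfile passed to --recv-key
print(long_id, short_id)
```

Running this prints `51716619E084DAB9 E084DAB9`, matching both the error message and the old Dockerfile line.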
[spark] branch master updated: [SPARK-28574][CORE] Allow to config different sizes for event queues
This is an automated email from the ASF dual-hosted git repository. zsxwing pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c212c9d [SPARK-28574][CORE] Allow to config different sizes for event queues c212c9d is described below commit c212c9d9ed7375cd1ea16c118733edd84037ec0d Author: yunzoud AuthorDate: Fri Aug 2 15:27:33 2019 -0700 [SPARK-28574][CORE] Allow to config different sizes for event queues ## What changes were proposed in this pull request? Add configuration spark.scheduler.listenerbus.eventqueue.${name}.capacity to allow configuration of different event queue size. ## How was this patch tested? Unit test in core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala Closes #25307 from yunzoud/SPARK-28574. Authored-by: yunzoud Signed-off-by: Shixiong Zhu --- .../apache/spark/scheduler/AsyncEventQueue.scala | 14 +-- .../apache/spark/scheduler/LiveListenerBus.scala | 4 .../spark/scheduler/SparkListenerSuite.scala | 28 ++ 3 files changed, 44 insertions(+), 2 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala b/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala index 7cd2b86..11e2c47 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala @@ -46,8 +46,18 @@ private class AsyncEventQueue( // Cap the capacity of the queue so we get an explicit error (rather than an OOM exception) if // it's perpetually being added to more quickly than it's being drained. 
- private val eventQueue = new LinkedBlockingQueue[SparkListenerEvent]( -conf.get(LISTENER_BUS_EVENT_QUEUE_CAPACITY)) + // The capacity can be configured by spark.scheduler.listenerbus.eventqueue.${name}.capacity, + // if no such conf is specified, use the value specified in + // LISTENER_BUS_EVENT_QUEUE_CAPACITY + private[scheduler] def capacity: Int = { +val queuesize = conf.getInt(s"spark.scheduler.listenerbus.eventqueue.${name}.capacity", +conf.get(LISTENER_BUS_EVENT_QUEUE_CAPACITY)) +assert(queuesize > 0, s"capacity for event queue $name must be greater than 0, " + + s"but $queuesize is configured.") +queuesize + } + + private val eventQueue = new LinkedBlockingQueue[SparkListenerEvent](capacity) // Keep the event count separately, so that waitUntilEmpty() can be implemented properly; // this allows that method to return only when the events in the queue have been fully diff --git a/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala b/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala index d135190..302ebd3 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala @@ -236,6 +236,10 @@ private[spark] class LiveListenerBus(conf: SparkConf) { queues.asScala.map(_.name).toSet } + // For testing only. 
+ private[scheduler] def getQueueCapacity(name: String): Option[Int] = { +queues.asScala.find(_.name == name).map(_.capacity) + } } private[spark] object LiveListenerBus { diff --git a/core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala index a7869d3..8903e10 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala @@ -532,6 +532,34 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkContext with Match } } + test("event queue size can be configued through spark conf") { +// configure the shared queue size to be 1, event log queue size to be 2, +// and listner bus event queue size to be 5 +val conf = new SparkConf(false) + .set(LISTENER_BUS_EVENT_QUEUE_CAPACITY, 5) + .set(s"spark.scheduler.listenerbus.eventqueue.${SHARED_QUEUE}.capacity", "1") + .set(s"spark.scheduler.listenerbus.eventqueue.${EVENT_LOG_QUEUE}.capacity", "2") + +val bus = new LiveListenerBus(conf) +val counter1 = new BasicJobCounter() +val counter2 = new BasicJobCounter() +val counter3 = new BasicJobCounter() + +// add a new shared, status and event queue +bus.addToSharedQueue(counter1) +bus.addToStatusQueue(counter2) +bus.addToEventLogQueue(counter3) + +assert(bus.activeQueues() === Set(SHARED_QUEUE, APP_STATUS_QUEUE, EVENT_LOG_QUEUE)) +// check the size of shared queue is 1 as configured +assert(bus.getQueueCapacity(SHARED_QUEUE) == Some(1)) +// no specific size of status queue is configured, +// it shoud use the
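The capacity-lookup logic added to `AsyncEventQueue` above can be sketched in a few lines of plain Python (a hedged analogue, not Spark code): a per-queue key `spark.scheduler.listenerbus.eventqueue.${name}.capacity` overrides a global default, the result must be positive, and it is then used to bound the queue, as the Scala code does with `LinkedBlockingQueue`.

```python
import queue

DEFAULT_CAPACITY = 10000  # analogue of LISTENER_BUS_EVENT_QUEUE_CAPACITY

def queue_capacity(conf, name, default=DEFAULT_CAPACITY):
    # Per-queue override wins; otherwise fall back to the global default,
    # mirroring conf.getInt("spark.scheduler.listenerbus.eventqueue.${name}.capacity", ...).
    size = int(conf.get("spark.scheduler.listenerbus.eventqueue.%s.capacity" % name, default))
    assert size > 0, "capacity for event queue %s must be greater than 0" % name
    return size

conf = {"spark.scheduler.listenerbus.eventqueue.shared.capacity": "1"}
shared_cap = queue_capacity(conf, "shared")     # configured explicitly -> 1
event_log_cap = queue_capacity(conf, "eventLog")  # not configured -> default
event_queue = queue.Queue(maxsize=shared_cap)   # bounded queue, like the patch
```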
[spark] branch master updated (8ae032d -> 10d4ffd)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 8ae032d Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7" add 10d4ffd [SPARK-28532][SPARK-28530][SQL][FOLLOWUP] Inline doc for FixedPoint(1) batches "Subquery" and "Join Reorder" No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala | 4 1 file changed, 4 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7"
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8ae032d Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7" 8ae032d is described below commit 8ae032d78d63f95c1100bc826ae246e4d4fdd34a Author: Dongjoon Hyun AuthorDate: Fri Aug 2 10:14:20 2019 -0700 Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7" This reverts commit fbeee0c5bcea32346b2279c5b67044f12e5faf7d. --- python/pyspark/tests/test_daemon.py | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/python/pyspark/tests/test_daemon.py b/python/pyspark/tests/test_daemon.py index c987fae..2cdc16c 100644 --- a/python/pyspark/tests/test_daemon.py +++ b/python/pyspark/tests/test_daemon.py @@ -47,12 +47,9 @@ class DaemonTests(unittest.TestCase): # daemon should accept connections self.assertTrue(self.connect(port)) -# wait worker process spawned from daemon exit. -time.sleep(1) - # request shutdown terminator(daemon) -daemon.wait(5) +time.sleep(1) # daemon should no longer accept connections try: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7"
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new fe0f53a Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7" fe0f53a is described below commit fe0f53ae4e4a71ebb821506af7e7f0ba7690dc53 Author: Dongjoon Hyun AuthorDate: Fri Aug 2 10:08:29 2019 -0700 Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7" This reverts commit 20e46ef6e3e49e754062717c2cb249c6eb99e86a. --- python/pyspark/tests.py | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py index fc0ed41..a2d825b 100644 --- a/python/pyspark/tests.py +++ b/python/pyspark/tests.py @@ -1939,12 +1939,9 @@ class DaemonTests(unittest.TestCase): # daemon should accept connections self.assertTrue(self.connect(port)) -# wait worker process spawned from daemon exit. -time.sleep(1) - # request shutdown terminator(daemon) -daemon.wait(5) +time.sleep(1) # daemon should no longer accept connections try: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.3 updated: Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7"
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.3 by this push: new d3a73df Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7" d3a73df is described below commit d3a73dfcb6fd41c74a71cd21e51a902c2c443148 Author: Dongjoon Hyun AuthorDate: Fri Aug 2 10:07:13 2019 -0700 Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7" This reverts commit 21ef0fdf7244a169ac0c3e701cb21c35f1038a5d. --- python/pyspark/tests.py | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py index 5fc6887..5ba121f 100644 --- a/python/pyspark/tests.py +++ b/python/pyspark/tests.py @@ -1862,12 +1862,9 @@ class DaemonTests(unittest.TestCase): # daemon should accept connections self.assertTrue(self.connect(port)) -# wait worker process spawned from daemon exit. -time.sleep(1) - # request shutdown terminator(daemon) -daemon.wait(5) +time.sleep(1) # daemon should no longer accept connections try: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [MINOR][DOC][SS] Correct description of minPartitions in Kafka option
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new dad1cd6 [MINOR][DOC][SS] Correct description of minPartitions in Kafka option dad1cd6 is described below commit dad1cd691fcb846f99818cab078808e07f560f6e Author: Jungtaek Lim (HeartSaVioR) AuthorDate: Fri Aug 2 09:12:54 2019 -0700 [MINOR][DOC][SS] Correct description of minPartitions in Kafka option ## What changes were proposed in this pull request? `minPartitions` has been used as a hint and relevant method (KafkaOffsetRangeCalculator.getRanges) doesn't guarantee the behavior that partitions will be equal or more than given value. https://github.com/apache/spark/blob/d67b98ea016e9b714bef68feaac108edd08159c9/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala#L32-L46 This patch makes clear the configuration is a hint, and actual partitions could be less or more. ## How was this patch tested? Just a documentation change. Closes #25332 from HeartSaVioR/MINOR-correct-kafka-structured-streaming-doc-minpartition. Authored-by: Jungtaek Lim (HeartSaVioR) Signed-off-by: Dongjoon Hyun (cherry picked from commit 7ffc00ccc37fc94a45b7241bb3c6a17736b55ba3) Signed-off-by: Dongjoon Hyun --- docs/structured-streaming-kafka-integration.md | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/structured-streaming-kafka-integration.md b/docs/structured-streaming-kafka-integration.md index 680fe78..1a3ee85 100644 --- a/docs/structured-streaming-kafka-integration.md +++ b/docs/structured-streaming-kafka-integration.md @@ -379,10 +379,12 @@ The following configurations are optional: int none streaming and batch - Minimum number of partitions to read from Kafka. + Desired minimum number of partitions to read from Kafka. 
By default, Spark has a 1-1 mapping of topicPartitions to Spark partitions consuming from Kafka. If you set this option to a value greater than your topicPartitions, Spark will divvy up large - Kafka partitions to smaller pieces. + Kafka partitions to smaller pieces. Please note that this configuration is like a `hint`: the + number of Spark tasks will be **approximately** `minPartitions`. It can be less or more depending on + rounding errors or Kafka partitions that didn't receive any new data. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
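The hint semantics described in the doc change above can be illustrated with a simplified divvy function (an editor sketch, not Spark's actual `KafkaOffsetRangeCalculator`): each topic partition's offset range gets a share of `minPartitions` proportional to its size, so rounding and partitions with no new data make the resulting task count only approximately equal to the hint.

```python
def split_ranges(ranges, min_partitions):
    """Split (start, end) offset ranges into roughly min_partitions pieces."""
    total = sum(end - start for start, end in ranges)
    out = []
    for start, end in ranges:
        size = end - start
        if size == 0:
            continue  # partitions that received no new data yield no tasks
        # Proportional share of the hint, at least one piece per non-empty partition.
        parts = max(1, round(min_partitions * size / total))
        step = size / parts
        for i in range(parts):
            lo = start + int(i * step)
            hi = start + int((i + 1) * step) if i < parts - 1 else end
            out.append((lo, hi))
    return out

# Two large partitions, one tiny one, one empty one: the hint is 4, but
# rounding produces 5 tasks -- "approximately minPartitions", as documented.
tasks = split_ranges([(0, 100), (0, 100), (0, 1), (5, 5)], min_partitions=4)
```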
[spark] branch master updated (b148bd5 -> 7ffc00c)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b148bd5 [SPARK-28519][SQL] Use StrictMath log, pow functions for platform independence add 7ffc00c [MINOR][DOC][SS] Correct description of minPartitions in Kafka option No new revisions were added by this update. Summary of changes: docs/structured-streaming-kafka-integration.md | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (6d7a675 -> b148bd5)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 6d7a675 [SPARK-20604][ML] Allow imputer to handle numeric types add b148bd5 [SPARK-28519][SQL] Use StrictMath log, pow functions for platform independence No new revisions were added by this update. Summary of changes: docs/sql-migration-guide-upgrade.md| 2 + .../sql/catalyst/expressions/mathExpressions.scala | 64 +++--- .../expressions/MathExpressionsSuite.scala | 31 ++- .../sql-tests/results/pgSQL/float8.sql.out | 2 +- .../sql-tests/results/pgSQL/numeric.sql.out| 8 +-- .../org/apache/spark/sql/MathFunctionsSuite.scala | 18 +++--- 6 files changed, 64 insertions(+), 61 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (660423d -> 6d7a675)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 660423d [SPARK-23469][ML] HashingTF should use corrected MurmurHash3 implementation add 6d7a675 [SPARK-20604][ML] Allow imputer to handle numeric types No new revisions were added by this update. Summary of changes: .../org/apache/spark/ml/feature/Imputer.scala | 10 ++-- .../org/apache/spark/ml/feature/ImputerSuite.scala | 57 ++ 2 files changed, 64 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-23469][ML] HashingTF should use corrected MurmurHash3 implementation
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 660423d [SPARK-23469][ML] HashingTF should use corrected MurmurHash3 implementation 660423d is described below commit 660423d71769a42176f7cbe867eb23ec8afe7592 Author: Huaxin Gao AuthorDate: Fri Aug 2 10:53:36 2019 -0500 [SPARK-23469][ML] HashingTF should use corrected MurmurHash3 implementation ## What changes were proposed in this pull request? Update HashingTF to use new implementation of MurmurHash3 Make HashingTF use the old MurmurHash3 when a model from pre 3.0 is loaded ## How was this patch tested? Change existing unit tests. Also add one unit test to make sure HashingTF use the old MurmurHash3 when a model from pre 3.0 is loaded Closes #25303 from huaxingao/spark-23469. Authored-by: Huaxin Gao Signed-off-by: Sean Owen --- .../org/apache/spark/ml/feature/HashingTF.scala| 28 +++-- .../hashingTF-pre3.0/metadata/.part-0.crc | Bin 0 -> 12 bytes .../test-data/hashingTF-pre3.0/metadata/_SUCCESS | 0 .../test-data/hashingTF-pre3.0/metadata/part-0 | 1 + .../apache/spark/ml/feature/HashingTFSuite.scala | 25 +- python/pyspark/ml/feature.py | 8 +++--- python/pyspark/ml/tests/test_feature.py| 2 +- 7 files changed, 51 insertions(+), 13 deletions(-) diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala index 27b8bdc..0e6c43a 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala @@ -26,11 +26,12 @@ import org.apache.spark.ml.linalg.Vectors import org.apache.spark.ml.param._ import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol} import org.apache.spark.ml.util._ -import org.apache.spark.mllib.feature.HashingTF.murmur3Hash +import 
org.apache.spark.mllib.feature.{HashingTF => OldHashingTF} import org.apache.spark.sql.{DataFrame, Dataset} import org.apache.spark.sql.functions.{col, udf} import org.apache.spark.sql.types.{ArrayType, StructType} import org.apache.spark.util.Utils +import org.apache.spark.util.VersionUtils.majorMinorVersion /** * Maps a sequence of terms to their term frequencies using the hashing trick. @@ -44,7 +45,7 @@ import org.apache.spark.util.Utils class HashingTF @Since("1.4.0") (@Since("1.4.0") override val uid: String) extends Transformer with HasInputCol with HasOutputCol with DefaultParamsWritable { - private[this] val hashFunc: Any => Int = murmur3Hash + private var hashFunc: Any => Int = FeatureHasher.murmur3Hash @Since("1.2.0") def this() = this(Identifiable.randomUID("hashingTF")) @@ -142,6 +143,29 @@ class HashingTF @Since("1.4.0") (@Since("1.4.0") override val uid: String) @Since("1.6.0") object HashingTF extends DefaultParamsReadable[HashingTF] { + private class HashingTFReader extends MLReader[HashingTF] { + +private val className = classOf[HashingTF].getName + +override def load(path: String): HashingTF = { + val metadata = DefaultParamsReader.loadMetadata(path, sc, className) + val hashingTF = new HashingTF(metadata.uid) + metadata.getAndSetParams(hashingTF) + + // We support loading old `HashingTF` saved by previous Spark versions. + // Previous `HashingTF` uses `mllib.feature.HashingTF.murmur3Hash`, but new `HashingTF` uses + // `ml.Feature.FeatureHasher.murmur3Hash`. 
+ val (majorVersion, _) = majorMinorVersion(metadata.sparkVersion) + if (majorVersion < 3) { +hashingTF.hashFunc = OldHashingTF.murmur3Hash + } + hashingTF +} + } + + @Since("3.0.0") + override def read: MLReader[HashingTF] = new HashingTFReader + @Since("1.6.0") override def load(path: String): HashingTF = super.load(path) } diff --git a/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/.part-0.crc b/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/.part-0.crc new file mode 100644 index 000..1ac377a Binary files /dev/null and b/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/.part-0.crc differ diff --git a/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/_SUCCESS b/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/_SUCCESS new file mode 100644 index 000..e69de29 diff --git a/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/part-0 b/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/part-0 new file mode 100644 index 000..492a07a --- /dev/null +++ b/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/part-0 @@ -0,0 +1 @@
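The compatibility switch in `HashingTFReader.load` above boils down to parsing the saved model's Spark version and picking the legacy hash function for anything written before 3.0. A hedged Python analogue (illustrative names only, not PySpark API), mirroring `majorMinorVersion(metadata.sparkVersion)` and the `majorVersion < 3` check:

```python
import re

def major_minor(version):
    # Parse "major.minor" from a version string like "2.4.3" or "3.0.0-SNAPSHOT".
    m = re.match(r"^(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError("Unparsable Spark version: %s" % version)
    return int(m.group(1)), int(m.group(2))

def pick_hash_func(saved_version, old_hash, new_hash):
    # Models saved by Spark < 3.0 keep the old mllib murmur3Hash for
    # reproducible term indices; newer models use the corrected one.
    major, _ = major_minor(saved_version)
    return old_hash if major < 3 else new_hash

legacy = pick_hash_func("2.4.3", "mllib.murmur3Hash", "ml.murmur3Hash")
current = pick_hash_func("3.0.0", "mllib.murmur3Hash", "ml.murmur3Hash")
```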
[spark] branch master updated: [SPARK-28510][SQL] Implement Spark's own GetFunctionsOperation
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new efd9299  [SPARK-28510][SQL] Implement Spark's own GetFunctionsOperation
efd9299 is described below

commit efd92993f403fe40b3abd1dac21f8d7c875f407d
Author: Yuming Wang
AuthorDate: Fri Aug 2 08:50:42 2019 -0700

    [SPARK-28510][SQL] Implement Spark's own GetFunctionsOperation

    ## What changes were proposed in this pull request?

    This PR implements Spark's own GetFunctionsOperation which mitigates the differences between Spark SQL and Hive UDFs. But our implementation is different from Hive's implementation:

    - Our implementation always returns results. Hive only returns results when [(null == catalogName || "".equals(catalogName)) && (null == schemaName || "".equals(schemaName))](https://github.com/apache/hive/blob/rel/release-3.1.1/service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java#L101-L119).
    - Our implementation pads the `REMARKS` field with the function usage - Hive returns an empty string.
    - Our implementation does not support `FUNCTION_TYPE`, but Hive does.

    ## How was this patch tested?

    unit tests

    Closes #25252 from wangyum/SPARK-28510.
Authored-by: Yuming Wang
Signed-off-by: gatorsmile
---
 .../thriftserver/SparkGetFunctionsOperation.scala  | 115 +
 .../server/SparkSQLOperationManager.scala          |  15 +++
 .../thriftserver/SparkMetadataOperationSuite.scala |  43 +++-
 .../cli/operation/GetFunctionsOperation.java       |   2 +-
 .../cli/operation/GetFunctionsOperation.java       |   2 +-
 5 files changed, 174 insertions(+), 3 deletions(-)

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala
new file mode 100644
index 000..462e5730
--- /dev/null
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.thriftserver
+
+import java.sql.DatabaseMetaData
+import java.util.UUID
+
+import scala.collection.JavaConverters.seqAsJavaListConverter
+
+import org.apache.hadoop.hive.ql.security.authorization.plugin.{HiveOperationType, HivePrivilegeObjectUtils}
+import org.apache.hive.service.cli._
+import org.apache.hive.service.cli.operation.GetFunctionsOperation
+import org.apache.hive.service.cli.operation.MetadataOperation.DEFAULT_HIVE_CATALOG
+import org.apache.hive.service.cli.session.HiveSession
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.util.{Utils => SparkUtils}
+
+/**
+ * Spark's own GetFunctionsOperation
+ *
+ * @param sqlContext    SQLContext to use
+ * @param parentSession a HiveSession from SessionManager
+ * @param catalogName   catalog name. null if not applicable
+ * @param schemaName    database name, null or a concrete database name
+ * @param functionName  function name pattern
+ */
+private[hive] class SparkGetFunctionsOperation(
+    sqlContext: SQLContext,
+    parentSession: HiveSession,
+    catalogName: String,
+    schemaName: String,
+    functionName: String)
+  extends GetFunctionsOperation(parentSession, catalogName, schemaName, functionName) with Logging {
+
+  private var statementId: String = _
+
+  override def close(): Unit = {
+    super.close()
+    HiveThriftServer2.listener.onOperationClosed(statementId)
+  }
+
+  override def runInternal(): Unit = {
+    statementId = UUID.randomUUID().toString
+    // Do not change cmdStr. It's used for Hive auditing and authorization.
+    val cmdStr = s"catalog : $catalogName, schemaPattern : $schemaName"
+    val logMsg = s"Listing functions '$cmdStr, functionName : $functionName'"
+    logInfo(s"$logMsg with $statementId")
+    setState(OperationState.RUNNING)
+    // Always use the latest class loader
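The email above is truncated mid-diff. Its `functionName` parameter is a JDBC metadata search pattern, where (per the standard `DatabaseMetaData` semantics) `%` matches any substring and `_` matches any single character. As an illustration of what a GetFunctions-style metadata operation has to do with such a pattern — this is a hedged Python sketch, not Spark's or Hive's actual implementation, and the helper names are invented:

```python
import re

def catalog_pattern_to_regex(pattern):
    """Translate a JDBC metadata search pattern ('%' = any substring,
    '_' = any single character) into an anchored regular expression."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))  # treat everything else literally
    return re.compile("^" + "".join(out) + "$")

def list_functions(all_functions, function_name_pattern):
    """Return the registered function names matching the search pattern."""
    rx = catalog_pattern_to_regex(function_name_pattern)
    return [f for f in all_functions if rx.match(f)]

funcs = ["abs", "avg", "approx_count_distinct", "upper"]
print(list_functions(funcs, "a%"))   # every function starting with 'a'
print(list_functions(funcs, "ab_"))  # 'ab' followed by exactly one character
```

Escaping the literal characters matters: function names can contain regex metacharacters (e.g. `$`), and only `%`/`_` should be treated as wildcards.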
[spark] branch master updated (fbeee0c -> 37243e1)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from fbeee0c [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7 add 37243e1 [SPARK-28579][ML] MaxAbsScaler avoids conversion to breeze.vector No new revisions were added by this update. Summary of changes: .../org/apache/spark/ml/feature/MaxAbsScaler.scala | 21 ++--- 1 file changed, 10 insertions(+), 11 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
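The MaxAbsScaler commit above avoids converting each vector to a breeze vector just to scale it. As background, here is a minimal Python sketch of what max-abs scaling itself computes — dividing each column by its maximum absolute value so values land in [-1, 1] — using plain lists rather than Spark's `Vector` types (illustrative only, not the patched code):

```python
def max_abs_scale(rows):
    """Scale each column into [-1, 1] by dividing by its max absolute value.
    All-zero columns are left as zero."""
    n_cols = len(rows[0])
    max_abs = [max(abs(r[c]) for r in rows) for c in range(n_cols)]
    # Use a divisor of 1.0 for all-zero columns so 0 / 1.0 stays 0.
    divisors = [m if m != 0.0 else 1.0 for m in max_abs]
    return [[r[c] / divisors[c] for c in range(n_cols)] for r in rows]

data = [[1.0, -8.0], [2.0, 4.0], [-4.0, 2.0]]
print(max_abs_scale(data))  # [[0.25, -1.0], [0.5, 0.5], [-1.0, 0.25]]
```

Because the operation is just an element-wise multiply by a per-column constant, it can be applied directly to the stored values — which is why the detour through `breeze.vector` was unnecessary overhead.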
[spark] branch branch-2.3 updated: [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.3 by this push:
     new 21ef0fd  [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7
21ef0fd is described below

commit 21ef0fdf7244a169ac0c3e701cb21c35f1038a5d
Author: WeichenXu
AuthorDate: Fri Aug 2 22:07:06 2019 +0900

    [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7

    Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7. I add a sleep after the test connection to daemon.

    Run test
    ```
    python/run-tests --python-executables=python3.7 --testname "pyspark.tests.test_daemon DaemonTests"
    ```

    **Before**
    Fail on test "test_termination_sigterm". And we can see daemon process do not exit.

    **After**
    Test passed

    Closes #25315 from WeichenXu123/fix_py37_daemon.

    Authored-by: WeichenXu
    Signed-off-by: HyukjinKwon
    (cherry picked from commit fbeee0c5bcea32346b2279c5b67044f12e5faf7d)
    Signed-off-by: HyukjinKwon
---
 python/pyspark/tests.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index 5ba121f..5fc6887 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -1862,9 +1862,12 @@ class DaemonTests(unittest.TestCase):
         # daemon should accept connections
         self.assertTrue(self.connect(port))

+        # wait worker process spawned from daemon exit.
+        time.sleep(1)
+
         # request shutdown
         terminator(daemon)
-        time.sleep(1)
+        daemon.wait(5)

         # daemon should no longer accept connections
         try:

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 20e46ef  [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7
20e46ef is described below

commit 20e46ef6e3e49e754062717c2cb249c6eb99e86a
Author: WeichenXu
AuthorDate: Fri Aug 2 22:07:06 2019 +0900

    [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7

    Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7. I add a sleep after the test connection to daemon.

    Run test
    ```
    python/run-tests --python-executables=python3.7 --testname "pyspark.tests.test_daemon DaemonTests"
    ```

    **Before**
    Fail on test "test_termination_sigterm". And we can see daemon process do not exit.

    **After**
    Test passed

    Closes #25315 from WeichenXu123/fix_py37_daemon.

    Authored-by: WeichenXu
    Signed-off-by: HyukjinKwon
    (cherry picked from commit fbeee0c5bcea32346b2279c5b67044f12e5faf7d)
    Signed-off-by: HyukjinKwon
---
 python/pyspark/tests.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index a2d825b..fc0ed41 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -1939,9 +1939,12 @@ class DaemonTests(unittest.TestCase):
         # daemon should accept connections
         self.assertTrue(self.connect(port))

+        # wait worker process spawned from daemon exit.
+        time.sleep(1)
+
         # request shutdown
         terminator(daemon)
-        time.sleep(1)
+        daemon.wait(5)

         # daemon should no longer accept connections
         try:

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7"
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new a065a50  Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7"
a065a50 is described below

commit a065a503bcf1ec5b7d49c575af7bc6867c734d90
Author: HyukjinKwon
AuthorDate: Fri Aug 2 22:12:04 2019 +0900

    Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7"

    This reverts commit dc09a02c142d3787e728c8b25eb8417649d98e9f.
---
 python/pyspark/tests.py | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index fef2959..a2d825b 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -1916,7 +1916,7 @@ class OutputFormatTests(ReusedPySparkTestCase):
 class DaemonTests(unittest.TestCase):
     def connect(self, port):
-        from socket import socket, AF_INET, SOCK_STREAM# request shutdown
+        from socket import socket, AF_INET, SOCK_STREAM
         sock = socket(AF_INET, SOCK_STREAM)
         sock.connect(('127.0.0.1', port))
         # send a split index of -1 to shutdown the worker
@@ -1939,12 +1939,9 @@ class DaemonTests(unittest.TestCase):
         # daemon should accept connections
         self.assertTrue(self.connect(port))

-        # wait worker process spawned from daemon exit.
-        time.sleep(1)
-
         # request shutdown
         terminator(daemon)
-        daemon.wait(5)
+        time.sleep(1)

         # daemon should no longer accept connections
         try:

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new dc09a02  [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7
dc09a02 is described below

commit dc09a02c142d3787e728c8b25eb8417649d98e9f
Author: WeichenXu
AuthorDate: Fri Aug 2 22:07:06 2019 +0900

    [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7

    Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7. I add a sleep after the test connection to daemon.

    Run test
    ```
    python/run-tests --python-executables=python3.7 --testname "pyspark.tests.test_daemon DaemonTests"
    ```

    **Before**
    Fail on test "test_termination_sigterm". And we can see daemon process do not exit.

    **After**
    Test passed

    Closes #25315 from WeichenXu123/fix_py37_daemon.

    Authored-by: WeichenXu
    Signed-off-by: HyukjinKwon
    (cherry picked from commit fbeee0c5bcea32346b2279c5b67044f12e5faf7d)
    Signed-off-by: HyukjinKwon
---
 python/pyspark/tests.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index a2d825b..fef2959 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -1916,7 +1916,7 @@ class OutputFormatTests(ReusedPySparkTestCase):
 class DaemonTests(unittest.TestCase):
     def connect(self, port):
-        from socket import socket, AF_INET, SOCK_STREAM
+        from socket import socket, AF_INET, SOCK_STREAM# request shutdown
         sock = socket(AF_INET, SOCK_STREAM)
         sock.connect(('127.0.0.1', port))
         # send a split index of -1 to shutdown the worker
@@ -1939,9 +1939,12 @@ class DaemonTests(unittest.TestCase):
         # daemon should accept connections
         self.assertTrue(self.connect(port))

+        # wait worker process spawned from daemon exit.
+        time.sleep(1)
+
         # request shutdown
         terminator(daemon)
-        time.sleep(1)
+        daemon.wait(5)

         # daemon should no longer accept connections
         try:

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new fbeee0c  [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7
fbeee0c is described below

commit fbeee0c5bcea32346b2279c5b67044f12e5faf7d
Author: WeichenXu
AuthorDate: Fri Aug 2 22:07:06 2019 +0900

    [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7

    ## What changes were proposed in this pull request?

    Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7. I add a sleep after the test connection to daemon.

    ## How was this patch tested?

    Run test
    ```
    python/run-tests --python-executables=python3.7 --testname "pyspark.tests.test_daemon DaemonTests"
    ```

    **Before**
    Fail on test "test_termination_sigterm". And we can see daemon process do not exit.

    **After**
    Test passed

    Closes #25315 from WeichenXu123/fix_py37_daemon.

    Authored-by: WeichenXu
    Signed-off-by: HyukjinKwon
---
 python/pyspark/tests/test_daemon.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/tests/test_daemon.py b/python/pyspark/tests/test_daemon.py
index 2cdc16c..c987fae 100644
--- a/python/pyspark/tests/test_daemon.py
+++ b/python/pyspark/tests/test_daemon.py
@@ -47,9 +47,12 @@ class DaemonTests(unittest.TestCase):
         # daemon should accept connections
         self.assertTrue(self.connect(port))

+        # wait worker process spawned from daemon exit.
+        time.sleep(1)
+
         # request shutdown
         terminator(daemon)
-        time.sleep(1)
+        daemon.wait(5)

         # daemon should no longer accept connections
         try:

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
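The heart of the fix above is replacing a fixed `time.sleep(1)` after requesting shutdown with `daemon.wait(5)`, i.e. `subprocess.Popen.wait` with a timeout, which blocks until the daemon process has actually exited instead of hoping one second is enough. (As the head of this digest notes, that `Popen.wait` usage exists in Python 3 only, which is why the branch-2.3 cherry-pick dropped it.) A self-contained sketch of the same pattern with an ordinary child process:

```python
import subprocess
import sys

# Start a short-lived child process standing in for the PySpark daemon.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.2)"])

# wait(timeout=...) returns as soon as the child exits, and raises
# subprocess.TimeoutExpired if the child is genuinely stuck -- so the test
# is deterministic rather than racing against a fixed sleep.
child.wait(timeout=5)

# After wait() returns, the exit status is known.
print(child.returncode)  # 0: the child exited cleanly
```

This is why the test stopped being flaky: a slow Python 3.7 interpreter start-up can easily exceed a one-second sleep, but it cannot outrun a blocking wait with a generous timeout.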
[spark] branch master updated: [SPARK-28445][SQL][PYTHON] Fix error when PythonUDF is used in both group by and aggregate expression
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 77c7e91  [SPARK-28445][SQL][PYTHON] Fix error when PythonUDF is used in both group by and aggregate expression
77c7e91 is described below

commit 77c7e91e029a9a70678435acb141154f2f51882e
Author: Liang-Chi Hsieh
AuthorDate: Fri Aug 2 19:47:29 2019 +0900

    [SPARK-28445][SQL][PYTHON] Fix error when PythonUDF is used in both group by and aggregate expression

    ## What changes were proposed in this pull request?

    When a PythonUDF is used in the group by clause and also appears in an aggregate expression, as in

    ```
    SELECT pyUDF(a + 1), COUNT(b) FROM testData GROUP BY pyUDF(a + 1)
    ```

    it causes an analysis exception in `CheckAnalysis`:

    ```
    org.apache.spark.sql.AnalysisException: expression 'testdata.`a`' is neither present in the group by, nor is it an aggregate function.
    ```

    First, `CheckAnalysis` can't check semantic equality between PythonUDFs. Second, even if we make that possible, a runtime exception will be thrown:

    ```
    org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: pythonUDF1#8615
    ...
    Cause: java.lang.RuntimeException: Couldn't find pythonUDF1#8615 in [cast(pythonUDF0#8614 as int)#8617,count(b#8599)#8607L]
    ```

    The cause is that `ExtractPythonUDFs` extracts both PythonUDFs, in the group by and in the aggregate expression. The PythonUDFs become two different aliases in the logical aggregate, so at runtime we can't bind the resulting expression in the aggregate to its grouping and aggregate attributes.

    This patch proposes a rule `ExtractGroupingPythonUDFFromAggregate` to extract PythonUDFs in the group by and evaluate them before the aggregate. We replace the group by PythonUDF in the aggregate expression with the aliased result.
    The query plan of the query `SELECT pyUDF(a + 1), pyUDF(COUNT(b)) FROM testData GROUP BY pyUDF(a + 1)` looks like:

    ```
    == Optimized Logical Plan ==
    Project [CAST(pyUDF(cast((a + 1) as string)) AS INT)#8608, cast(pythonUDF0#8616 as bigint) AS CAST(pyUDF(cast(count(b) as string)) AS BIGINT)#8610L]
    +- BatchEvalPython [pyUDF(cast(agg#8613L as string))], [pythonUDF0#8616]
       +- Aggregate [cast(groupingPythonUDF#8614 as int)], [cast(groupingPythonUDF#8614 as int) AS CAST(pyUDF(cast((a + 1) as string)) AS INT)#8608, count(b#8599) AS agg#8613L]
          +- Project [pythonUDF0#8615 AS groupingPythonUDF#8614, b#8599]
             +- BatchEvalPython [pyUDF(cast((a#8598 + 1) as string))], [pythonUDF0#8615]
                +- LocalRelation [a#8598, b#8599]

    == Physical Plan ==
    *(3) Project [CAST(pyUDF(cast((a + 1) as string)) AS INT)#8608, cast(pythonUDF0#8616 as bigint) AS CAST(pyUDF(cast(count(b) as string)) AS BIGINT)#8610L]
    +- BatchEvalPython [pyUDF(cast(agg#8613L as string))], [pythonUDF0#8616]
       +- *(2) HashAggregate(keys=[cast(groupingPythonUDF#8614 as int)#8617], functions=[count(b#8599)], output=[CAST(pyUDF(cast((a + 1) as string)) AS INT)#8608, agg#8613L])
          +- Exchange hashpartitioning(cast(groupingPythonUDF#8614 as int)#8617, 5), true
             +- *(1) HashAggregate(keys=[cast(groupingPythonUDF#8614 as int) AS cast(groupingPythonUDF#8614 as int)#8617], functions=[partial_count(b#8599)], output=[cast(groupingPythonUDF#8614 as int)#8617, count#8619L])
                +- *(1) Project [pythonUDF0#8615 AS groupingPythonUDF#8614, b#8599]
                   +- BatchEvalPython [pyUDF(cast((a#8598 + 1) as string))], [pythonUDF0#8615]
                      +- LocalTableScan [a#8598, b#8599]
    ```

    ## How was this patch tested?

    Added tests.

    Closes #25215 from viirya/SPARK-28445.
Authored-by: Liang-Chi Hsieh
Signed-off-by: HyukjinKwon
---
 .../spark/sql/catalyst/expressions/PythonUDF.scala |  6 ++
 .../spark/sql/execution/SparkOptimizer.scala       |  6 +-
 .../sql/execution/python/ExtractPythonUDFs.scala   | 63 +++
 .../sql/execution/python/PythonUDFSuite.scala      | 71 ++
 4 files changed, 144 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala
index 690969e..da2e182 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala
@@ -67,4 +67,10 @@ case class PythonUDF(
       exprId = resultId)

   override def nullable: Boolean = true
+
+  override lazy val canonicalized: Expression = {
+    val canonicalizedChildren = children.map(_.canonicalized)
+    // `resultId` can be seen as cosmetic variation
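The `canonicalized` override in the (truncated) diff above lets two `PythonUDF` expressions that differ only in their auto-generated `resultId` compare as semantically equal. A hedged Python sketch of the idea — drop the cosmetic id before comparing — where the tuple representation is purely illustrative, not Catalyst's actual expression model:

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count()

@dataclass(frozen=True)
class PythonUDF:
    name: str
    children: tuple
    # Auto-assigned per instance, so two otherwise-identical UDF calls
    # compare unequal with the default structural equality.
    result_id: int = field(default_factory=lambda: next(_ids))

    def canonicalized(self):
        """Canonical form: keep only the semantically meaningful parts,
        dropping the cosmetic result_id."""
        return (self.name, self.children)

u1 = PythonUDF("pyUDF", ("a + 1",))
u2 = PythonUDF("pyUDF", ("a + 1",))
print(u1 == u2)                                  # False: result_id differs
print(u1.canonicalized() == u2.canonicalized())  # True: semantically equal
```

This mirrors why `CheckAnalysis` needed the override: without a canonical form that ignores per-instance ids, the UDF in the GROUP BY and the identical UDF in the select list can never be recognized as the same expression.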
[spark] branch master updated: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink
This is an automated email from the ASF dual-hosted git repository. jshao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6d32dee [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink 6d32dee is described below commit 6d32deeecc5a80230158e12982a5c1ea3f70d89d Author: Nick Karpov AuthorDate: Fri Aug 2 17:50:15 2019 +0800 [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink ## What changes were proposed in this pull request? Today all registered metric sources are reported to GraphiteSink with no filtering mechanism, although the codahale project does support it. GraphiteReporter (ScheduledReporter) from the codahale project requires you implement and supply the MetricFilter interface (there is only a single implementation by default in the codahale project, MetricFilter.ALL). Propose to add an additional regex config to match and filter metrics to the GraphiteSink ## How was this patch tested? Included a GraphiteSinkSuite that tests: 1. Absence of regex filter (existing default behavior maintained) 2. Presence of `regex=` correctly filters metric keys Closes #25232 from nkarpov/graphite_regex. 
Authored-by: Nick Karpov
Signed-off-by: jerryshao
---
 conf/metrics.properties.template                   |  1 +
 .../apache/spark/metrics/sink/GraphiteSink.scala   | 13 +++-
 .../spark/metrics/sink/GraphiteSinkSuite.scala     | 84 ++
 docs/monitoring.md                                 |  1 +
 4 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/conf/metrics.properties.template b/conf/metrics.properties.template
index 23407e1..da0b06d 100644
--- a/conf/metrics.properties.template
+++ b/conf/metrics.properties.template
@@ -121,6 +121,7 @@
 #   unit      seconds       Unit of the poll period
 #   prefix    EMPTY STRING  Prefix to prepend to every metric's name
 #   protocol  tcp           Protocol ("tcp" or "udp") to use
+#   regex     NONE          Optional filter to send only metrics matching this regex string

 # org.apache.spark.metrics.sink.StatsdSink
 #   Name:     Default:      Description:

diff --git a/core/src/main/scala/org/apache/spark/metrics/sink/GraphiteSink.scala b/core/src/main/scala/org/apache/spark/metrics/sink/GraphiteSink.scala
index 21b4dfb..05d553e 100644
--- a/core/src/main/scala/org/apache/spark/metrics/sink/GraphiteSink.scala
+++ b/core/src/main/scala/org/apache/spark/metrics/sink/GraphiteSink.scala
@@ -20,7 +20,7 @@ package org.apache.spark.metrics.sink
 import java.util.{Locale, Properties}
 import java.util.concurrent.TimeUnit

-import com.codahale.metrics.MetricRegistry
+import com.codahale.metrics.{Metric, MetricFilter, MetricRegistry}
 import com.codahale.metrics.graphite.{Graphite, GraphiteReporter, GraphiteUDP}

 import org.apache.spark.SecurityManager
@@ -38,6 +38,7 @@ private[spark] class GraphiteSink(val property: Properties, val registry: Metric
   val GRAPHITE_KEY_UNIT = "unit"
   val GRAPHITE_KEY_PREFIX = "prefix"
   val GRAPHITE_KEY_PROTOCOL = "protocol"
+  val GRAPHITE_KEY_REGEX = "regex"

   def propertyToOption(prop: String): Option[String] = Option(property.getProperty(prop))

@@ -72,10 +73,20 @@ private[spark] class GraphiteSink(val property: Properties, val registry: Metric
     case Some(p) => throw new Exception(s"Invalid Graphite protocol: $p")
   }

+  val filter = propertyToOption(GRAPHITE_KEY_REGEX) match {
+    case Some(pattern) => new MetricFilter() {
+      override def matches(name: String, metric: Metric): Boolean = {
+        pattern.r.findFirstMatchIn(name).isDefined
+      }
+    }
+    case None => MetricFilter.ALL
+  }
+
   val reporter: GraphiteReporter = GraphiteReporter.forRegistry(registry)
       .convertDurationsTo(TimeUnit.MILLISECONDS)
       .convertRatesTo(TimeUnit.SECONDS)
       .prefixedWith(prefix)
+      .filter(filter)
       .build(graphite)

   override def start() {

diff --git a/core/src/test/scala/org/apache/spark/metrics/sink/GraphiteSinkSuite.scala b/core/src/test/scala/org/apache/spark/metrics/sink/GraphiteSinkSuite.scala
new file mode 100644
index 000..2369218
--- /dev/null
+++ b/core/src/test/scala/org/apache/spark/metrics/sink/GraphiteSinkSuite.scala
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License
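The GraphiteSink email is truncated above, but its core is a codahale `MetricFilter` built from an optional `regex` property, defaulting to "accept everything" (`MetricFilter.ALL`); the Scala code matches with `findFirstMatchIn`, i.e. an unanchored search. A hedged Python sketch of the same filtering logic (helper names are illustrative, not Spark's API):

```python
import re

def build_metric_filter(properties):
    """Return a predicate over metric names: match the optional 'regex'
    property if present, otherwise accept every metric (MetricFilter.ALL)."""
    pattern = properties.get("regex")
    if pattern is None:
        return lambda name: True
    rx = re.compile(pattern)
    # Mirror Scala's findFirstMatchIn: an unanchored search, not a full match.
    return lambda name: rx.search(name) is not None

metrics = ["jvm.heap.used", "executor.filesystem.hdfs.read", "jvm.gc.time"]
keep = build_metric_filter({"regex": "jvm\\."})
print([m for m in metrics if keep(m)])                      # only jvm.* metrics
print([m for m in metrics if build_metric_filter({})(m)])   # all three
```

Because the match is unanchored, a pattern like `jvm\.` keeps any metric whose name contains that substring anywhere; users who want prefix-only matching would anchor the pattern themselves (e.g. `^jvm\.`).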