[spark] branch branch-2.3 updated: [SPARK-28582][PYTHON] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7

2019-08-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
 new e686178  [SPARK-28582][PYTHON] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7
e686178 is described below

commit e686178ffe7ceb33ee42558ee0b4fe28c417124d
Author: WeichenXu 
AuthorDate: Sat Aug 3 10:31:15 2019 +0900

[SPARK-28582][PYTHON] Fix flaky test DaemonTests.do_termination_test which 
fail on Python 3.7

This PR brings https://github.com/apache/spark/pull/25315 back after removing the `Popen.wait` usage, which exists only in Python 3. I misread the last test results and thought they had passed.

Fix the flaky test DaemonTests.do_termination_test, which fails on Python 3.7, by adding a sleep after the test connection to the daemon.

Run test
```
python/run-tests --python-executables=python3.7 --testname 
"pyspark.tests.test_daemon DaemonTests"
```
**Before**
The test "test_termination_sigterm" fails, and we can see that the daemon process does not exit.
**After**
Test passed

Closes #25343 from HyukjinKwon/SPARK-28582.

Authored-by: WeichenXu 
Signed-off-by: HyukjinKwon 
(cherry picked from commit b3394db1930b3c9f55438cb27bb2c584bf041f8e)
Signed-off-by: HyukjinKwon 
---
 python/pyspark/tests.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index 5ba121f..e1c780f 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -1862,6 +1862,9 @@ class DaemonTests(unittest.TestCase):
 # daemon should accept connections
 self.assertTrue(self.connect(port))
 
+# wait worker process spawned from daemon exit.
+time.sleep(1)
+
 # request shutdown
 terminator(daemon)
 time.sleep(1)





[spark] branch branch-2.4 updated: [SPARK-28582][PYTHON] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7

2019-08-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 6c61321  [SPARK-28582][PYTHON] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7
6c61321 is described below

commit 6c613210cd1075f771376e7ecc6dfb08bd620716
Author: WeichenXu 
AuthorDate: Sat Aug 3 10:31:15 2019 +0900

[SPARK-28582][PYTHON] Fix flaky test DaemonTests.do_termination_test which 
fail on Python 3.7

This PR brings https://github.com/apache/spark/pull/25315 back after removing the `Popen.wait` usage, which exists only in Python 3. I misread the last test results and thought they had passed.

Fix the flaky test DaemonTests.do_termination_test, which fails on Python 3.7, by adding a sleep after the test connection to the daemon.

Run test
```
python/run-tests --python-executables=python3.7 --testname 
"pyspark.tests.test_daemon DaemonTests"
```
**Before**
The test "test_termination_sigterm" fails, and we can see that the daemon process does not exit.
**After**
Test passed

Closes #25343 from HyukjinKwon/SPARK-28582.

Authored-by: WeichenXu 
Signed-off-by: HyukjinKwon 
(cherry picked from commit b3394db1930b3c9f55438cb27bb2c584bf041f8e)
Signed-off-by: HyukjinKwon 
---
 python/pyspark/tests.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index a2d825b..26c9126 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -1939,6 +1939,9 @@ class DaemonTests(unittest.TestCase):
 # daemon should accept connections
 self.assertTrue(self.connect(port))
 
+# wait worker process spawned from daemon exit.
+time.sleep(1)
+
 # request shutdown
 terminator(daemon)
 time.sleep(1)





[spark] branch master updated: [SPARK-28582][PYTHON] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7

2019-08-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b3394db  [SPARK-28582][PYTHON] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7
b3394db is described below

commit b3394db1930b3c9f55438cb27bb2c584bf041f8e
Author: WeichenXu 
AuthorDate: Sat Aug 3 10:31:15 2019 +0900

[SPARK-28582][PYTHON] Fix flaky test DaemonTests.do_termination_test which 
fail on Python 3.7

## What changes were proposed in this pull request?

This PR brings https://github.com/apache/spark/pull/25315 back after removing the `Popen.wait` usage, which exists only in Python 3. I misread the last test results and thought they had passed.

Fix the flaky test DaemonTests.do_termination_test, which fails on Python 3.7, by adding a sleep after the test connection to the daemon.

## How was this patch tested?

Run test
```
python/run-tests --python-executables=python3.7 --testname 
"pyspark.tests.test_daemon DaemonTests"
```
**Before**
The test "test_termination_sigterm" fails, and we can see that the daemon process does not exit.
**After**
Test passed
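
The ordering the fix enforces can be illustrated with a minimal, self-contained Python sketch. This is not the PySpark test itself; the stand-in daemon, the timings, and the assertions are made up for illustration only.
```
import signal
import subprocess
import sys
import time

# Stand-in "daemon": it forks one short-lived "worker" and then idles until it is
# signalled, loosely mimicking pyspark.daemon spawning a worker per connection.
DAEMON = r"""
import subprocess, sys, time
subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.5)"])
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""

daemon = subprocess.Popen([sys.executable, "-c", DAEMON], stdout=subprocess.PIPE)
assert daemon.stdout.readline().strip() == b"ready"   # "daemon should accept connections"

# The fix: give the worker spawned by the daemon time to exit before requesting
# shutdown, which avoids the race observed on Python 3.7.
time.sleep(1)

daemon.send_signal(signal.SIGTERM)   # request shutdown
time.sleep(1)
assert daemon.poll() is not None, "daemon should have exited after SIGTERM"
print("daemon exited with code", daemon.returncode)
```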

Closes #25343 from HyukjinKwon/SPARK-28582.

Authored-by: WeichenXu 
Signed-off-by: HyukjinKwon 
---
 python/pyspark/tests/test_daemon.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/python/pyspark/tests/test_daemon.py 
b/python/pyspark/tests/test_daemon.py
index 2cdc16c..898fb39 100644
--- a/python/pyspark/tests/test_daemon.py
+++ b/python/pyspark/tests/test_daemon.py
@@ -47,6 +47,9 @@ class DaemonTests(unittest.TestCase):
 # daemon should accept connections
 self.assertTrue(self.connect(port))
 
+# wait worker process spawned from daemon exit.
+time.sleep(1)
+
 # request shutdown
 terminator(daemon)
 time.sleep(1)





[spark] branch branch-2.4 updated: [SPARK-28606][INFRA] Update CRAN key to recover docker image generation

2019-08-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new be52903  [SPARK-28606][INFRA] Update CRAN key to recover docker image 
generation
be52903 is described below

commit be5290387c0c2089863d20f80657a4fb1f7b5e44
Author: Dongjoon Hyun 
AuthorDate: Fri Aug 2 23:41:00 2019 +

[SPARK-28606][INFRA] Update CRAN key to recover docker image generation

The CRAN repo changed its signing key, which causes our release script to fail. This is a release blocker for Apache Spark 2.4.4 and 3.0.0.
- https://cran.r-project.org/bin/linux/ubuntu/README.html
```
Err:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease
  The following signatures couldn't be verified because the public key is 
not available: NO_PUBKEY 51716619E084DAB9
...
W: GPG error: https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ 
InRelease: The following signatures couldn't be verified because the public key 
is not available: NO_PUBKEY 51716619E084DAB9
E: The repository 'https://cloud.r-project.org/bin/linux/ubuntu 
bionic-cran35/ InRelease' is not signed.
```

Note that they are reusing `cran35` for R 3.6 although they changed the key.
```
Even though R has moved to version 3.6, for compatibility the sources.list 
entry still uses the cran3.5 designation.
```

This PR aims to recover the docker image generation first. We will verify 
the R doc generation in a separate JIRA and PR.

Manually tested. After the `docker-build.log` step, it should continue to the next stage, `Building v3.0.0-rc1`.
```
$ dev/create-release/do-release-docker.sh -d /tmp/spark-3.0.0 -n -s docs
...
Log file: docker-build.log
Building v3.0.0-rc1; output will be at /tmp/spark-3.0.0/output
```

Closes #25339 from dongjoon-hyun/SPARK-28606.

Authored-by: Dongjoon Hyun 
Signed-off-by: DB Tsai 
(cherry picked from commit 0c6874fb37f97c36a5265455066de9e516845df2)
Signed-off-by: Dongjoon Hyun 
---
 dev/create-release/spark-rm/Dockerfile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/create-release/spark-rm/Dockerfile 
b/dev/create-release/spark-rm/Dockerfile
index f78d01f..929997b 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -40,8 +40,8 @@ ARG PIP_PKGS="pyopenssl pypandoc numpy pygments sphinx"
 # the most current package versions (instead of potentially using old versions 
cached by docker).
 RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates 
apt-transport-https && \
   echo 'deb https://cloud.r-project.org/bin/linux/ubuntu xenial/' >> 
/etc/apt/sources.list && \
-  gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9 && \
-  gpg -a --export E084DAB9 | apt-key add - && \
+  gpg --keyserver keyserver.ubuntu.com --recv-key 
E298A3A825C0D65DFD57CBB651716619E084DAB9 && \
+  gpg -a --export E298A3A825C0D65DFD57CBB651716619E084DAB9 | apt-key add - && \
   apt-get clean && \
   rm -rf /var/lib/apt/lists/* && \
   apt-get clean && \





[spark] branch master updated: [SPARK-28606][INFRA] Update CRAN key to recover docker image generation

2019-08-02 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0c6874f  [SPARK-28606][INFRA] Update CRAN key to recover docker image 
generation
0c6874f is described below

commit 0c6874fb37f97c36a5265455066de9e516845df2
Author: Dongjoon Hyun 
AuthorDate: Fri Aug 2 23:41:00 2019 +

[SPARK-28606][INFRA] Update CRAN key to recover docker image generation

## What changes were proposed in this pull request?

The CRAN repo changed its signing key, which causes our release script to fail. This is a release blocker for Apache Spark 2.4.4 and 3.0.0.
- https://cran.r-project.org/bin/linux/ubuntu/README.html
```
Err:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease
  The following signatures couldn't be verified because the public key is 
not available: NO_PUBKEY 51716619E084DAB9
...
W: GPG error: https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ 
InRelease: The following signatures couldn't be verified because the public key 
is not available: NO_PUBKEY 51716619E084DAB9
E: The repository 'https://cloud.r-project.org/bin/linux/ubuntu 
bionic-cran35/ InRelease' is not signed.
```

Note that they are reusing `cran35` for R 3.6 although they changed the key.
```
Even though R has moved to version 3.6, for compatibility the sources.list 
entry still uses the cran3.5 designation.
```

This PR aims to recover the docker image generation first. We will verify 
the R doc generation in a separate JIRA and PR.

## How was this patch tested?

Manually tested. After the `docker-build.log` step, it should continue to the next stage, `Building v3.0.0-rc1`.
```
$ dev/create-release/do-release-docker.sh -d /tmp/spark-3.0.0 -n -s docs
...
Log file: docker-build.log
Building v3.0.0-rc1; output will be at /tmp/spark-3.0.0/output
```

Closes #25339 from dongjoon-hyun/SPARK-28606.

Authored-by: Dongjoon Hyun 
Signed-off-by: DB Tsai 
---
 dev/create-release/spark-rm/Dockerfile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/create-release/spark-rm/Dockerfile 
b/dev/create-release/spark-rm/Dockerfile
index 45c662d..a1eb598 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -21,7 +21,7 @@
 # * Java 8
 # * Ivy
 # * Python/PyPandoc (2.7.15/3.6.7)
-# * R-base/R-base-dev (3.5.0+)
+# * R-base/R-base-dev (3.6.1)
 # * Ruby 2.3 build utilities
 
 FROM ubuntu:18.04
@@ -44,7 +44,7 @@ ARG PIP_PKGS="pyopenssl pypandoc numpy pygments sphinx"
 # the most current package versions (instead of potentially using old versions 
cached by docker).
 RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates && \
   echo 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/' >> 
/etc/apt/sources.list && \
-  gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9 && \
+  gpg --keyserver keyserver.ubuntu.com --recv-key 
E298A3A825C0D65DFD57CBB651716619E084DAB9 && \
   gpg -a --export E084DAB9 | apt-key add - && \
   apt-get clean && \
   rm -rf /var/lib/apt/lists/* && \





[spark] branch master updated: [SPARK-28574][CORE] Allow to config different sizes for event queues

2019-08-02 Thread zsxwing
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c212c9d  [SPARK-28574][CORE] Allow to config different sizes for event 
queues
c212c9d is described below

commit c212c9d9ed7375cd1ea16c118733edd84037ec0d
Author: yunzoud 
AuthorDate: Fri Aug 2 15:27:33 2019 -0700

[SPARK-28574][CORE] Allow to config different sizes for event queues

## What changes were proposed in this pull request?
Add a configuration, spark.scheduler.listenerbus.eventqueue.${name}.capacity, to allow configuring a different capacity for each event queue.
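
A hedged PySpark sketch of how the new key could be used; the queue name `appStatus` and the capacities are illustrative values only.
```
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setMaster("local[2]")
    .setAppName("per-queue-capacity-example")
    # default capacity for all listener bus event queues
    .set("spark.scheduler.listenerbus.eventqueue.capacity", "10000")
    # override only the appStatus queue via the new per-queue key
    .set("spark.scheduler.listenerbus.eventqueue.appStatus.capacity", "30000")
)
sc = SparkContext(conf=conf)
```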

## How was this patch tested?
Unit test in 
core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala

Closes #25307 from yunzoud/SPARK-28574.

Authored-by: yunzoud 
Signed-off-by: Shixiong Zhu 
---
 .../apache/spark/scheduler/AsyncEventQueue.scala   | 14 +--
 .../apache/spark/scheduler/LiveListenerBus.scala   |  4 
 .../spark/scheduler/SparkListenerSuite.scala   | 28 ++
 3 files changed, 44 insertions(+), 2 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala 
b/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala
index 7cd2b86..11e2c47 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala
@@ -46,8 +46,18 @@ private class AsyncEventQueue(
 
   // Cap the capacity of the queue so we get an explicit error (rather than an 
OOM exception) if
   // it's perpetually being added to more quickly than it's being drained.
-  private val eventQueue = new LinkedBlockingQueue[SparkListenerEvent](
-conf.get(LISTENER_BUS_EVENT_QUEUE_CAPACITY))
+  // The capacity can be configured by 
spark.scheduler.listenerbus.eventqueue.${name}.capacity,
+  // if no such conf is specified, use the value specified in
+  // LISTENER_BUS_EVENT_QUEUE_CAPACITY
+  private[scheduler] def capacity: Int = {
+val queuesize = 
conf.getInt(s"spark.scheduler.listenerbus.eventqueue.${name}.capacity",
+conf.get(LISTENER_BUS_EVENT_QUEUE_CAPACITY))
+assert(queuesize > 0, s"capacity for event queue $name must be greater 
than 0, " +
+  s"but $queuesize is configured.")
+queuesize
+  }
+
+  private val eventQueue = new 
LinkedBlockingQueue[SparkListenerEvent](capacity)
 
   // Keep the event count separately, so that waitUntilEmpty() can be 
implemented properly;
   // this allows that method to return only when the events in the queue have 
been fully
diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala 
b/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala
index d135190..302ebd3 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala
@@ -236,6 +236,10 @@ private[spark] class LiveListenerBus(conf: SparkConf) {
 queues.asScala.map(_.name).toSet
   }
 
+  // For testing only.
+  private[scheduler] def getQueueCapacity(name: String): Option[Int] = {
+queues.asScala.find(_.name == name).map(_.capacity)
+  }
 }
 
 private[spark] object LiveListenerBus {
diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
index a7869d3..8903e10 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
@@ -532,6 +532,34 @@ class SparkListenerSuite extends SparkFunSuite with 
LocalSparkContext with Match
 }
   }
 
+  test("event queue size can be configued through spark conf") {
+// configure the shared queue size to be 1, event log queue size to be 2,
+// and listner bus event queue size to be 5
+val conf = new SparkConf(false)
+  .set(LISTENER_BUS_EVENT_QUEUE_CAPACITY, 5)
+  .set(s"spark.scheduler.listenerbus.eventqueue.${SHARED_QUEUE}.capacity", 
"1")
+  
.set(s"spark.scheduler.listenerbus.eventqueue.${EVENT_LOG_QUEUE}.capacity", "2")
+
+val bus = new LiveListenerBus(conf)
+val counter1 = new BasicJobCounter()
+val counter2 = new BasicJobCounter()
+val counter3 = new BasicJobCounter()
+
+// add a new shared, status and event queue
+bus.addToSharedQueue(counter1)
+bus.addToStatusQueue(counter2)
+bus.addToEventLogQueue(counter3)
+
+assert(bus.activeQueues() === Set(SHARED_QUEUE, APP_STATUS_QUEUE, 
EVENT_LOG_QUEUE))
+// check the size of shared queue is 1 as configured
+assert(bus.getQueueCapacity(SHARED_QUEUE) == Some(1))
+// no specific size of status queue is configured,
+// it shoud use the 

[spark] branch master updated (8ae032d -> 10d4ffd)

2019-08-02 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8ae032d  Revert "[SPARK-28582][PYSPARK] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7"
 add 10d4ffd  [SPARK-28532][SPARK-28530][SQL][FOLLOWUP] Inline doc for 
FixedPoint(1) batches "Subquery" and "Join Reorder"

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala | 4 
 1 file changed, 4 insertions(+)





[spark] branch master updated: Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7"

2019-08-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8ae032d  Revert "[SPARK-28582][PYSPARK] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7"
8ae032d is described below

commit 8ae032d78d63f95c1100bc826ae246e4d4fdd34a
Author: Dongjoon Hyun 
AuthorDate: Fri Aug 2 10:14:20 2019 -0700

Revert "[SPARK-28582][PYSPARK] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7"

This reverts commit fbeee0c5bcea32346b2279c5b67044f12e5faf7d.
---
 python/pyspark/tests/test_daemon.py | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/python/pyspark/tests/test_daemon.py 
b/python/pyspark/tests/test_daemon.py
index c987fae..2cdc16c 100644
--- a/python/pyspark/tests/test_daemon.py
+++ b/python/pyspark/tests/test_daemon.py
@@ -47,12 +47,9 @@ class DaemonTests(unittest.TestCase):
 # daemon should accept connections
 self.assertTrue(self.connect(port))
 
-# wait worker process spawned from daemon exit.
-time.sleep(1)
-
 # request shutdown
 terminator(daemon)
-daemon.wait(5)
+time.sleep(1)
 
 # daemon should no longer accept connections
 try:





[spark] branch branch-2.4 updated: Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7"

2019-08-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new fe0f53a  Revert "[SPARK-28582][PYSPARK] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7"
fe0f53a is described below

commit fe0f53ae4e4a71ebb821506af7e7f0ba7690dc53
Author: Dongjoon Hyun 
AuthorDate: Fri Aug 2 10:08:29 2019 -0700

Revert "[SPARK-28582][PYSPARK] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7"

This reverts commit 20e46ef6e3e49e754062717c2cb249c6eb99e86a.
---
 python/pyspark/tests.py | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index fc0ed41..a2d825b 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -1939,12 +1939,9 @@ class DaemonTests(unittest.TestCase):
 # daemon should accept connections
 self.assertTrue(self.connect(port))
 
-# wait worker process spawned from daemon exit.
-time.sleep(1)
-
 # request shutdown
 terminator(daemon)
-daemon.wait(5)
+time.sleep(1)
 
 # daemon should no longer accept connections
 try:





[spark] branch branch-2.3 updated: Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7"

2019-08-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
 new d3a73df  Revert "[SPARK-28582][PYSPARK] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7"
d3a73df is described below

commit d3a73dfcb6fd41c74a71cd21e51a902c2c443148
Author: Dongjoon Hyun 
AuthorDate: Fri Aug 2 10:07:13 2019 -0700

Revert "[SPARK-28582][PYSPARK] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7"

This reverts commit 21ef0fdf7244a169ac0c3e701cb21c35f1038a5d.
---
 python/pyspark/tests.py | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index 5fc6887..5ba121f 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -1862,12 +1862,9 @@ class DaemonTests(unittest.TestCase):
 # daemon should accept connections
 self.assertTrue(self.connect(port))
 
-# wait worker process spawned from daemon exit.
-time.sleep(1)
-
 # request shutdown
 terminator(daemon)
-daemon.wait(5)
+time.sleep(1)
 
 # daemon should no longer accept connections
 try:





[spark] branch branch-2.4 updated: [MINOR][DOC][SS] Correct description of minPartitions in Kafka option

2019-08-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new dad1cd6  [MINOR][DOC][SS] Correct description of minPartitions in 
Kafka option
dad1cd6 is described below

commit dad1cd691fcb846f99818cab078808e07f560f6e
Author: Jungtaek Lim (HeartSaVioR) 
AuthorDate: Fri Aug 2 09:12:54 2019 -0700

[MINOR][DOC][SS] Correct description of minPartitions in Kafka option

## What changes were proposed in this pull request?

`minPartitions` has been used as a hint, and the relevant method (KafkaOffsetRangeCalculator.getRanges) does not guarantee that the resulting number of partitions will be equal to or greater than the given value.


https://github.com/apache/spark/blob/d67b98ea016e9b714bef68feaac108edd08159c9/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala#L32-L46

This patch makes it clear that the configuration is a hint and that the actual number of partitions could be less or more.
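
A hedged PySpark sketch of the option being documented; the topic, bootstrap servers, and the value 64 are placeholders, and the `spark-sql-kafka-0-10` package is assumed to be on the classpath.
```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("minPartitions-example").getOrCreate()

df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "events")
    # a hint only: the actual number of Spark partitions may come out slightly
    # below or above 64 due to rounding and Kafka partitions with no new data
    .option("minPartitions", "64")
    .load()
)
```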

## How was this patch tested?

Just a documentation change.

Closes #25332 from 
HeartSaVioR/MINOR-correct-kafka-structured-streaming-doc-minpartition.

Authored-by: Jungtaek Lim (HeartSaVioR) 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 7ffc00ccc37fc94a45b7241bb3c6a17736b55ba3)
Signed-off-by: Dongjoon Hyun 
---
 docs/structured-streaming-kafka-integration.md | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/structured-streaming-kafka-integration.md 
b/docs/structured-streaming-kafka-integration.md
index 680fe78..1a3ee85 100644
--- a/docs/structured-streaming-kafka-integration.md
+++ b/docs/structured-streaming-kafka-integration.md
@@ -379,10 +379,12 @@ The following configurations are optional:
   int
   none
   streaming and batch
-  Minimum number of partitions to read from Kafka.
+  Desired minimum number of partitions to read from Kafka.
   By default, Spark has a 1-1 mapping of topicPartitions to Spark partitions 
consuming from Kafka.
   If you set this option to a value greater than your topicPartitions, Spark 
will divvy up large
-  Kafka partitions to smaller pieces.
+  Kafka partitions to smaller pieces. Please note that this configuration is 
like a `hint`: the
+  number of Spark tasks will be **approximately** `minPartitions`. It can be 
less or more depending on
+  rounding errors or Kafka partitions that didn't receive any new data.
 
 
 





[spark] branch master updated (b148bd5 -> 7ffc00c)

2019-08-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b148bd5  [SPARK-28519][SQL] Use StrictMath log, pow functions for 
platform independence
 add 7ffc00c  [MINOR][DOC][SS] Correct description of minPartitions in 
Kafka option

No new revisions were added by this update.

Summary of changes:
 docs/structured-streaming-kafka-integration.md | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)





[spark] branch master updated (6d7a675 -> b148bd5)

2019-08-02 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6d7a675  [SPARK-20604][ML] Allow imputer to handle numeric types
 add b148bd5  [SPARK-28519][SQL] Use StrictMath log, pow functions for 
platform independence

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide-upgrade.md|  2 +
 .../sql/catalyst/expressions/mathExpressions.scala | 64 +++---
 .../expressions/MathExpressionsSuite.scala | 31 ++-
 .../sql-tests/results/pgSQL/float8.sql.out |  2 +-
 .../sql-tests/results/pgSQL/numeric.sql.out|  8 +--
 .../org/apache/spark/sql/MathFunctionsSuite.scala  | 18 +++---
 6 files changed, 64 insertions(+), 61 deletions(-)





[spark] branch master updated (660423d -> 6d7a675)

2019-08-02 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 660423d  [SPARK-23469][ML] HashingTF should use corrected MurmurHash3 
implementation
 add 6d7a675  [SPARK-20604][ML] Allow imputer to handle numeric types

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/ml/feature/Imputer.scala  | 10 ++--
 .../org/apache/spark/ml/feature/ImputerSuite.scala | 57 ++
 2 files changed, 64 insertions(+), 3 deletions(-)





[spark] branch master updated: [SPARK-23469][ML] HashingTF should use corrected MurmurHash3 implementation

2019-08-02 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 660423d  [SPARK-23469][ML] HashingTF should use corrected MurmurHash3 
implementation
660423d is described below

commit 660423d71769a42176f7cbe867eb23ec8afe7592
Author: Huaxin Gao 
AuthorDate: Fri Aug 2 10:53:36 2019 -0500

[SPARK-23469][ML] HashingTF should use corrected MurmurHash3 implementation

## What changes were proposed in this pull request?

Update HashingTF to use the new implementation of MurmurHash3, and make HashingTF use the old MurmurHash3 when a model saved before Spark 3.0 is loaded.
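
A hedged PySpark sketch of the user-facing behavior; the data and path are placeholders, and the hashed indices produced in 3.0 will generally differ from those produced for the same input on Spark 2.x because of the new hash function.
```
from pyspark.ml.feature import HashingTF
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hashingtf-example").getOrCreate()
df = spark.createDataFrame([(["a", "b", "a", "c"],)], ["words"])

htf = HashingTF(inputCol="words", outputCol="features", numFeatures=1 << 10)
htf.transform(df).show(truncate=False)   # indices come from the new murmur3Hash

htf.write().overwrite().save("/tmp/hashingTF-example")
# Saved metadata records the Spark version, so loading a model written by a
# pre-3.0 release falls back to the old murmur3Hash implementation.
reloaded = HashingTF.load("/tmp/hashingTF-example")
```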

## How was this patch tested?

Changed existing unit tests. Also added one unit test to make sure HashingTF uses the old MurmurHash3 when a model saved before Spark 3.0 is loaded.

Closes #25303 from huaxingao/spark-23469.

Authored-by: Huaxin Gao 
Signed-off-by: Sean Owen 
---
 .../org/apache/spark/ml/feature/HashingTF.scala|  28 +++--
 .../hashingTF-pre3.0/metadata/.part-0.crc  | Bin 0 -> 12 bytes
 .../test-data/hashingTF-pre3.0/metadata/_SUCCESS   |   0
 .../test-data/hashingTF-pre3.0/metadata/part-0 |   1 +
 .../apache/spark/ml/feature/HashingTFSuite.scala   |  25 +-
 python/pyspark/ml/feature.py   |   8 +++---
 python/pyspark/ml/tests/test_feature.py|   2 +-
 7 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala 
b/mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala
index 27b8bdc..0e6c43a 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala
@@ -26,11 +26,12 @@ import org.apache.spark.ml.linalg.Vectors
 import org.apache.spark.ml.param._
 import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}
 import org.apache.spark.ml.util._
-import org.apache.spark.mllib.feature.HashingTF.murmur3Hash
+import org.apache.spark.mllib.feature.{HashingTF => OldHashingTF}
 import org.apache.spark.sql.{DataFrame, Dataset}
 import org.apache.spark.sql.functions.{col, udf}
 import org.apache.spark.sql.types.{ArrayType, StructType}
 import org.apache.spark.util.Utils
+import org.apache.spark.util.VersionUtils.majorMinorVersion
 
 /**
  * Maps a sequence of terms to their term frequencies using the hashing trick.
@@ -44,7 +45,7 @@ import org.apache.spark.util.Utils
 class HashingTF @Since("1.4.0") (@Since("1.4.0") override val uid: String)
   extends Transformer with HasInputCol with HasOutputCol with 
DefaultParamsWritable {
 
-  private[this] val hashFunc: Any => Int = murmur3Hash
+  private var hashFunc: Any => Int = FeatureHasher.murmur3Hash
 
   @Since("1.2.0")
   def this() = this(Identifiable.randomUID("hashingTF"))
@@ -142,6 +143,29 @@ class HashingTF @Since("1.4.0") (@Since("1.4.0") override 
val uid: String)
 @Since("1.6.0")
 object HashingTF extends DefaultParamsReadable[HashingTF] {
 
+  private class HashingTFReader extends MLReader[HashingTF] {
+
+private val className = classOf[HashingTF].getName
+
+override def load(path: String): HashingTF = {
+  val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
+  val hashingTF = new HashingTF(metadata.uid)
+  metadata.getAndSetParams(hashingTF)
+
+  // We support loading old `HashingTF` saved by previous Spark versions.
+  // Previous `HashingTF` uses `mllib.feature.HashingTF.murmur3Hash`, but 
new `HashingTF` uses
+  // `ml.Feature.FeatureHasher.murmur3Hash`.
+  val (majorVersion, _) = majorMinorVersion(metadata.sparkVersion)
+  if (majorVersion < 3) {
+hashingTF.hashFunc = OldHashingTF.murmur3Hash
+  }
+  hashingTF
+}
+  }
+
+  @Since("3.0.0")
+  override def read: MLReader[HashingTF] = new HashingTFReader
+
   @Since("1.6.0")
   override def load(path: String): HashingTF = super.load(path)
 }
diff --git 
a/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/.part-0.crc 
b/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/.part-0.crc
new file mode 100644
index 000..1ac377a
Binary files /dev/null and 
b/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/.part-0.crc 
differ
diff --git 
a/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/_SUCCESS 
b/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/_SUCCESS
new file mode 100644
index 000..e69de29
diff --git 
a/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/part-0 
b/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/part-0
new file mode 100644
index 000..492a07a
--- /dev/null
+++ b/mllib/src/test/resources/test-data/hashingTF-pre3.0/metadata/part-0
@@ -0,0 +1 @@

[spark] branch master updated: [SPARK-28510][SQL] Implement Spark's own GetFunctionsOperation

2019-08-02 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new efd9299  [SPARK-28510][SQL] Implement Spark's own GetFunctionsOperation
efd9299 is described below

commit efd92993f403fe40b3abd1dac21f8d7c875f407d
Author: Yuming Wang 
AuthorDate: Fri Aug 2 08:50:42 2019 -0700

[SPARK-28510][SQL] Implement Spark's own GetFunctionsOperation

## What changes were proposed in this pull request?

This PR implements Spark's own GetFunctionsOperation, which mitigates the differences between Spark SQL and Hive UDFs. Our implementation differs from Hive's:
- Our implementation always returns results. Hive only returns results when 
[(null == catalogName || "".equals(catalogName)) && (null == schemaName || 
"".equals(schemaName))](https://github.com/apache/hive/blob/rel/release-3.1.1/service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java#L101-L119).
- Our implementation pads the `REMARKS` field with the function usage, whereas Hive returns an empty string.
- Our implementation does not support `FUNCTION_TYPE`, but Hive does.

## How was this patch tested?

unit tests

Closes #25252 from wangyum/SPARK-28510.

Authored-by: Yuming Wang 
Signed-off-by: gatorsmile 
---
 .../thriftserver/SparkGetFunctionsOperation.scala  | 115 +
 .../server/SparkSQLOperationManager.scala  |  15 +++
 .../thriftserver/SparkMetadataOperationSuite.scala |  43 +++-
 .../cli/operation/GetFunctionsOperation.java   |   2 +-
 .../cli/operation/GetFunctionsOperation.java   |   2 +-
 5 files changed, 174 insertions(+), 3 deletions(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala
new file mode 100644
index 000..462e5730
--- /dev/null
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.thriftserver
+
+import java.sql.DatabaseMetaData
+import java.util.UUID
+
+import scala.collection.JavaConverters.seqAsJavaListConverter
+
+import 
org.apache.hadoop.hive.ql.security.authorization.plugin.{HiveOperationType, 
HivePrivilegeObjectUtils}
+import org.apache.hive.service.cli._
+import org.apache.hive.service.cli.operation.GetFunctionsOperation
+import 
org.apache.hive.service.cli.operation.MetadataOperation.DEFAULT_HIVE_CATALOG
+import org.apache.hive.service.cli.session.HiveSession
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.util.{Utils => SparkUtils}
+
+/**
+ * Spark's own GetFunctionsOperation
+ *
+ * @param sqlContext SQLContext to use
+ * @param parentSession a HiveSession from SessionManager
+ * @param catalogName catalog name. null if not applicable
+ * @param schemaName database name, null or a concrete database name
+ * @param functionName function name pattern
+ */
+private[hive] class SparkGetFunctionsOperation(
+sqlContext: SQLContext,
+parentSession: HiveSession,
+catalogName: String,
+schemaName: String,
+functionName: String)
+  extends GetFunctionsOperation(parentSession, catalogName, schemaName, 
functionName) with Logging {
+
+  private var statementId: String = _
+
+  override def close(): Unit = {
+super.close()
+HiveThriftServer2.listener.onOperationClosed(statementId)
+  }
+
+  override def runInternal(): Unit = {
+statementId = UUID.randomUUID().toString
+// Do not change cmdStr. It's used for Hive auditing and authorization.
+val cmdStr = s"catalog : $catalogName, schemaPattern : $schemaName"
+val logMsg = s"Listing functions '$cmdStr, functionName : $functionName'"
+logInfo(s"$logMsg with $statementId")
+setState(OperationState.RUNNING)
+// Always use the latest class loader 

[spark] branch master updated (fbeee0c -> 37243e1)

2019-08-02 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fbeee0c  [SPARK-28582][PYSPARK] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7
 add 37243e1  [SPARK-28579][ML] MaxAbsScaler avoids conversion to 
breeze.vector

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/ml/feature/MaxAbsScaler.scala  | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)





[spark] branch branch-2.3 updated: [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7

2019-08-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
 new 21ef0fd  [SPARK-28582][PYSPARK] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7
21ef0fd is described below

commit 21ef0fdf7244a169ac0c3e701cb21c35f1038a5d
Author: WeichenXu 
AuthorDate: Fri Aug 2 22:07:06 2019 +0900

[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which 
fail on Python 3.7

Fix the flaky test DaemonTests.do_termination_test, which fails on Python 3.7, by adding a sleep after the test connection to the daemon.

Run test
```
python/run-tests --python-executables=python3.7 --testname 
"pyspark.tests.test_daemon DaemonTests"
```
**Before**
The test "test_termination_sigterm" fails, and we can see that the daemon process does not exit.
**After**
Test passed

Closes #25315 from WeichenXu123/fix_py37_daemon.

Authored-by: WeichenXu 
Signed-off-by: HyukjinKwon 
(cherry picked from commit fbeee0c5bcea32346b2279c5b67044f12e5faf7d)
Signed-off-by: HyukjinKwon 
---
 python/pyspark/tests.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index 5ba121f..5fc6887 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -1862,9 +1862,12 @@ class DaemonTests(unittest.TestCase):
 # daemon should accept connections
 self.assertTrue(self.connect(port))
 
+# wait worker process spawned from daemon exit.
+time.sleep(1)
+
 # request shutdown
 terminator(daemon)
-time.sleep(1)
+daemon.wait(5)
 
 # daemon should no longer accept connections
 try:





[spark] branch branch-2.4 updated: [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7

2019-08-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 20e46ef  [SPARK-28582][PYSPARK] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7
20e46ef is described below

commit 20e46ef6e3e49e754062717c2cb249c6eb99e86a
Author: WeichenXu 
AuthorDate: Fri Aug 2 22:07:06 2019 +0900

[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which 
fail on Python 3.7

Fix the flaky test DaemonTests.do_termination_test, which fails on Python 3.7, by adding a sleep after the test connection to the daemon.

Run test
```
python/run-tests --python-executables=python3.7 --testname 
"pyspark.tests.test_daemon DaemonTests"
```
**Before**
The test "test_termination_sigterm" fails, and we can see that the daemon process does not exit.
**After**
Test passed

Closes #25315 from WeichenXu123/fix_py37_daemon.

Authored-by: WeichenXu 
Signed-off-by: HyukjinKwon 
(cherry picked from commit fbeee0c5bcea32346b2279c5b67044f12e5faf7d)
Signed-off-by: HyukjinKwon 
---
 python/pyspark/tests.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index a2d825b..fc0ed41 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -1939,9 +1939,12 @@ class DaemonTests(unittest.TestCase):
 # daemon should accept connections
 self.assertTrue(self.connect(port))
 
+# wait worker process spawned from daemon exit.
+time.sleep(1)
+
 # request shutdown
 terminator(daemon)
-time.sleep(1)
+daemon.wait(5)
 
 # daemon should no longer accept connections
 try:





[spark] branch branch-2.4 updated: Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7"

2019-08-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new a065a50  Revert "[SPARK-28582][PYSPARK] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7"
a065a50 is described below

commit a065a503bcf1ec5b7d49c575af7bc6867c734d90
Author: HyukjinKwon 
AuthorDate: Fri Aug 2 22:12:04 2019 +0900

Revert "[SPARK-28582][PYSPARK] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7"

This reverts commit dc09a02c142d3787e728c8b25eb8417649d98e9f.
---
 python/pyspark/tests.py | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index fef2959..a2d825b 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -1916,7 +1916,7 @@ class OutputFormatTests(ReusedPySparkTestCase):
 
 class DaemonTests(unittest.TestCase):
 def connect(self, port):
-from socket import socket, AF_INET, SOCK_STREAM# request shutdown
+from socket import socket, AF_INET, SOCK_STREAM
 sock = socket(AF_INET, SOCK_STREAM)
 sock.connect(('127.0.0.1', port))
 # send a split index of -1 to shutdown the worker
@@ -1939,12 +1939,9 @@ class DaemonTests(unittest.TestCase):
 # daemon should accept connections
 self.assertTrue(self.connect(port))
 
-# wait worker process spawned from daemon exit.
-time.sleep(1)
-
 # request shutdown
 terminator(daemon)
-daemon.wait(5)
+time.sleep(1)
 
 # daemon should no longer accept connections
 try:





[spark] branch branch-2.4 updated: [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7

2019-08-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new dc09a02  [SPARK-28582][PYSPARK] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7
dc09a02 is described below

commit dc09a02c142d3787e728c8b25eb8417649d98e9f
Author: WeichenXu 
AuthorDate: Fri Aug 2 22:07:06 2019 +0900

[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which 
fail on Python 3.7

Fix the flaky test DaemonTests.do_termination_test, which fails on Python 3.7, by adding a sleep after the test connection to the daemon.

Run test
```
python/run-tests --python-executables=python3.7 --testname 
"pyspark.tests.test_daemon DaemonTests"
```
**Before**
The test "test_termination_sigterm" fails, and we can see that the daemon process does not exit.
**After**
Test passed

Closes #25315 from WeichenXu123/fix_py37_daemon.

Authored-by: WeichenXu 
Signed-off-by: HyukjinKwon 
(cherry picked from commit fbeee0c5bcea32346b2279c5b67044f12e5faf7d)
Signed-off-by: HyukjinKwon 
---
 python/pyspark/tests.py | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index a2d825b..fef2959 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -1916,7 +1916,7 @@ class OutputFormatTests(ReusedPySparkTestCase):
 
 class DaemonTests(unittest.TestCase):
 def connect(self, port):
-from socket import socket, AF_INET, SOCK_STREAM
+from socket import socket, AF_INET, SOCK_STREAM# request shutdown
 sock = socket(AF_INET, SOCK_STREAM)
 sock.connect(('127.0.0.1', port))
 # send a split index of -1 to shutdown the worker
@@ -1939,9 +1939,12 @@ class DaemonTests(unittest.TestCase):
 # daemon should accept connections
 self.assertTrue(self.connect(port))
 
+# wait worker process spawned from daemon exit.
+time.sleep(1)
+
 # request shutdown
 terminator(daemon)
-time.sleep(1)
+daemon.wait(5)
 
 # daemon should no longer accept connections
 try:





[spark] branch master updated: [SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7

2019-08-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fbeee0c  [SPARK-28582][PYSPARK] Fix flaky test 
DaemonTests.do_termination_test which fail on Python 3.7
fbeee0c is described below

commit fbeee0c5bcea32346b2279c5b67044f12e5faf7d
Author: WeichenXu 
AuthorDate: Fri Aug 2 22:07:06 2019 +0900

[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which 
fail on Python 3.7

## What changes were proposed in this pull request?

Fix the flaky test DaemonTests.do_termination_test, which fails on Python 3.7, by adding a sleep after the test connection to the daemon.

## How was this patch tested?

Run test
```
python/run-tests --python-executables=python3.7 --testname 
"pyspark.tests.test_daemon DaemonTests"
```
**Before**
The test "test_termination_sigterm" fails, and we can see that the daemon process does not exit.
**After**
Test passed

Closes #25315 from WeichenXu123/fix_py37_daemon.

Authored-by: WeichenXu 
Signed-off-by: HyukjinKwon 
---
 python/pyspark/tests/test_daemon.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/tests/test_daemon.py 
b/python/pyspark/tests/test_daemon.py
index 2cdc16c..c987fae 100644
--- a/python/pyspark/tests/test_daemon.py
+++ b/python/pyspark/tests/test_daemon.py
@@ -47,9 +47,12 @@ class DaemonTests(unittest.TestCase):
 # daemon should accept connections
 self.assertTrue(self.connect(port))
 
+# wait worker process spawned from daemon exit.
+time.sleep(1)
+
 # request shutdown
 terminator(daemon)
-time.sleep(1)
+daemon.wait(5)
 
 # daemon should no longer accept connections
 try:





[spark] branch master updated: [SPARK-28445][SQL][PYTHON] Fix error when PythonUDF is used in both group by and aggregate expression

2019-08-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 77c7e91  [SPARK-28445][SQL][PYTHON] Fix error when PythonUDF is used 
in both group by and aggregate expression
77c7e91 is described below

commit 77c7e91e029a9a70678435acb141154f2f51882e
Author: Liang-Chi Hsieh 
AuthorDate: Fri Aug 2 19:47:29 2019 +0900

[SPARK-28445][SQL][PYTHON] Fix error when PythonUDF is used in both group 
by and aggregate expression

## What changes were proposed in this pull request?

When a PythonUDF is used in the group by and it also appears in an aggregate expression, like

```
SELECT pyUDF(a + 1), COUNT(b) FROM testData GROUP BY pyUDF(a + 1)
```

It causes an analysis exception in `CheckAnalysis`, like
```
org.apache.spark.sql.AnalysisException: expression 'testdata.`a`' is 
neither present in the group by, nor is it an aggregate function.
```

First, `CheckAnalysis` can't check semantic equality between PythonUDFs. Second, even if we made that possible, a runtime exception would be thrown

```
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
attribute, tree: pythonUDF1#8615
...
Cause: java.lang.RuntimeException: Couldn't find pythonUDF1#8615 in 
[cast(pythonUDF0#8614 as int)#8617,count(b#8599)#8607L]
```

The cause is that `ExtractPythonUDFs` extracts the PythonUDFs in both the group by and the aggregate expression, so they end up as two different aliases in the logical aggregate. At runtime, we can't bind the resulting aggregate expression to its grouping and aggregate attributes.

This patch proposes a rule, `ExtractGroupingPythonUDFFromAggregate`, that extracts the group-by PythonUDFs and evaluates them before the aggregate. The group-by PythonUDF in the aggregate expression is replaced with the aliased result.

The query plan of the query `SELECT pyUDF(a + 1), pyUDF(COUNT(b)) FROM testData GROUP BY pyUDF(a + 1)` becomes:

```
== Optimized Logical Plan ==
Project [CAST(pyUDF(cast((a + 1) as string)) AS INT)#8608, 
cast(pythonUDF0#8616 as bigint) AS CAST(pyUDF(cast(count(b) as string)) AS 
BIGINT)#8610L]
+- BatchEvalPython [pyUDF(cast(agg#8613L as string))], [pythonUDF0#8616]
   +- Aggregate [cast(groupingPythonUDF#8614 as int)], 
[cast(groupingPythonUDF#8614 as int) AS CAST(pyUDF(cast((a + 1) as string)) AS 
INT)#8608, count(b#8599) AS agg#8613L]
  +- Project [pythonUDF0#8615 AS groupingPythonUDF#8614, b#8599]
 +- BatchEvalPython [pyUDF(cast((a#8598 + 1) as string))], 
[pythonUDF0#8615]
+- LocalRelation [a#8598, b#8599]

== Physical Plan ==
*(3) Project [CAST(pyUDF(cast((a + 1) as string)) AS INT)#8608, 
cast(pythonUDF0#8616 as bigint) AS CAST(pyUDF(cast(count(b) as string)) AS 
BIGINT)#8610L]
+- BatchEvalPython [pyUDF(cast(agg#8613L as string))], [pythonUDF0#8616]
   +- *(2) HashAggregate(keys=[cast(groupingPythonUDF#8614 as int)#8617], 
functions=[count(b#8599)], output=[CAST(pyUDF(cast((a + 1) as string)) AS 
INT)#8608, agg#8613L])
  +- Exchange hashpartitioning(cast(groupingPythonUDF#8614 as 
int)#8617, 5), true
 +- *(1) HashAggregate(keys=[cast(groupingPythonUDF#8614 as int) AS 
cast(groupingPythonUDF#8614 as int)#8617], functions=[partial_count(b#8599)], 
output=[cast(groupingPythonUDF#8614 as int)#8617, count#8619L])
+- *(1) Project [pythonUDF0#8615 AS groupingPythonUDF#8614, 
b#8599]
   +- BatchEvalPython [pyUDF(cast((a#8598 + 1) as string))], 
[pythonUDF0#8615]
  +- LocalTableScan [a#8598, b#8599]
```

## How was this patch tested?

Added tests.
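
A hedged PySpark illustration of the pattern this fix enables; the data and the UDF are made up, and the equivalent query failed analysis before the fix as described above.
```
from pyspark.sql import SparkSession
from pyspark.sql.functions import count, udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.appName("SPARK-28445-example").getOrCreate()
df = spark.createDataFrame([(1, "x"), (2, "y"), (2, "z")], ["a", "b"])

@udf(IntegerType())
def py_udf(v):
    return v + 1

# the same Python UDF appears in both the grouping expression and the select list
df.groupBy(py_udf(df.a + 1)).agg(count(df.b)).show()
```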

Closes #25215 from viirya/SPARK-28445.

Authored-by: Liang-Chi Hsieh 
Signed-off-by: HyukjinKwon 
---
 .../spark/sql/catalyst/expressions/PythonUDF.scala |  6 ++
 .../spark/sql/execution/SparkOptimizer.scala   |  6 +-
 .../sql/execution/python/ExtractPythonUDFs.scala   | 63 +++
 .../sql/execution/python/PythonUDFSuite.scala  | 71 ++
 4 files changed, 144 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala
index 690969e..da2e182 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala
@@ -67,4 +67,10 @@ case class PythonUDF(
 exprId = resultId)
 
   override def nullable: Boolean = true
+
+  override lazy val canonicalized: Expression = {
+val canonicalizedChildren = children.map(_.canonicalized)
+// `resultId` can be seen as cosmetic variation 

[spark] branch master updated: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink

2019-08-02 Thread jshao
This is an automated email from the ASF dual-hosted git repository.

jshao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6d32dee  [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink
6d32dee is described below

commit 6d32deeecc5a80230158e12982a5c1ea3f70d89d
Author: Nick Karpov 
AuthorDate: Fri Aug 2 17:50:15 2019 +0800

[SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink

## What changes were proposed in this pull request?

Today all registered metric sources are reported to GraphiteSink with no 
filtering mechanism, although the codahale project does support it.

GraphiteReporter (a ScheduledReporter) from the codahale project requires you to implement and supply the MetricFilter interface (there is only a single implementation in the codahale project by default, MetricFilter.ALL).

This PR proposes adding a regex config to match and filter the metrics sent to the GraphiteSink.
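
A hedged sketch of how the new option could be wired up from PySpark. The host, port, and regex are placeholders, and it assumes metrics properties can be passed through `SparkConf` using the `spark.metrics.conf.*` prefix; otherwise the same keys, minus that prefix, belong in `conf/metrics.properties`.
```
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setMaster("local[2]")
    .setAppName("graphite-regex-example")
    .set("spark.metrics.conf.*.sink.graphite.class",
         "org.apache.spark.metrics.sink.GraphiteSink")
    .set("spark.metrics.conf.*.sink.graphite.host", "graphite.example.com")
    .set("spark.metrics.conf.*.sink.graphite.port", "2003")
    # only metric names containing "jvm." or "executor." are reported
    .set("spark.metrics.conf.*.sink.graphite.regex", r"(jvm|executor)\.")
)
sc = SparkContext(conf=conf)
```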

## How was this patch tested?

Included a GraphiteSinkSuite that tests:

1. Absence of regex filter (existing default behavior maintained)
2. Presence of `regex=` correctly filters metric keys

Closes #25232 from nkarpov/graphite_regex.

Authored-by: Nick Karpov 
Signed-off-by: jerryshao 
---
 conf/metrics.properties.template   |  1 +
 .../apache/spark/metrics/sink/GraphiteSink.scala   | 13 +++-
 .../spark/metrics/sink/GraphiteSinkSuite.scala | 84 ++
 docs/monitoring.md |  1 +
 4 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/conf/metrics.properties.template b/conf/metrics.properties.template
index 23407e1..da0b06d 100644
--- a/conf/metrics.properties.template
+++ b/conf/metrics.properties.template
@@ -121,6 +121,7 @@
 #   unit  seconds   Unit of the poll period
 #   prefixEMPTY STRING  Prefix to prepend to every metric's name
 #   protocol  tcp   Protocol ("tcp" or "udp") to use
+#   regex NONE  Optional filter to send only metrics matching this 
regex string
 
 # org.apache.spark.metrics.sink.StatsdSink
 #   Name: Default:  Description:
diff --git 
a/core/src/main/scala/org/apache/spark/metrics/sink/GraphiteSink.scala 
b/core/src/main/scala/org/apache/spark/metrics/sink/GraphiteSink.scala
index 21b4dfb..05d553e 100644
--- a/core/src/main/scala/org/apache/spark/metrics/sink/GraphiteSink.scala
+++ b/core/src/main/scala/org/apache/spark/metrics/sink/GraphiteSink.scala
@@ -20,7 +20,7 @@ package org.apache.spark.metrics.sink
 import java.util.{Locale, Properties}
 import java.util.concurrent.TimeUnit
 
-import com.codahale.metrics.MetricRegistry
+import com.codahale.metrics.{Metric, MetricFilter, MetricRegistry}
 import com.codahale.metrics.graphite.{Graphite, GraphiteReporter, GraphiteUDP}
 
 import org.apache.spark.SecurityManager
@@ -38,6 +38,7 @@ private[spark] class GraphiteSink(val property: Properties, 
val registry: Metric
   val GRAPHITE_KEY_UNIT = "unit"
   val GRAPHITE_KEY_PREFIX = "prefix"
   val GRAPHITE_KEY_PROTOCOL = "protocol"
+  val GRAPHITE_KEY_REGEX = "regex"
 
   def propertyToOption(prop: String): Option[String] = 
Option(property.getProperty(prop))
 
@@ -72,10 +73,20 @@ private[spark] class GraphiteSink(val property: Properties, 
val registry: Metric
 case Some(p) => throw new Exception(s"Invalid Graphite protocol: $p")
   }
 
+  val filter = propertyToOption(GRAPHITE_KEY_REGEX) match {
+case Some(pattern) => new MetricFilter() {
+  override def matches(name: String, metric: Metric): Boolean = {
+pattern.r.findFirstMatchIn(name).isDefined
+  }
+}
+case None => MetricFilter.ALL
+  }
+
   val reporter: GraphiteReporter = GraphiteReporter.forRegistry(registry)
   .convertDurationsTo(TimeUnit.MILLISECONDS)
   .convertRatesTo(TimeUnit.SECONDS)
   .prefixedWith(prefix)
+  .filter(filter)
   .build(graphite)
 
   override def start() {
diff --git 
a/core/src/test/scala/org/apache/spark/metrics/sink/GraphiteSinkSuite.scala 
b/core/src/test/scala/org/apache/spark/metrics/sink/GraphiteSinkSuite.scala
new file mode 100644
index 000..2369218
--- /dev/null
+++ b/core/src/test/scala/org/apache/spark/metrics/sink/GraphiteSinkSuite.scala
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License