[spark] branch master updated (b890fdc -> 8e36a8f)

2020-07-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b890fdc  [SPARK-32387][SS] Extract UninterruptibleThread runner logic from KafkaOffsetReader
 add 8e36a8f  [SPARK-32419][PYTHON][BUILD] Avoid using subshell for Conda env (de)activation in pip packaging test

No new revisions were added by this update.

Summary of changes:
 dev/run-pip-tests | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-32419][PYTHON][BUILD] Avoid using subshell for Conda env (de)activation in pip packaging test

2020-07-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8e36a8f  [SPARK-32419][PYTHON][BUILD] Avoid using subshell for Conda env (de)activation in pip packaging test
8e36a8f is described below

commit 8e36a8f33fb9d20c481e9eeb0ea8155aa1569439
Author: HyukjinKwon 
AuthorDate: Sat Jul 25 13:09:23 2020 +0900

[SPARK-32419][PYTHON][BUILD] Avoid using subshell for Conda env (de)activation in pip packaging test

### What changes were proposed in this pull request?

This PR proposes to avoid using a subshell when activating the Conda environment. It looks like the environment ends up being activated only within the subshell, even when the `conda` command is used.

### Why are the changes needed?

If you take a close look for GitHub Actions log:

```
 Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9
 Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB)
Using legacy setup.py install for pyspark, since package 'wheel' is not 
installed.
Installing collected packages: py4j, pyspark
 Running setup.py install for pyspark: started
 Running setup.py install for pyspark: finished with status 'done'
Successfully installed py4j-0.10.9 pyspark-3.1.0.dev0

...

Installing dist into virtual env
Obtaining file:///home/runner/work/spark/spark/python
Collecting py4j==0.10.9
 Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB)
Installing collected packages: py4j, pyspark
 Attempting uninstall: py4j
 Found existing installation: py4j 0.10.9
 Uninstalling py4j-0.10.9:
 Successfully uninstalled py4j-0.10.9
 Attempting uninstall: pyspark
 Found existing installation: pyspark 3.1.0.dev0
 Uninstalling pyspark-3.1.0.dev0:
 Successfully uninstalled pyspark-3.1.0.dev0
 Running setup.py develop for pyspark
Successfully installed py4j-0.10.9 pyspark
```

It looks like Conda is not being used properly, since the previously installed package is removed when it is reinstalled. We should ideally test this with the Conda environment as intended.
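The root cause can be sketched outside of Spark's scripts: commands grouped in parentheses run in a subshell, so environment changes they make do not survive in the parent shell. A minimal illustration (not the actual `dev/run-pip-tests` code):

```shell
# Parentheses spawn a subshell: assignments made inside are lost on exit.
FOO=before
(FOO=inside)
echo "$FOO"    # still "before"

# A brace group runs in the current shell, so the change persists.
# This is why the fallback must call `conda activate` directly rather
# than wrapping it in a parenthesized group.
{ FOO=inside; }
echo "$FOO"    # now "inside"
```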

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

GitHub Actions will test it. I also tested manually on my local machine.

Closes #29212 from HyukjinKwon/SPARK-32419.

Authored-by: HyukjinKwon 
Signed-off-by: HyukjinKwon 
---
 dev/run-pip-tests | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/run-pip-tests b/dev/run-pip-tests
index be96ed9..b322d3f 100755
--- a/dev/run-pip-tests
+++ b/dev/run-pip-tests
@@ -85,7 +85,7 @@ for python in "${PYTHON_EXECS[@]}"; do
 source "$CONDA_PREFIX/etc/profile.d/conda.sh"
   fi
  conda create -y -p "$VIRTUALENV_PATH" python=$python numpy pandas pip setuptools
-  source activate "$VIRTUALENV_PATH" || (echo "Falling back to 'conda activate'" && conda activate "$VIRTUALENV_PATH")
+  source activate "$VIRTUALENV_PATH" || conda activate "$VIRTUALENV_PATH"
 else
   mkdir -p "$VIRTUALENV_PATH"
   virtualenv --python=$python "$VIRTUALENV_PATH"
@@ -128,7 +128,7 @@ for python in "${PYTHON_EXECS[@]}"; do
 
 # conda / virtualenv environments need to be deactivated differently
 if [ -n "$USE_CONDA" ]; then
-  source deactivate || (echo "Falling back to 'conda deactivate'" && conda deactivate)
+  source deactivate || conda deactivate
 else
   deactivate
 fi


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (e6ef27b -> b890fdc)

2020-07-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e6ef27b  [SPARK-32287][TESTS] Flaky Test: ExecutorAllocationManagerSuite.add executors default profile
 add b890fdc  [SPARK-32387][SS] Extract UninterruptibleThread runner logic from KafkaOffsetReader

No new revisions were added by this update.

Summary of changes:
 .../spark/util/UninterruptibleThreadRunner.scala   | 55 +++
 .../util/UninterruptibleThreadRunnerSuite.scala| 64 ++
 .../spark/sql/kafka010/KafkaOffsetReader.scala | 46 
 3 files changed, 129 insertions(+), 36 deletions(-)
 create mode 100644 core/src/main/scala/org/apache/spark/util/UninterruptibleThreadRunner.scala
 create mode 100644 core/src/test/scala/org/apache/spark/util/UninterruptibleThreadRunnerSuite.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32287][TESTS] Flaky Test: ExecutorAllocationManagerSuite.add executors default profile

2020-07-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ab14d05  [SPARK-32287][TESTS] Flaky Test: ExecutorAllocationManagerSuite.add executors default profile
ab14d05 is described below

commit ab14d05624f6cc88b02e29dd5ae8d302ffcfd09d
Author: Thomas Graves 
AuthorDate: Fri Jul 24 11:12:28 2020 -0700

[SPARK-32287][TESTS] Flaky Test: ExecutorAllocationManagerSuite.add executors default profile

I wasn't able to reproduce the failure, but as best I can tell the allocation manager timer triggers and calls doRequest. The timeout is 10s, so try increasing it to 30 seconds.

test failure

no

unit test

Closes #29225 from tgravescs/SPARK-32287.

Authored-by: Thomas Graves 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit e6ef27be52dcd14dc94384c2ada85861be44d843)
Signed-off-by: Dongjoon Hyun 
---
 .../test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala b/core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala
index 8d95849..0b19146 100644
--- a/core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala
@@ -1142,7 +1142,7 @@ class ExecutorAllocationManagerSuite extends SparkFunSuite {
   .set(config.DYN_ALLOCATION_TESTING, true)
   // SPARK-22864: effectively disable the allocation schedule by setting the period to a
   // really long value.
-  .set(TEST_SCHEDULE_INTERVAL, 1L)
+  .set(TEST_SCHEDULE_INTERVAL, 3L)
   }
 
   private def createManager(


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (64a01c0 -> e6ef27b)

2020-07-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 64a01c0  [SPARK-32430][SQL] Extend SparkSessionExtensions to inject rules into AQE query stage preparation
 add e6ef27b  [SPARK-32287][TESTS] Flaky Test: ExecutorAllocationManagerSuite.add executors default profile

No new revisions were added by this update.

Summary of changes:
 .../test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org




[spark] branch branch-3.0 updated: [SPARK-32430][SQL] Extend SparkSessionExtensions to inject rules into AQE query stage preparation

2020-07-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 7004c98  [SPARK-32430][SQL] Extend SparkSessionExtensions to inject rules into AQE query stage preparation
7004c98 is described below

commit 7004c989048b08891fb5f62ce2fcf0c89ce1496a
Author: Andy Grove 
AuthorDate: Fri Jul 24 11:03:57 2020 -0700

[SPARK-32430][SQL] Extend SparkSessionExtensions to inject rules into AQE query stage preparation

### What changes were proposed in this pull request?

Provide a generic mechanism for plugins to inject rules into the AQE "query prep" stage that happens before query stage creation.

This goes along with https://issues.apache.org/jira/browse/SPARK-32332, where the current AQE implementation doesn't allow users to properly extend it for columnar processing.

### Why are the changes needed?

The issue here is that we create new query stages but do not have access to the parent plan of a new query stage, so certain things cannot be determined without knowing what the parent did. This change allows you to add tags to the plan to figure out what is going on.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

A new unit test is included in the PR.

Closes #29224 from andygrove/insert-aqe-rule.

Authored-by: Andy Grove 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 64a01c0a559396fccd615dc00576a80bc8cc5648)
Signed-off-by: Dongjoon Hyun 
---
 .../apache/spark/sql/SparkSessionExtensions.scala  | 20 -
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  2 +-
 .../sql/internal/BaseSessionStateBuilder.scala |  9 +++-
 .../apache/spark/sql/internal/SessionState.scala   |  4 +-
 .../spark/sql/SparkSessionExtensionSuite.scala | 49 ++
 5 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala b/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala
index 1c2bf9e..bd870fb 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala
@@ -27,7 +27,7 @@ import org.apache.spark.sql.catalyst.expressions.ExpressionInfo
 import org.apache.spark.sql.catalyst.parser.ParserInterface
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.execution.ColumnarRule
+import org.apache.spark.sql.execution.{ColumnarRule, SparkPlan}
 
 /**
  * :: Experimental ::
@@ -44,6 +44,7 @@ import org.apache.spark.sql.execution.ColumnarRule
  * Customized Parser.
  * (External) Catalog listeners.
  * Columnar Rules.
+ * Adaptive Query Stage Preparation Rules.
  * 
  *
  * The extensions can be used by calling `withExtensions` on the 
[[SparkSession.Builder]], for
@@ -96,8 +97,10 @@ class SparkSessionExtensions {
   type ParserBuilder = (SparkSession, ParserInterface) => ParserInterface
   type FunctionDescription = (FunctionIdentifier, ExpressionInfo, FunctionBuilder)
   type ColumnarRuleBuilder = SparkSession => ColumnarRule
+  type QueryStagePrepRuleBuilder = SparkSession => Rule[SparkPlan]
 
  private[this] val columnarRuleBuilders = mutable.Buffer.empty[ColumnarRuleBuilder]
+  private[this] val queryStagePrepRuleBuilders = mutable.Buffer.empty[QueryStagePrepRuleBuilder]
 
   /**
* Build the override rules for columnar execution.
@@ -107,12 +110,27 @@ class SparkSessionExtensions {
   }
 
   /**
+   * Build the override rules for the query stage preparation phase of adaptive query execution.
+   */
+  private[sql] def buildQueryStagePrepRules(session: SparkSession): Seq[Rule[SparkPlan]] = {
+    queryStagePrepRuleBuilders.map(_.apply(session)).toSeq
+  }
+
+  /**
* Inject a rule that can override the columnar execution of an executor.
*/
   def injectColumnar(builder: ColumnarRuleBuilder): Unit = {
 columnarRuleBuilders += builder
   }
 
+  /**
+   * Inject a rule that can override the the query stage preparation phase of adaptive query
+   * execution.
+   */
+  def injectQueryStagePrepRule(builder: QueryStagePrepRuleBuilder): Unit = {
+    queryStagePrepRuleBuilders += builder
+  }
+
   private[this] val resolutionRuleBuilders = mutable.Buffer.empty[RuleBuilder]
 
   /**
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index f6a..5714c33 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ 

[spark] branch branch-3.0 updated: [SPARK-32430][SQL] Extend SparkSessionExtensions to inject rules into AQE query stage preparation

2020-07-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 7004c98  [SPARK-32430][SQL] Extend SparkSessionExtensions to inject 
rules into AQE query stage preparation
7004c98 is described below

commit 7004c989048b08891fb5f62ce2fcf0c89ce1496a
Author: Andy Grove 
AuthorDate: Fri Jul 24 11:03:57 2020 -0700

[SPARK-32430][SQL] Extend SparkSessionExtensions to inject rules into AQE 
query stage preparation

### What changes were proposed in this pull request?

Provide a generic mechanism for plugins to inject rules into the AQE "query 
prep" stage that happens before query stage creation.

This goes along with https://issues.apache.org/jira/browse/SPARK-32332 
where the current AQE implementation doesn't allow for users to properly extend 
it for columnar processing.

### Why are the changes needed?

The issue here is that we create new query stages but do not have access to the
parent plan of the new query stage, so certain things cannot be determined
without knowing what the parent did. With this change, a rule can add tags to
the plan to figure out what is going on.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

A new unit test is included in the PR.

Closes #29224 from andygrove/insert-aqe-rule.

Authored-by: Andy Grove 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 64a01c0a559396fccd615dc00576a80bc8cc5648)
Signed-off-by: Dongjoon Hyun 
---
 .../apache/spark/sql/SparkSessionExtensions.scala  | 20 -
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  2 +-
 .../sql/internal/BaseSessionStateBuilder.scala |  9 +++-
 .../apache/spark/sql/internal/SessionState.scala   |  4 +-
 .../spark/sql/SparkSessionExtensionSuite.scala | 49 ++
 5 files changed, 79 insertions(+), 5 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala
index 1c2bf9e..bd870fb 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala
@@ -27,7 +27,7 @@ import org.apache.spark.sql.catalyst.expressions.ExpressionInfo
 import org.apache.spark.sql.catalyst.parser.ParserInterface
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.execution.ColumnarRule
+import org.apache.spark.sql.execution.{ColumnarRule, SparkPlan}
 
 /**
  * :: Experimental ::
@@ -44,6 +44,7 @@ import org.apache.spark.sql.execution.ColumnarRule
  * Customized Parser.
  * (External) Catalog listeners.
  * Columnar Rules.
+ * Adaptive Query Stage Preparation Rules.
  * 
  *
  * The extensions can be used by calling `withExtensions` on the [[SparkSession.Builder]], for
@@ -96,8 +97,10 @@ class SparkSessionExtensions {
   type ParserBuilder = (SparkSession, ParserInterface) => ParserInterface
   type FunctionDescription = (FunctionIdentifier, ExpressionInfo, FunctionBuilder)
   type ColumnarRuleBuilder = SparkSession => ColumnarRule
+  type QueryStagePrepRuleBuilder = SparkSession => Rule[SparkPlan]
 
   private[this] val columnarRuleBuilders = mutable.Buffer.empty[ColumnarRuleBuilder]
+  private[this] val queryStagePrepRuleBuilders = mutable.Buffer.empty[QueryStagePrepRuleBuilder]
 
   /**
    * Build the override rules for columnar execution.
@@ -107,12 +110,27 @@ class SparkSessionExtensions {
   }
 
   /**
+   * Build the override rules for the query stage preparation phase of adaptive query execution.
+   */
+  private[sql] def buildQueryStagePrepRules(session: SparkSession): Seq[Rule[SparkPlan]] = {
+    queryStagePrepRuleBuilders.map(_.apply(session)).toSeq
+  }
+
+  /**
    * Inject a rule that can override the columnar execution of an executor.
    */
   def injectColumnar(builder: ColumnarRuleBuilder): Unit = {
     columnarRuleBuilders += builder
   }
 
+  /**
+   * Inject a rule that can override the query stage preparation phase of adaptive query
+   * execution.
+   */
+  def injectQueryStagePrepRule(builder: QueryStagePrepRuleBuilder): Unit = {
+    queryStagePrepRuleBuilders += builder
+  }
+
   private[this] val resolutionRuleBuilders = mutable.Buffer.empty[RuleBuilder]
 
   /**
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index f6a..5714c33 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ 

[spark] branch master updated (d3596c0 -> 64a01c0)

2020-07-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d3596c0  [SPARK-32406][SQL] Make RESET syntax support single 
configuration reset
 add 64a01c0  [SPARK-32430][SQL] Extend SparkSessionExtensions to inject 
rules into AQE query stage preparation

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/SparkSessionExtensions.scala  | 20 -
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  2 +-
 .../sql/internal/BaseSessionStateBuilder.scala |  9 +++-
 .../apache/spark/sql/internal/SessionState.scala   |  4 +-
 .../spark/sql/SparkSessionExtensionSuite.scala | 49 ++
 5 files changed, 79 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
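
For context, the extension point added by SPARK-32430 can be wired up as follows. This is a minimal sketch against the Spark 3.0 API: the rule and object names are hypothetical, the rule body is an illustrative no-op, and AQE must be enabled for the preparation phase to run at all.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan

// Hypothetical rule: a no-op placeholder where a plugin could inspect or
// tag the physical plan before AQE breaks it into query stages.
case class MyQueryStagePrepRule() extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan
}

object QueryStagePrepExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .config("spark.sql.adaptive.enabled", "true") // AQE is off by default in 3.0
      .withExtensions { extensions =>
        // The builder receives the active SparkSession and returns a Rule[SparkPlan]
        extensions.injectQueryStagePrepRule(session => MyQueryStagePrepRule())
      }
      .getOrCreate()

    // Any query with an exchange exercises the query stage preparation phase.
    spark.range(100).groupBy("id").count().collect()
    spark.stop()
  }
}
```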






[spark] branch branch-3.0 updated (f50432f -> 8a52bda)

2020-07-24 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f50432f  [SPARK-32363][PYTHON][BUILD][3.0] Fix flakiness in pip 
package testing in Jenkins
 add 8a52bda  [SPARK-32310][ML][PYSPARK][3.0] ML params default value parity

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/classification/FMClassifier.scala |  10 --
 .../apache/spark/ml/classification/LinearSVC.scala |  11 +--
 .../ml/classification/LogisticRegression.scala |  13 +--
 .../spark/ml/classification/NaiveBayes.scala   |   4 +-
 .../spark/ml/clustering/BisectingKMeans.scala  |   7 +-
 .../spark/ml/clustering/GaussianMixture.scala  |   7 +-
 .../org/apache/spark/ml/clustering/KMeans.scala|  11 +--
 .../scala/org/apache/spark/ml/clustering/LDA.scala |  11 +--
 .../ml/clustering/PowerIterationClustering.scala   |   7 +-
 .../evaluation/BinaryClassificationEvaluator.scala |   4 +-
 .../MulticlassClassificationEvaluator.scala|   8 +-
 .../MultilabelClassificationEvaluator.scala|   6 +-
 .../spark/ml/evaluation/RankingEvaluator.scala |   6 +-
 .../spark/ml/evaluation/RegressionEvaluator.scala  |   4 +-
 .../apache/spark/ml/feature/ChiSqSelector.scala|   9 +-
 .../org/apache/spark/ml/feature/Imputer.scala  |   4 +-
 .../org/apache/spark/ml/feature/MinMaxScaler.scala |   4 +-
 .../apache/spark/ml/feature/OneHotEncoder.scala|   5 +-
 .../spark/ml/feature/QuantileDiscretizer.scala |   4 +-
 .../org/apache/spark/ml/feature/RFormula.scala |   6 +-
 .../org/apache/spark/ml/feature/RobustScaler.scala |   8 +-
 .../apache/spark/ml/feature/StringIndexer.scala|   6 +-
 .../apache/spark/ml/feature/VectorIndexer.scala|   6 +-
 .../org/apache/spark/ml/feature/VectorSlicer.scala |   6 +-
 .../org/apache/spark/ml/feature/Word2Vec.scala |   9 +-
 .../scala/org/apache/spark/ml/fpm/FPGrowth.scala   |   5 +-
 .../ml/regression/AFTSurvivalRegression.scala  |  10 +-
 .../spark/ml/regression/LinearRegression.scala |  14 +--
 .../org/apache/spark/ml/tree/treeParams.scala  |  16 +--
 .../spark/ml/util/DefaultReadWriteTest.scala   |   3 +
 python/pyspark/ml/classification.py|  86 +++-
 python/pyspark/ml/clustering.py|  43 ++--
 python/pyspark/ml/feature.py   | 110 ++---
 python/pyspark/ml/fpm.py   |  12 ++-
 python/pyspark/ml/recommendation.py|  20 ++--
 python/pyspark/ml/regression.py|  88 -
 python/pyspark/ml/tests/test_param.py  |   7 +-
 python/pyspark/ml/tuning.py|  16 ++-
 38 files changed, 368 insertions(+), 238 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
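
The "default value parity" above means the Python wrappers now report the same Param defaults that the Scala implementations define, rather than redefining (and occasionally diverging from) them. A minimal sketch of reading the Scala-side defaults that the pyspark.ml wrappers are aligned against (object name is illustrative):

```scala
import org.apache.spark.ml.classification.LogisticRegression

object DefaultParity {
  def main(args: Array[String]): Unit = {
    val lr = new LogisticRegression()
    // Defaults live on the Scala estimator; after SPARK-32310 the
    // corresponding pyspark.ml wrappers surface the same values.
    println(s"maxIter default: ${lr.getMaxIter}")   // 100 in Spark 3.0
    println(s"regParam default: ${lr.getRegParam}") // 0.0 in Spark 3.0
  }
}
```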



[spark] branch branch-3.0 updated (f50432f -> 8a52bda)

2020-07-24 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f50432f  [SPARK-32363][PYTHON][BUILD][3.0] Fix flakiness in pip 
package testing in Jenkins
 add 8a52bda  [SPARK-32310][ML][PYSPARK][3.0] ML params default value parity

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/classification/FMClassifier.scala |  10 --
 .../apache/spark/ml/classification/LinearSVC.scala |  11 +--
 .../ml/classification/LogisticRegression.scala |  13 +--
 .../spark/ml/classification/NaiveBayes.scala   |   4 +-
 .../spark/ml/clustering/BisectingKMeans.scala  |   7 +-
 .../spark/ml/clustering/GaussianMixture.scala  |   7 +-
 .../org/apache/spark/ml/clustering/KMeans.scala|  11 +--
 .../scala/org/apache/spark/ml/clustering/LDA.scala |  11 +--
 .../ml/clustering/PowerIterationClustering.scala   |   7 +-
 .../evaluation/BinaryClassificationEvaluator.scala |   4 +-
 .../MulticlassClassificationEvaluator.scala|   8 +-
 .../MultilabelClassificationEvaluator.scala|   6 +-
 .../spark/ml/evaluation/RankingEvaluator.scala |   6 +-
 .../spark/ml/evaluation/RegressionEvaluator.scala  |   4 +-
 .../apache/spark/ml/feature/ChiSqSelector.scala|   9 +-
 .../org/apache/spark/ml/feature/Imputer.scala  |   4 +-
 .../org/apache/spark/ml/feature/MinMaxScaler.scala |   4 +-
 .../apache/spark/ml/feature/OneHotEncoder.scala|   5 +-
 .../spark/ml/feature/QuantileDiscretizer.scala |   4 +-
 .../org/apache/spark/ml/feature/RFormula.scala |   6 +-
 .../org/apache/spark/ml/feature/RobustScaler.scala |   8 +-
 .../apache/spark/ml/feature/StringIndexer.scala|   6 +-
 .../apache/spark/ml/feature/VectorIndexer.scala|   6 +-
 .../org/apache/spark/ml/feature/VectorSlicer.scala |   6 +-
 .../org/apache/spark/ml/feature/Word2Vec.scala |   9 +-
 .../scala/org/apache/spark/ml/fpm/FPGrowth.scala   |   5 +-
 .../ml/regression/AFTSurvivalRegression.scala  |  10 +-
 .../spark/ml/regression/LinearRegression.scala |  14 +--
 .../org/apache/spark/ml/tree/treeParams.scala  |  16 +--
 .../spark/ml/util/DefaultReadWriteTest.scala   |   3 +
 python/pyspark/ml/classification.py|  86 +++-
 python/pyspark/ml/clustering.py|  43 ++--
 python/pyspark/ml/feature.py   | 110 ++---
 python/pyspark/ml/fpm.py   |  12 ++-
 python/pyspark/ml/recommendation.py|  20 ++--
 python/pyspark/ml/regression.py|  88 -
 python/pyspark/ml/tests/test_param.py  |   7 +-
 python/pyspark/ml/tuning.py|  16 ++-
 38 files changed, 368 insertions(+), 238 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated (f50432f -> 8a52bda)

2020-07-24 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f50432f  [SPARK-32363][PYTHON][BUILD][3.0] Fix flakiness in pip 
package testing in Jenkins
 add 8a52bda  [SPARK-32310][ML][PYSPARK][3.0] ML params default value parity

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/classification/FMClassifier.scala     |  10 --
 .../apache/spark/ml/classification/LinearSVC.scala |  11 +--
 .../ml/classification/LogisticRegression.scala     |  13 +--
 .../spark/ml/classification/NaiveBayes.scala       |   4 +-
 .../spark/ml/clustering/BisectingKMeans.scala      |   7 +-
 .../spark/ml/clustering/GaussianMixture.scala      |   7 +-
 .../org/apache/spark/ml/clustering/KMeans.scala    |  11 +--
 .../scala/org/apache/spark/ml/clustering/LDA.scala |  11 +--
 .../ml/clustering/PowerIterationClustering.scala   |   7 +-
 .../evaluation/BinaryClassificationEvaluator.scala |   4 +-
 .../MulticlassClassificationEvaluator.scala        |   8 +-
 .../MultilabelClassificationEvaluator.scala        |   6 +-
 .../spark/ml/evaluation/RankingEvaluator.scala     |   6 +-
 .../spark/ml/evaluation/RegressionEvaluator.scala  |   4 +-
 .../apache/spark/ml/feature/ChiSqSelector.scala    |   9 +-
 .../org/apache/spark/ml/feature/Imputer.scala      |   4 +-
 .../org/apache/spark/ml/feature/MinMaxScaler.scala |   4 +-
 .../apache/spark/ml/feature/OneHotEncoder.scala    |   5 +-
 .../spark/ml/feature/QuantileDiscretizer.scala     |   4 +-
 .../org/apache/spark/ml/feature/RFormula.scala     |   6 +-
 .../org/apache/spark/ml/feature/RobustScaler.scala |   8 +-
 .../apache/spark/ml/feature/StringIndexer.scala    |   6 +-
 .../apache/spark/ml/feature/VectorIndexer.scala    |   6 +-
 .../org/apache/spark/ml/feature/VectorSlicer.scala |   6 +-
 .../org/apache/spark/ml/feature/Word2Vec.scala     |   9 +-
 .../scala/org/apache/spark/ml/fpm/FPGrowth.scala   |   5 +-
 .../ml/regression/AFTSurvivalRegression.scala      |  10 +-
 .../spark/ml/regression/LinearRegression.scala     |  14 +--
 .../org/apache/spark/ml/tree/treeParams.scala      |  16 +--
 .../spark/ml/util/DefaultReadWriteTest.scala       |   3 +
 python/pyspark/ml/classification.py                |  86 +++-
 python/pyspark/ml/clustering.py                    |  43 ++--
 python/pyspark/ml/feature.py                       | 110 ++---
 python/pyspark/ml/fpm.py                           |  12 ++-
 python/pyspark/ml/recommendation.py                |  20 ++--
 python/pyspark/ml/regression.py                    |  88 -
 python/pyspark/ml/tests/test_param.py              |   7 +-
 python/pyspark/ml/tuning.py                        |  16 ++-
 38 files changed, 368 insertions(+), 238 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (fa184c3 -> d3596c0)

2020-07-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fa184c3  [SPARK-32408][BUILD] Enable crossPaths back to prevent side effects
 add d3596c0  [SPARK-32406][SQL] Make RESET syntax support single configuration reset

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-conf-mgmt-reset.md           | 19 +--
 .../org/apache/spark/sql/catalyst/parser/SqlBase.g4  |  2 +-
 .../apache/spark/sql/execution/SparkSqlParser.scala  |  3 ++-
 .../spark/sql/execution/command/SetCommand.scala     | 14 ++
 .../org/apache/spark/sql/internal/SQLConfSuite.scala | 15 +++
 5 files changed, 45 insertions(+), 8 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (8bc799f9 -> fa184c3)

2020-07-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8bc799f9 [SPARK-32375][SQL] Basic functionality of table catalog v2 for JDBC
 add fa184c3  [SPARK-32408][BUILD] Enable crossPaths back to prevent side effects

No new revisions were added by this update.

Summary of changes:
 project/SparkBuild.scala | 2 --
 1 file changed, 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32363][PYTHON][BUILD][3.0] Fix flakiness in pip package testing in Jenkins

2020-07-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new f50432f  [SPARK-32363][PYTHON][BUILD][3.0] Fix flakiness in pip package testing in Jenkins
f50432f is described below

commit f50432fd3d835f8b5de3283edce4c146cfafe827
Author: HyukjinKwon 
AuthorDate: Fri Jul 24 07:18:15 2020 -0700

[SPARK-32363][PYTHON][BUILD][3.0] Fix flakiness in pip package testing in Jenkins

### What changes were proposed in this pull request?

This PR backports https://github.com/apache/spark/pull/29117 to branch-3.0 
as the flakiness was found in branch-3.0 too: 
https://github.com/apache/spark/pull/29201#issuecomment-663114741 and 
https://github.com/apache/spark/pull/29201#issuecomment-663114741

This PR proposes:

- ~~Don't use `--user` in pip packaging test~~
- ~~Pull `source` out of the subshell, and place it first.~~
- Exclude user sitepackages in Python path during pip installation test

to address the flakiness of the pip packaging test in Jenkins.

~~(I think) #29116 caused this flakiness given my observation in the 
Jenkins log. I had to work around by specifying `--user` but it turned out that 
it does not properly work in old Conda on Jenkins for some reasons. Therefore, 
reverting this change back.~~

(I think) the installation at user site-packages affects other environments 
created by Conda in the old Conda version that Jenkins has. Seems it fails to 
isolate the environments for some reasons. So, it excludes user sitepackages in 
the Python path during the test.

~~In addition, #29116 also added some fallback logics of `conda 
(de)activate` and `source (de)activate` because Conda prefers to use `conda 
(de)activate` now per the official documentation and `source (de)activate` 
doesn't work for some reasons in certain environments (see also 
https://github.com/conda/conda/issues/7980). The problem was that `source` 
loads things into the current shell, so running it inside a subshell does not 
affect the current shell. 

Disclaimer: I made the analysis purely based on Jenkins machine's log in 
this PR. It may have a different reason I missed during my observation.

### Why are the changes needed?

To make the build and tests pass in Jenkins.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Jenkins tests should test it out.

Closes #29215 from HyukjinKwon/SPARK-32363-3.0.

Authored-by: HyukjinKwon 
Signed-off-by: Dongjoon Hyun 
---
 dev/run-pip-tests | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/dev/run-pip-tests b/dev/run-pip-tests
index 470f21e..81e33a6 100755
--- a/dev/run-pip-tests
+++ b/dev/run-pip-tests
@@ -68,6 +68,10 @@ PIP_OPTIONS="--upgrade --no-cache-dir --force-reinstall "
 PIP_COMMANDS=("pip install $PIP_OPTIONS $PYSPARK_DIST"
  "pip install $PIP_OPTIONS -e python/")
 
+# Jenkins has PySpark installed under user sitepackages shared for some reasons.
+# In this test, explicitly exclude user sitepackages to prevent side effects
+export PYTHONNOUSERSITE=1
+
 for python in "${PYTHON_EXECS[@]}"; do
   for install_command in "${PIP_COMMANDS[@]}"; do
 echo "Testing pip installation with python $python"


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (8896f4a -> 8bc799f9)

2020-07-24 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8896f4a  Revert "[SPARK-32253][INFRA] Show errors only for the sbt tests of github actions"
 add 8bc799f9 [SPARK-32375][SQL] Basic functionality of table catalog v2 for JDBC

No new revisions were added by this update.

Summary of changes:
 .../datasources/jdbc/JdbcRelationProvider.scala    |   4 +-
 .../sql/execution/datasources/jdbc/JdbcUtils.scala |  50 +--
 .../jdbc/JDBCTable.scala}                          |  28 ++--
 .../datasources/v2/jdbc/JDBCTableCatalog.scala     | 158 +
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   |  13 ++
 .../v2/jdbc/JDBCTableCatalogSuite.scala            | 109 ++
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala      |   5 +-
 .../org/apache/spark/sql/jdbc/JDBCWriteSuite.scala |   6 +-
 8 files changed, 343 insertions(+), 30 deletions(-)
 copy sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/{jdbc/connection/BasicConnectionProvider.scala => v2/jdbc/JDBCTable.scala} (53%)
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (84efa04 -> 8896f4a)

2020-07-24 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 84efa04  [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase
 add 8896f4a  Revert "[SPARK-32253][INFRA] Show errors only for the sbt tests of github actions"

No new revisions were added by this update.

Summary of changes:
 dev/run-tests.py | 3 ---
 project/SparkBuild.scala | 7 ---
 2 files changed, 10 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


