[GitHub] [spark-website] huaxingao commented on pull request #278: Add Huaxin Gao to committers.md

2020-07-02 Thread GitBox


huaxingao commented on pull request #278:
URL: https://github.com/apache/spark-website/pull/278#issuecomment-653306775


   Thanks everyone!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] huaxingao closed pull request #278: Add Huaxin Gao to committers.md

2020-07-02 Thread GitBox


huaxingao closed pull request #278:
URL: https://github.com/apache/spark-website/pull/278


   






[spark-website] branch asf-site updated: Add Huaxin Gao to committers.md

2020-07-02 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 18d7e21  Add Huaxin Gao to committers.md
18d7e21 is described below

commit 18d7e2103f9713adc09d69b65ebd4a48107c88f0
Author: Huaxin Gao 
AuthorDate: Thu Jul 2 19:38:42 2020 -0700

Add Huaxin Gao to committers.md

Author: Huaxin Gao 

Closes #278 from huaxingao/asf-site.
---
 committers.md| 1 +
 site/committers.html | 4 
 2 files changed, 5 insertions(+)

diff --git a/committers.md b/committers.md
index 42b89d4..77e768d 100644
--- a/committers.md
+++ b/committers.md
@@ -26,6 +26,7 @@ navigation:
 |Erik Erlandson|Red Hat|
 |Robert Evans|NVIDIA|
 |Wenchen Fan|Databricks|
+|Huaxin Gao|IBM|
 |Joseph Gonzalez|UC Berkeley|
 |Thomas Graves|NVIDIA|
 |Stephen Haberman|LinkedIn|
diff --git a/site/committers.html b/site/committers.html
index 5299961..66de9a1 100644
--- a/site/committers.html
+++ b/site/committers.html
@@ -275,6 +275,10 @@
   Databricks
 
 
+  Huaxin Gao
+  IBM
+
+
   Joseph Gonzalez
   UC Berkeley
 





[GitHub] [spark-website] HeartSaVioR commented on pull request #278: Add Huaxin Gao to committers.md

2020-07-02 Thread GitBox


HeartSaVioR commented on pull request #278:
URL: https://github.com/apache/spark-website/pull/278#issuecomment-653269669


   She's added in the roster. http://people.apache.org/committer-index.html
   
   Probably she didn't set up ASF GitBox yet?






[GitHub] [spark-website] viirya edited a comment on pull request #278: Add Huaxin Gao to committers.md

2020-07-02 Thread GitBox


viirya edited a comment on pull request #278:
URL: https://github.com/apache/spark-website/pull/278#issuecomment-653265728


   @srowen, @huaxingao doesn't seem to have been added to https://github.com/orgs/apache/teams/spark-committers/members yet?






[GitHub] [spark-website] viirya commented on pull request #278: Add Huaxin Gao to committers.md

2020-07-02 Thread GitBox


viirya commented on pull request #278:
URL: https://github.com/apache/spark-website/pull/278#issuecomment-653265728


   @srowen @huaxingao doesn't seem to have been added to https://github.com/orgs/apache/teams/spark-committers/members yet?






[GitHub] [spark-website] viirya commented on pull request #277: Add Jungtaek Lim to committers.md

2020-07-02 Thread GitBox


viirya commented on pull request #277:
URL: https://github.com/apache/spark-website/pull/277#issuecomment-653260609


   Congrats!






[GitHub] [spark-website] huaxingao commented on pull request #277: Add Jungtaek Lim to committers.md

2020-07-02 Thread GitBox


huaxingao commented on pull request #277:
URL: https://github.com/apache/spark-website/pull/277#issuecomment-653256977


   Congratulations!






[GitHub] [spark-website] huaxingao opened a new pull request #278: Add Huaxin Gao to committers.md

2020-07-02 Thread GitBox


huaxingao opened a new pull request #278:
URL: https://github.com/apache/spark-website/pull/278


   






[spark] branch branch-3.0 updated: [SPARK-32130][SQL][FOLLOWUP] Enable timestamps inference in JsonBenchmark

2020-07-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 31e1ea1  [SPARK-32130][SQL][FOLLOWUP] Enable timestamps inference in JsonBenchmark
31e1ea1 is described below

commit 31e1ea165f3cbb503b05452b448010e81474dcad
Author: Max Gekk 
AuthorDate: Thu Jul 2 13:26:57 2020 -0700

[SPARK-32130][SQL][FOLLOWUP] Enable timestamps inference in JsonBenchmark

### What changes were proposed in this pull request?
Set the JSON option `inferTimestamp` to `true` for the cases that measure 
perf of timestamp inference.

### Why are the changes needed?
The PR https://github.com/apache/spark/pull/28966 disabled timestamp inference by default. As a consequence, some benchmarks don't measure the performance of timestamp inference from JSON fields. This PR explicitly enables such inference.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By re-generating results of `JsonBenchmark`.

Closes #28981 from MaxGekk/json-inferTimestamps-disable-by-default-followup.

Authored-by: Max Gekk 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 42f01e314b4874236544cc8b94bef766269385ee)
Signed-off-by: Dongjoon Hyun 
---
 .../benchmarks/JsonBenchmark-jdk11-results.txt | 86 +++---
 sql/core/benchmarks/JsonBenchmark-results.txt  | 86 +++---
 .../execution/datasources/json/JsonBenchmark.scala |  4 +-
 3 files changed, 88 insertions(+), 88 deletions(-)
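[Editor's note] The commit above toggles Spark's JSON `inferTimestamp` option so the benchmark exercises timestamp inference again. As a conceptual aside only (this is a toy Python sketch of what "timestamp inference" means for JSON fields, not Spark's actual implementation, and `infer_field_type` is a made-up helper name):

```python
import json
from datetime import datetime

def infer_field_type(value, infer_timestamp=True):
    """Toy schema inference for a single JSON value: strings that parse as
    ISO-8601 timestamps are promoted to 'timestamp' only when inference is
    enabled (mirroring the spirit of Spark's inferTimestamp option)."""
    if isinstance(value, bool):   # bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        if infer_timestamp:
            try:
                datetime.fromisoformat(value)
                return "timestamp"
            except ValueError:
                pass
        return "string"
    return "string"

record = json.loads('{"id": 1, "created": "2020-07-02T13:26:57"}')
schema_on = {k: infer_field_type(v, infer_timestamp=True) for k, v in record.items()}
schema_off = {k: infer_field_type(v, infer_timestamp=False) for k, v in record.items()}
# With inference on, "created" becomes a timestamp; with it off, it stays a string.
```

Disabling the option (the new Spark default per PR #28966) is what made the benchmark skip the timestamp-parsing path, hence this follow-up.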

diff --git a/sql/core/benchmarks/JsonBenchmark-jdk11-results.txt b/sql/core/benchmarks/JsonBenchmark-jdk11-results.txt
index ff37084..2d506f0 100644
--- a/sql/core/benchmarks/JsonBenchmark-jdk11-results.txt
+++ b/sql/core/benchmarks/JsonBenchmark-jdk11-results.txt
@@ -7,106 +7,106 @@ OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-106
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 JSON schema inferring:        Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-No encoding                           69219         69342        116        1.4        692.2      1.0X
-UTF-8 is set                         143950        143986         55        0.7       1439.5      0.5X
+No encoding                           73307         73400        141        1.4        733.1      1.0X
+UTF-8 is set                         143834        143925        152        0.7       1438.3      0.5X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 count a short column:         Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-No encoding                           57828         57913        136        1.7        578.3      1.0X
-UTF-8 is set                          83649         83711         60        1.2        836.5      0.7X
+No encoding                           50894         51065        292        2.0        508.9      1.0X
+UTF-8 is set                          98462         99455       1173        1.0        984.6      0.5X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 count a wide column:          Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-No encoding                           64560         65193       1023        0.2       6456.0      1.0X
-UTF-8 is set                         102925        103174        216        0.1      10292.5      0.6X
+No encoding                           64011         64969       1001        0.2       6401.1      1.0X
+UTF-8 is set                         102757        102984        311        0.1      10275.7      0.6X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 select wide row:              Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
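[Editor's note] The derived columns in the benchmark rows above are internally consistent. The row count is implied rather than stated in the log (69219 ms at 692.2 ns per row works out to 100 million rows); assuming that count, a quick Python sanity check:

```python
# Cross-check the derived columns of the "JSON schema inferring" rows above.
# ROWS is inferred from the log (69219 ms at 692.2 ns/row ~= 100M rows),
# not stated in it.
ROWS = 100_000_000

def per_row_ns(best_ms):
    # Per Row(ns) = total best time in ns divided by row count
    return best_ms * 1e6 / ROWS

def rate_m_per_s(best_ms):
    # Rate(M/s) = rows per second, in millions
    return ROWS / (best_ms / 1000) / 1e6

assert round(per_row_ns(69219), 1) == 692.2    # old "No encoding" row
assert round(per_row_ns(143950), 1) == 1439.5  # old "UTF-8 is set" row
assert round(rate_m_per_s(69219), 1) == 1.4
assert round(rate_m_per_s(143950), 1) == 0.7
```

The same arithmetic reproduces the other panels' Per Row(ns) and Rate(M/s) values from their Best Time(ms) columns.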
[spark] branch master updated (0acad58 -> 42f01e3)

2020-07-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0acad58  [SPARK-32156][SPARK-31061][TESTS][SQL] Refactor two similar test cases from in HiveExternalCatalogSuite
 add 42f01e3  [SPARK-32130][SQL][FOLLOWUP] Enable timestamps inference in JsonBenchmark

No new revisions were added by this update.

Summary of changes:
 .../benchmarks/JsonBenchmark-jdk11-results.txt | 86 +++---
 sql/core/benchmarks/JsonBenchmark-results.txt  | 86 +++---
 .../execution/datasources/json/JsonBenchmark.scala |  4 +-
 3 files changed, 88 insertions(+), 88 deletions(-)





[spark] branch master updated: [SPARK-32130][SQL][FOLLOWUP] Enable timestamps inference in JsonBenchmark

2020-07-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 42f01e3  [SPARK-32130][SQL][FOLLOWUP] Enable timestamps inference in JsonBenchmark
42f01e3 is described below

commit 42f01e314b4874236544cc8b94bef766269385ee
Author: Max Gekk 
AuthorDate: Thu Jul 2 13:26:57 2020 -0700

[SPARK-32130][SQL][FOLLOWUP] Enable timestamps inference in JsonBenchmark

### What changes were proposed in this pull request?
Set the JSON option `inferTimestamp` to `true` for the cases that measure 
perf of timestamp inference.

### Why are the changes needed?
The PR https://github.com/apache/spark/pull/28966 disabled timestamp inference by default. As a consequence, some benchmarks don't measure the performance of timestamp inference from JSON fields. This PR explicitly enables such inference.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By re-generating results of `JsonBenchmark`.

Closes #28981 from MaxGekk/json-inferTimestamps-disable-by-default-followup.

Authored-by: Max Gekk 
Signed-off-by: Dongjoon Hyun 
---
 .../benchmarks/JsonBenchmark-jdk11-results.txt | 86 +++---
 sql/core/benchmarks/JsonBenchmark-results.txt  | 86 +++---
 .../execution/datasources/json/JsonBenchmark.scala |  4 +-
 3 files changed, 88 insertions(+), 88 deletions(-)

diff --git a/sql/core/benchmarks/JsonBenchmark-jdk11-results.txt b/sql/core/benchmarks/JsonBenchmark-jdk11-results.txt
index ff37084..2d506f0 100644
--- a/sql/core/benchmarks/JsonBenchmark-jdk11-results.txt
+++ b/sql/core/benchmarks/JsonBenchmark-jdk11-results.txt
@@ -7,106 +7,106 @@ OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-106
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 JSON schema inferring:        Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-No encoding                           69219         69342        116        1.4        692.2      1.0X
-UTF-8 is set                         143950        143986         55        0.7       1439.5      0.5X
+No encoding                           73307         73400        141        1.4        733.1      1.0X
+UTF-8 is set                         143834        143925        152        0.7       1438.3      0.5X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 count a short column:         Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-No encoding                           57828         57913        136        1.7        578.3      1.0X
-UTF-8 is set                          83649         83711         60        1.2        836.5      0.7X
+No encoding                           50894         51065        292        2.0        508.9      1.0X
+UTF-8 is set                          98462         99455       1173        1.0        984.6      0.5X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 count a wide column:          Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-No encoding                           64560         65193       1023        0.2       6456.0      1.0X
-UTF-8 is set                         102925        103174        216        0.1      10292.5      0.6X
+No encoding                           64011         64969       1001        0.2       6401.1      1.0X
+UTF-8 is set                         102757        102984        311        0.1      10275.7      0.6X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 select wide row:              Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-No encoding

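For readers parsing these benchmark tables: the derived columns follow from the best run time and the row count. A rough Python sketch of the arithmetic (assuming a 100M-row dataset for these JSON benchmarks, and that `Relative` divides the first row's best time by each row's best time — both are assumptions about Spark's `Benchmark` utility, not confirmed here):

```python
def benchmark_metrics(best_ms, num_rows, baseline_ms=None):
    """Derive per-row latency, throughput, and relative speed from a best time.

    best_ms     -- Best Time(ms) column
    num_rows    -- rows processed per iteration (assumed 100M here)
    baseline_ms -- best time of the 1.0X baseline row, if comparing
    """
    per_row_ns = best_ms * 1e6 / num_rows               # ms -> ns, spread over rows
    rate_m_per_s = num_rows / (best_ms / 1000.0) / 1e6  # rows/sec, in millions
    relative = (baseline_ms / best_ms) if baseline_ms else 1.0
    return per_row_ns, rate_m_per_s, relative

# "No encoding", best time 69219 ms over 100M rows -> ~692.2 ns/row, ~1.4 M rows/s
per_row, rate, _ = benchmark_metrics(69219, 100_000_000)

# "UTF-8 is set" relative to the "No encoding" baseline -> ~0.5X
_, _, rel = benchmark_metrics(143950, 100_000_000, baseline_ms=69219)
```

Plugging in the first table's numbers reproduces the 692.2 ns/row, 1.4 M/s, and 0.5X figures shown above.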
[GitHub] [spark-website] MaxGekk commented on pull request #277: Add Jungtaek Lim to committers.md

2020-07-02 Thread GitBox


MaxGekk commented on pull request #277:
URL: https://github.com/apache/spark-website/pull/277#issuecomment-653139604


   @HeartSaVioR Congratulations!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (f082a79 -> 0acad58)

2020-07-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f082a79  [SPARK-31100][SQL] Check namespace existens when setting namespace
 add 0acad58  [SPARK-32156][SPARK-31061][TESTS][SQL] Refactor two similar test cases from in HiveExternalCatalogSuite

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/hive/HiveExternalCatalogSuite.scala  | 54 --
 1 file changed, 19 insertions(+), 35 deletions(-)





[spark] branch master updated: [SPARK-32156][SPARK-31061][TESTS][SQL] Refactor two similar test cases from in HiveExternalCatalogSuite

2020-07-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0acad58  [SPARK-32156][SPARK-31061][TESTS][SQL] Refactor two similar test cases from in HiveExternalCatalogSuite
0acad58 is described below

commit 0acad589e120cd777b25c03777a3cce4ef704422
Author: TJX2014 
AuthorDate: Thu Jul 2 10:15:10 2020 -0700

[SPARK-32156][SPARK-31061][TESTS][SQL] Refactor two similar test cases from in HiveExternalCatalogSuite

### What changes were proposed in this pull request?
1. Merge the two similar tests for SPARK-31061 and make the code cleaner.
2. Fix a table-alter issue caused by a lost path.

### Why are the changes needed?
Because these two tests for SPARK-31061 are very similar, they can be merged.
Also, the first test case should use `rawTable` instead of `parquetTable` when altering.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test.

Closes #28980 from TJX2014/master-follow-merge-spark-31061-test-case.

Authored-by: TJX2014 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/hive/HiveExternalCatalogSuite.scala  | 54 --
 1 file changed, 19 insertions(+), 35 deletions(-)

diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala
index 473a93b..270595b 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala
@@ -181,41 +181,25 @@ class HiveExternalCatalogSuite extends ExternalCatalogSuite {
   "INSERT overwrite directory \"fs://localhost/tmp\" select 1 as a"))
   }
 
-  test("SPARK-31061: alterTable should be able to change table provider") {
+  test("SPARK-31061: alterTable should be able to change table provider/hive") {
 val catalog = newBasicCatalog()
-val parquetTable = CatalogTable(
-  identifier = TableIdentifier("parq_tbl", Some("db1")),
-  tableType = CatalogTableType.MANAGED,
-  storage = storageFormat.copy(locationUri = Some(new URI("file:/some/path"))),
-  schema = new StructType().add("col1", "int").add("col2", "string"),
-  provider = Some("parquet"))
-catalog.createTable(parquetTable, ignoreIfExists = false)
-
-val rawTable = externalCatalog.getTable("db1", "parq_tbl")
-assert(rawTable.provider === Some("parquet"))
-
-val fooTable = parquetTable.copy(provider = Some("foo"))
-catalog.alterTable(fooTable)
-val alteredTable = externalCatalog.getTable("db1", "parq_tbl")
-assert(alteredTable.provider === Some("foo"))
-  }
-
-  test("SPARK-31061: alterTable should be able to change table provider from hive") {
-val catalog = newBasicCatalog()
-val hiveTable = CatalogTable(
-  identifier = TableIdentifier("parq_tbl", Some("db1")),
-  tableType = CatalogTableType.MANAGED,
-  storage = storageFormat,
-  schema = new StructType().add("col1", "int").add("col2", "string"),
-  provider = Some("hive"))
-catalog.createTable(hiveTable, ignoreIfExists = false)
-
-val rawTable = externalCatalog.getTable("db1", "parq_tbl")
-assert(rawTable.provider === Some("hive"))
-
-val fooTable = rawTable.copy(provider = Some("foo"))
-catalog.alterTable(fooTable)
-val alteredTable = externalCatalog.getTable("db1", "parq_tbl")
-assert(alteredTable.provider === Some("foo"))
+Seq("parquet", "hive").foreach( provider => {
+  val tableDDL = CatalogTable(
+identifier = TableIdentifier("parq_tbl", Some("db1")),
+tableType = CatalogTableType.MANAGED,
+storage = storageFormat,
+schema = new StructType().add("col1", "int"),
+provider = Some(provider))
+  catalog.dropTable("db1", "parq_tbl", true, true)
+  catalog.createTable(tableDDL, ignoreIfExists = false)
+
+  val rawTable = externalCatalog.getTable("db1", "parq_tbl")
+  assert(rawTable.provider === Some(provider))
+
+  val fooTable = rawTable.copy(provider = Some("foo"))
+  catalog.alterTable(fooTable)
+  val alteredTable = externalCatalog.getTable("db1", "parq_tbl")
+  assert(alteredTable.provider === Some("foo"))
+})
   }
 }




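The refactor in the diff above replaces two copy-pasted tests with a single loop over the table provider. The same consolidation pattern, sketched in Python with a stand-in `FakeCatalog` (every name here is illustrative — none of these are Spark APIs):

```python
class FakeCatalog:
    """Minimal in-memory stand-in for an external catalog (illustrative only)."""
    def __init__(self):
        self.tables = {}

    def create_table(self, name, provider):
        self.tables[name] = {"provider": provider}

    def drop_table(self, name):
        # Mirrors dropTable(..., ignoreIfNotExists = true): silent when absent
        self.tables.pop(name, None)

    def alter_table(self, name, provider):
        self.tables[name]["provider"] = provider

    def get_table(self, name):
        return self.tables[name]


def test_alter_table_changes_provider():
    catalog = FakeCatalog()
    # One loop replaces two near-identical tests, mirroring
    # Seq("parquet", "hive").foreach in the Scala diff above.
    for provider in ("parquet", "hive"):
        catalog.drop_table("parq_tbl")  # reset state between iterations
        catalog.create_table("parq_tbl", provider)
        assert catalog.get_table("parq_tbl")["provider"] == provider

        catalog.alter_table("parq_tbl", "foo")
        assert catalog.get_table("parq_tbl")["provider"] == "foo"


test_alter_table_changes_provider()
```

The key design point carried over from the PR: the loop body must drop and recreate the table each iteration, otherwise the second provider's run inherits state from the first.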

[GitHub] [spark-website] gengliangwang commented on pull request #277: Add Jungtaek Lim to committers.md

2020-07-02 Thread GitBox


gengliangwang commented on pull request #277:
URL: https://github.com/apache/spark-website/pull/277#issuecomment-653124021


   Congrats!









[GitHub] [spark-website] dongjoon-hyun commented on pull request #277: Add Jungtaek Lim to committers.md

2020-07-02 Thread GitBox


dongjoon-hyun commented on pull request #277:
URL: https://github.com/apache/spark-website/pull/277#issuecomment-653066964


   Congrats! :)









[spark] branch master updated (f834156 -> f082a79)

2020-07-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f834156  [MINOR][TEST][SQL] Make in-limit.sql more robust
 add f082a79  [SPARK-31100][SQL] Check namespace existens when setting namespace

No new revisions were added by this update.

Summary of changes:
 .../sql/connector/catalog/CatalogManager.scala | 15 +++-
 .../connector/catalog/CatalogManagerSuite.scala| 16 +
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 28 ++
 3 files changed, 48 insertions(+), 11 deletions(-)





[spark] branch branch-2.4 updated: [MINOR][TEST][SQL] Make in-limit.sql more robust

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 2227a16  [MINOR][TEST][SQL] Make in-limit.sql more robust
2227a16 is described below

commit 2227a166782797f0e47a5d850b9713829300a466
Author: Wenchen Fan 
AuthorDate: Thu Jul 2 21:04:26 2020 +0900

[MINOR][TEST][SQL] Make in-limit.sql more robust

### What changes were proposed in this pull request?

For queries like `t1d in (SELECT t2d FROM  t2 ORDER  BY t2c LIMIT 2)`, the 
result can be non-deterministic as the result of the subquery may output 
different results (it's not sorted by `t2d` and it has shuffle).

This PR makes the test more robust by sorting the output column.

### Why are the changes needed?

avoid flaky test

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

Closes #28976 from cloud-fan/small.

Authored-by: Wenchen Fan 
Signed-off-by: HyukjinKwon 
(cherry picked from commit f83415629b18d628f72a32285f0afc24f29eaa1e)
Signed-off-by: HyukjinKwon 
---
 .../test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql | 4 ++--
 .../resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
index a40ee08..a3cab37 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
@@ -72,7 +72,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d IN (SELECT t2d
FROM   t2
-   ORDER  BY t2c
+   ORDER  BY t2c, t2d
LIMIT 2)
 GROUP  BY t1b
 ORDER  BY t1b DESC NULLS FIRST
@@ -93,7 +93,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d NOT IN (SELECT t2d
FROM   t2
-   ORDER  BY t2b DESC nulls first
+   ORDER  BY t2b DESC nulls first, t2d
LIMIT 1)
 GROUP  BY t1b
 ORDER BY t1b NULLS last
diff --git a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
index 71ca1f8..cde1577 100644
--- a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
@@ -103,7 +103,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d IN (SELECT t2d
FROM   t2
-   ORDER  BY t2c
+   ORDER  BY t2c, t2d
LIMIT 2)
 GROUP  BY t1b
 ORDER  BY t1b DESC NULLS FIRST
@@ -136,7 +136,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d NOT IN (SELECT t2d
FROM   t2
-   ORDER  BY t2b DESC nulls first
+   ORDER  BY t2b DESC nulls first, t2d
LIMIT 1)
 GROUP  BY t1b
 ORDER BY t1b NULLS last




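The flakiness addressed above comes from `ORDER BY t2c LIMIT 2` when `t2c` contains ties: which tied rows survive the limit depends on an arbitrary physical order (e.g. after a shuffle). A small Python sketch of the failure mode and of the tiebreaker fix, on toy data rather than the suite's actual tables:

```python
# Rows of (t2c, t2d); all three rows tie on t2c, so ORDER BY t2c alone
# gives the LIMIT no basis for choosing among them.
rows = [(1, "a"), (1, "b"), (1, "c")]


def top2(data, key):
    """Simulate ORDER BY <key> LIMIT 2, returning the surviving t2d values."""
    return {d for _, d in sorted(data, key=key)[:2]}


# Python's sort is stable, so a different incoming row order (standing in
# for a different shuffle layout) changes which tied rows make the cut.
by_c = top2(rows, key=lambda r: r[0])                            # {"a", "b"}
by_c_shuffled = top2(list(reversed(rows)), key=lambda r: r[0])   # {"b", "c"}
assert by_c != by_c_shuffled  # ORDER BY t2c alone is non-deterministic

# Adding t2d as a tiebreaker (ORDER BY t2c, t2d) pins the result
# regardless of the incoming order:
by_cd = top2(rows, key=lambda r: (r[0], r[1]))
by_cd_shuffled = top2(list(reversed(rows)), key=lambda r: (r[0], r[1]))
assert by_cd == by_cd_shuffled == {"a", "b"}
```

This is exactly why the patch appends `t2d` to each `ORDER BY` clause: the sort key becomes unique, so the `LIMIT` prefix is the same on every run.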

[spark] branch branch-3.0 updated: [MINOR][TEST][SQL] Make in-limit.sql more robust

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 334b1e8  [MINOR][TEST][SQL] Make in-limit.sql more robust
334b1e8 is described below

commit 334b1e8c88d6f10d5e4cde8fa400e34fb04faa39
Author: Wenchen Fan 
AuthorDate: Thu Jul 2 21:04:26 2020 +0900

[MINOR][TEST][SQL] Make in-limit.sql more robust

### What changes were proposed in this pull request?

For queries like `t1d in (SELECT t2d FROM  t2 ORDER  BY t2c LIMIT 2)`, the 
result can be non-deterministic as the result of the subquery may output 
different results (it's not sorted by `t2d` and it has shuffle).

This PR makes the test more robust by sorting the output column.

### Why are the changes needed?

avoid flaky test

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

Closes #28976 from cloud-fan/small.

Authored-by: Wenchen Fan 
Signed-off-by: HyukjinKwon 
(cherry picked from commit f83415629b18d628f72a32285f0afc24f29eaa1e)
Signed-off-by: HyukjinKwon 
---
 .../test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql | 4 ++--
 .../resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
index 481b5e8..0a16f11 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
@@ -72,7 +72,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d IN (SELECT t2d
FROM   t2
-   ORDER  BY t2c
+   ORDER  BY t2c, t2d
LIMIT 2)
 GROUP  BY t1b
 ORDER  BY t1b DESC NULLS FIRST
@@ -93,7 +93,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d NOT IN (SELECT t2d
FROM   t2
-   ORDER  BY t2b DESC nulls first
+   ORDER  BY t2b DESC nulls first, t2d
LIMIT 1)
 GROUP  BY t1b
 ORDER BY t1b NULLS last
diff --git a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
index 1c33544..e24538b 100644
--- a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
@@ -103,7 +103,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d IN (SELECT t2d
FROM   t2
-   ORDER  BY t2c
+   ORDER  BY t2c, t2d
LIMIT 2)
 GROUP  BY t1b
 ORDER  BY t1b DESC NULLS FIRST
@@ -136,7 +136,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d NOT IN (SELECT t2d
FROM   t2
-   ORDER  BY t2b DESC nulls first
+   ORDER  BY t2b DESC nulls first, t2d
LIMIT 1)
 GROUP  BY t1b
 ORDER BY t1b NULLS last





[spark] branch master updated (45fe6b6 -> f834156)

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 45fe6b6  [MINOR][DOCS] Pyspark getActiveSession docstring
 add f834156  [MINOR][TEST][SQL] Make in-limit.sql more robust

No new revisions were added by this update.

Summary of changes:
 .../test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql | 4 ++--
 .../resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)





[spark] branch branch-3.0 updated: [MINOR][DOCS] Pyspark getActiveSession docstring

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 5361f76  [MINOR][DOCS] Pyspark getActiveSession docstring
5361f76 is described below

commit 5361f76d84de4986aa65a687bece94f4220edd94
Author: animenon 
AuthorDate: Thu Jul 2 21:02:00 2020 +0900

[MINOR][DOCS] Pyspark getActiveSession docstring

### What changes were proposed in this pull request?

Minor fix to the documentation of `getActiveSession`.
The sample code snippet wasn't rendered correctly; adding a blank line fixes the formatting.
Also added a return description to the docs.

### Why are the changes needed?

The sample code is getting mixed up as description in the docs.

[Current Doc Link](http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=getactivesession#pyspark.sql.SparkSession.getActiveSession)


![image](https://user-images.githubusercontent.com/6907950/86331522-d7b6f800-bc66-11ea-998c-42085f5e5b04.png)

### Does this PR introduce _any_ user-facing change?

Yes, documentation of getActiveSession is fixed.
And added description about return.

### How was this patch tested?

Adding a spacing between description and code seems to fix the issue.

Closes #28978 from animenon/docs_minor.

Authored-by: animenon 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 45fe6b62a73540ff010317fc7518b007206707d6)
Signed-off-by: HyukjinKwon 
---
 python/pyspark/sql/session.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 233f492..e9486a3 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -265,7 +265,10 @@ class SparkSession(SparkConversionMixin):
 @since(3.0)
 def getActiveSession(cls):
 """
-        Returns the active SparkSession for the current thread, returned by the builder.
+        Returns the active SparkSession for the current thread, returned by the builder
+
+        :return: :class:`SparkSession` if an active session exists for the current thread
+
 >>> s = SparkSession.getActiveSession()
 >>> l = [('Alice', 1)]
 >>> rdd = s.sparkContext.parallelize(l)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [MINOR][TEST][SQL] Make in-limit.sql more robust

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 2227a16  [MINOR][TEST][SQL] Make in-limit.sql more robust
2227a16 is described below

commit 2227a166782797f0e47a5d850b9713829300a466
Author: Wenchen Fan 
AuthorDate: Thu Jul 2 21:04:26 2020 +0900

[MINOR][TEST][SQL] Make in-limit.sql more robust

### What changes were proposed in this pull request?

For queries like `t1d in (SELECT t2d FROM t2 ORDER BY t2c LIMIT 2)`, the
result can be non-deterministic, as the subquery may return different rows
across runs (it's not sorted by `t2d` and involves a shuffle).
This PR makes the test more robust by sorting the output column.

### Why are the changes needed?

avoid flaky test

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

Closes #28976 from cloud-fan/small.

Authored-by: Wenchen Fan 
Signed-off-by: HyukjinKwon 
(cherry picked from commit f83415629b18d628f72a32285f0afc24f29eaa1e)
Signed-off-by: HyukjinKwon 
---
 .../test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql | 4 ++--
 .../resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
index a40ee08..a3cab37 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
@@ -72,7 +72,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d IN (SELECT t2d
FROM   t2
-   ORDER  BY t2c
+   ORDER  BY t2c, t2d
LIMIT 2)
 GROUP  BY t1b
 ORDER  BY t1b DESC NULLS FIRST
@@ -93,7 +93,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d NOT IN (SELECT t2d
FROM   t2
-   ORDER  BY t2b DESC nulls first
+   ORDER  BY t2b DESC nulls first, t2d
LIMIT 1)
 GROUP  BY t1b
 ORDER BY t1b NULLS last
diff --git a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
index 71ca1f8..cde1577 100644
--- a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
@@ -103,7 +103,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d IN (SELECT t2d
FROM   t2
-   ORDER  BY t2c
+   ORDER  BY t2c, t2d
LIMIT 2)
 GROUP  BY t1b
 ORDER  BY t1b DESC NULLS FIRST
@@ -136,7 +136,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d NOT IN (SELECT t2d
FROM   t2
-   ORDER  BY t2b DESC nulls first
+   ORDER  BY t2b DESC nulls first, t2d
LIMIT 1)
 GROUP  BY t1b
 ORDER BY t1b NULLS last


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [MINOR][TEST][SQL] Make in-limit.sql more robust

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 334b1e8  [MINOR][TEST][SQL] Make in-limit.sql more robust
334b1e8 is described below

commit 334b1e8c88d6f10d5e4cde8fa400e34fb04faa39
Author: Wenchen Fan 
AuthorDate: Thu Jul 2 21:04:26 2020 +0900

[MINOR][TEST][SQL] Make in-limit.sql more robust

### What changes were proposed in this pull request?

For queries like `t1d in (SELECT t2d FROM t2 ORDER BY t2c LIMIT 2)`, the
result can be non-deterministic, as the subquery may return different rows
across runs (it's not sorted by `t2d` and involves a shuffle).

This PR makes the test more robust by sorting the output column.

### Why are the changes needed?

avoid flaky test

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

Closes #28976 from cloud-fan/small.

Authored-by: Wenchen Fan 
Signed-off-by: HyukjinKwon 
(cherry picked from commit f83415629b18d628f72a32285f0afc24f29eaa1e)
Signed-off-by: HyukjinKwon 
---
 .../test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql | 4 ++--
 .../resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
index 481b5e8..0a16f11 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
@@ -72,7 +72,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d IN (SELECT t2d
FROM   t2
-   ORDER  BY t2c
+   ORDER  BY t2c, t2d
LIMIT 2)
 GROUP  BY t1b
 ORDER  BY t1b DESC NULLS FIRST
@@ -93,7 +93,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d NOT IN (SELECT t2d
FROM   t2
-   ORDER  BY t2b DESC nulls first
+   ORDER  BY t2b DESC nulls first, t2d
LIMIT 1)
 GROUP  BY t1b
 ORDER BY t1b NULLS last
diff --git a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
index 1c33544..e24538b 100644
--- a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
@@ -103,7 +103,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d IN (SELECT t2d
FROM   t2
-   ORDER  BY t2c
+   ORDER  BY t2c, t2d
LIMIT 2)
 GROUP  BY t1b
 ORDER  BY t1b DESC NULLS FIRST
@@ -136,7 +136,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d NOT IN (SELECT t2d
FROM   t2
-   ORDER  BY t2b DESC nulls first
+   ORDER  BY t2b DESC nulls first, t2d
LIMIT 1)
 GROUP  BY t1b
 ORDER BY t1b NULLS last


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (45fe6b6 -> f834156)

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 45fe6b6  [MINOR][DOCS] Pyspark getActiveSession docstring
 add f834156  [MINOR][TEST][SQL] Make in-limit.sql more robust

No new revisions were added by this update.

Summary of changes:
 .../test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql | 4 ++--
 .../resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7fda184 -> 45fe6b6)

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7fda184  [SPARK-32121][SHUFFLE] Support Windows OS in ExecutorDiskUtils
 add 45fe6b6  [MINOR][DOCS] Pyspark getActiveSession docstring

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/session.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [MINOR][TEST][SQL] Make in-limit.sql more robust

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 2227a16  [MINOR][TEST][SQL] Make in-limit.sql more robust
2227a16 is described below

commit 2227a166782797f0e47a5d850b9713829300a466
Author: Wenchen Fan 
AuthorDate: Thu Jul 2 21:04:26 2020 +0900

[MINOR][TEST][SQL] Make in-limit.sql more robust

### What changes were proposed in this pull request?

For queries like `t1d in (SELECT t2d FROM  t2 ORDER  BY t2c LIMIT 2)`, the 
result can be non-deterministic as the result of the subquery may output 
different results (it's not sorted by `t2d` and it has shuffle).

This PR makes the test more robust by sorting the output column.

### Why are the changes needed?

avoid flaky test

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

Closes #28976 from cloud-fan/small.

Authored-by: Wenchen Fan 
Signed-off-by: HyukjinKwon 
(cherry picked from commit f83415629b18d628f72a32285f0afc24f29eaa1e)
Signed-off-by: HyukjinKwon 
---
 .../test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql | 4 ++--
 .../resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git 
a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
 
b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
index a40ee08..a3cab37 100644
--- 
a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
+++ 
b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
@@ -72,7 +72,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d IN (SELECT t2d
FROM   t2
-   ORDER  BY t2c
+   ORDER  BY t2c, t2d
LIMIT 2)
 GROUP  BY t1b
 ORDER  BY t1b DESC NULLS FIRST
@@ -93,7 +93,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d NOT IN (SELECT t2d
FROM   t2
-   ORDER  BY t2b DESC nulls first
+   ORDER  BY t2b DESC nulls first, t2d
LIMIT 1)
 GROUP  BY t1b
 ORDER BY t1b NULLS last
diff --git 
a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
 
b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
index 71ca1f8..cde1577 100644
--- 
a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
+++ 
b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
@@ -103,7 +103,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d IN (SELECT t2d
FROM   t2
-   ORDER  BY t2c
+   ORDER  BY t2c, t2d
LIMIT 2)
 GROUP  BY t1b
 ORDER  BY t1b DESC NULLS FIRST
@@ -136,7 +136,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d NOT IN (SELECT t2d
FROM   t2
-   ORDER  BY t2b DESC nulls first
+   ORDER  BY t2b DESC nulls first, t2d
LIMIT 1)
 GROUP  BY t1b
 ORDER BY t1b NULLS last


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [MINOR][TEST][SQL] Make in-limit.sql more robust

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 334b1e8  [MINOR][TEST][SQL] Make in-limit.sql more robust
334b1e8 is described below

commit 334b1e8c88d6f10d5e4cde8fa400e34fb04faa39
Author: Wenchen Fan 
AuthorDate: Thu Jul 2 21:04:26 2020 +0900

[MINOR][TEST][SQL] Make in-limit.sql more robust

### What changes were proposed in this pull request?

For queries like `t1d in (SELECT t2d FROM  t2 ORDER  BY t2c LIMIT 2)`, the 
result can be non-deterministic as the result of the subquery may output 
different results (it's not sorted by `t2d` and it has shuffle).

This PR makes the test more robust by sorting the output column.

### Why are the changes needed?

avoid flaky test

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

Closes #28976 from cloud-fan/small.

Authored-by: Wenchen Fan 
Signed-off-by: HyukjinKwon 
(cherry picked from commit f83415629b18d628f72a32285f0afc24f29eaa1e)
Signed-off-by: HyukjinKwon 
---
 .../test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql | 4 ++--
 .../resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git 
a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
 
b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
index 481b5e8..0a16f11 100644
--- 
a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
+++ 
b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
@@ -72,7 +72,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d IN (SELECT t2d
FROM   t2
-   ORDER  BY t2c
+   ORDER  BY t2c, t2d
LIMIT 2)
 GROUP  BY t1b
 ORDER  BY t1b DESC NULLS FIRST
@@ -93,7 +93,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d NOT IN (SELECT t2d
FROM   t2
-   ORDER  BY t2b DESC nulls first
+   ORDER  BY t2b DESC nulls first, t2d
LIMIT 1)
 GROUP  BY t1b
 ORDER BY t1b NULLS last
diff --git 
a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
 
b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
index 1c33544..e24538b 100644
--- 
a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
+++ 
b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
@@ -103,7 +103,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d IN (SELECT t2d
FROM   t2
-   ORDER  BY t2c
+   ORDER  BY t2c, t2d
LIMIT 2)
 GROUP  BY t1b
 ORDER  BY t1b DESC NULLS FIRST
@@ -136,7 +136,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d NOT IN (SELECT t2d
FROM   t2
-   ORDER  BY t2b DESC nulls first
+   ORDER  BY t2b DESC nulls first, t2d
LIMIT 1)
 GROUP  BY t1b
 ORDER BY t1b NULLS last


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (45fe6b6 -> f834156)

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 45fe6b6  [MINOR][DOCS] Pyspark getActiveSession docstring
 add f834156  [MINOR][TEST][SQL] Make in-limit.sql more robust

No new revisions were added by this update.

Summary of changes:
 .../test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql | 4 ++--
 .../resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [MINOR][DOCS] Pyspark getActiveSession docstring

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 5361f76  [MINOR][DOCS] Pyspark getActiveSession docstring
5361f76 is described below

commit 5361f76d84de4986aa65a687bece94f4220edd94
Author: animenon 
AuthorDate: Thu Jul 2 21:02:00 2020 +0900

[MINOR][DOCS] Pyspark getActiveSession docstring

### What changes were proposed in this pull request?

Minor fix so that the documentation of `getActiveSession` is fixed.
The sample code snippet doesn't come up formatted rightly, added spacing 
for this to be fixed.
Also added return to docs.

### Why are the changes needed?

The sample code is getting mixed up as description in the docs.

[Current Doc 
Link](http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=getactivesession#pyspark.sql.SparkSession.getActiveSession)


![image](https://user-images.githubusercontent.com/6907950/86331522-d7b6f800-bc66-11ea-998c-42085f5e5b04.png)

### Does this PR introduce _any_ user-facing change?

Yes, documentation of getActiveSession is fixed.
And added description about return.

### How was this patch tested?

Adding a spacing between description and code seems to fix the issue.

Closes #28978 from animenon/docs_minor.

Authored-by: animenon 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 45fe6b62a73540ff010317fc7518b007206707d6)
Signed-off-by: HyukjinKwon 
---
 python/pyspark/sql/session.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 233f492..e9486a3 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -265,7 +265,10 @@ class SparkSession(SparkConversionMixin):
 @since(3.0)
 def getActiveSession(cls):
 """
-Returns the active SparkSession for the current thread, returned by 
the builder.
+Returns the active SparkSession for the current thread, returned by 
the builder
+
+:return: :class:`SparkSession` if an active session exists for the 
current thread
+
 >>> s = SparkSession.getActiveSession()
 >>> l = [('Alice', 1)]
 >>> rdd = s.sparkContext.parallelize(l)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7fda184 -> 45fe6b6)

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7fda184  [SPARK-32121][SHUFFLE] Support Windows OS in ExecutorDiskUtils
 add 45fe6b6  [MINOR][DOCS] Pyspark getActiveSession docstring

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/session.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [MINOR][TEST][SQL] Make in-limit.sql more robust

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 334b1e8  [MINOR][TEST][SQL] Make in-limit.sql more robust
334b1e8 is described below

commit 334b1e8c88d6f10d5e4cde8fa400e34fb04faa39
Author: Wenchen Fan 
AuthorDate: Thu Jul 2 21:04:26 2020 +0900

[MINOR][TEST][SQL] Make in-limit.sql more robust

### What changes were proposed in this pull request?

For queries like `t1d in (SELECT t2d FROM  t2 ORDER  BY t2c LIMIT 2)`, the 
result can be non-deterministic as the result of the subquery may output 
different results (it's not sorted by `t2d` and it has shuffle).

This PR makes the test more robust by sorting the output column.

### Why are the changes needed?

avoid flaky test

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

Closes #28976 from cloud-fan/small.

Authored-by: Wenchen Fan 
Signed-off-by: HyukjinKwon 
(cherry picked from commit f83415629b18d628f72a32285f0afc24f29eaa1e)
Signed-off-by: HyukjinKwon 
---
 .../test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql | 4 ++--
 .../resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git 
a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
 
b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
index 481b5e8..0a16f11 100644
--- 
a/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
+++ 
b/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql
@@ -72,7 +72,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d IN (SELECT t2d
FROM   t2
-   ORDER  BY t2c
+   ORDER  BY t2c, t2d
LIMIT 2)
 GROUP  BY t1b
 ORDER  BY t1b DESC NULLS FIRST
@@ -93,7 +93,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d NOT IN (SELECT t2d
FROM   t2
-   ORDER  BY t2b DESC nulls first
+   ORDER  BY t2b DESC nulls first, t2d
LIMIT 1)
 GROUP  BY t1b
 ORDER BY t1b NULLS last
diff --git 
a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
 
b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
index 1c33544..e24538b 100644
--- 
a/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
+++ 
b/sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out
@@ -103,7 +103,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d IN (SELECT t2d
FROM   t2
-   ORDER  BY t2c
+   ORDER  BY t2c, t2d
LIMIT 2)
 GROUP  BY t1b
 ORDER  BY t1b DESC NULLS FIRST
@@ -136,7 +136,7 @@ SELECT Count(DISTINCT( t1a )),
 FROM   t1
 WHERE  t1d NOT IN (SELECT t2d
FROM   t2
-   ORDER  BY t2b DESC nulls first
+   ORDER  BY t2b DESC nulls first, t2d
LIMIT 1)
 GROUP  BY t1b
 ORDER BY t1b NULLS last


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (45fe6b6 -> f834156)

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 45fe6b6  [MINOR][DOCS] Pyspark getActiveSession docstring
 add f834156  [MINOR][TEST][SQL] Make in-limit.sql more robust

No new revisions were added by this update.

Summary of changes:
 .../test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql | 4 ++--
 .../resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [MINOR][DOCS] Pyspark getActiveSession docstring

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 5361f76  [MINOR][DOCS] Pyspark getActiveSession docstring
5361f76 is described below

commit 5361f76d84de4986aa65a687bece94f4220edd94
Author: animenon 
AuthorDate: Thu Jul 2 21:02:00 2020 +0900

[MINOR][DOCS] Pyspark getActiveSession docstring

### What changes were proposed in this pull request?

Minor fix so that the documentation of `getActiveSession` is fixed.
The sample code snippet doesn't come up formatted rightly, added spacing 
for this to be fixed.
Also added return to docs.

### Why are the changes needed?

The sample code is getting mixed up as description in the docs.

[Current Doc 
Link](http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=getactivesession#pyspark.sql.SparkSession.getActiveSession)


![image](https://user-images.githubusercontent.com/6907950/86331522-d7b6f800-bc66-11ea-998c-42085f5e5b04.png)

### Does this PR introduce _any_ user-facing change?

Yes, documentation of getActiveSession is fixed.
And added description about return.

### How was this patch tested?

Adding a spacing between description and code seems to fix the issue.

Closes #28978 from animenon/docs_minor.

Authored-by: animenon 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 45fe6b62a73540ff010317fc7518b007206707d6)
Signed-off-by: HyukjinKwon 
---
 python/pyspark/sql/session.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 233f492..e9486a3 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -265,7 +265,10 @@ class SparkSession(SparkConversionMixin):
 @since(3.0)
 def getActiveSession(cls):
 """
-Returns the active SparkSession for the current thread, returned by 
the builder.
+Returns the active SparkSession for the current thread, returned by 
the builder
+
+:return: :class:`SparkSession` if an active session exists for the 
current thread
+
 >>> s = SparkSession.getActiveSession()
 >>> l = [('Alice', 1)]
 >>> rdd = s.sparkContext.parallelize(l)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7fda184 -> 45fe6b6)

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7fda184  [SPARK-32121][SHUFFLE] Support Windows OS in ExecutorDiskUtils
 add 45fe6b6  [MINOR][DOCS] Pyspark getActiveSession docstring

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/session.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [MINOR][DOCS] Pyspark getActiveSession docstring

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 5361f76  [MINOR][DOCS] Pyspark getActiveSession docstring
5361f76 is described below

commit 5361f76d84de4986aa65a687bece94f4220edd94
Author: animenon 
AuthorDate: Thu Jul 2 21:02:00 2020 +0900

[MINOR][DOCS] Pyspark getActiveSession docstring

### What changes were proposed in this pull request?

Minor fix so that the documentation of `getActiveSession` is fixed.
The sample code snippet doesn't come up formatted rightly, added spacing 
for this to be fixed.
Also added return to docs.

### Why are the changes needed?

The sample code is getting mixed up as description in the docs.

[Current Doc 
Link](http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=getactivesession#pyspark.sql.SparkSession.getActiveSession)


![image](https://user-images.githubusercontent.com/6907950/86331522-d7b6f800-bc66-11ea-998c-42085f5e5b04.png)

### Does this PR introduce _any_ user-facing change?

Yes, documentation of getActiveSession is fixed.
And added description about return.

### How was this patch tested?

Adding a spacing between description and code seems to fix the issue.

Closes #28978 from animenon/docs_minor.

Authored-by: animenon 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 45fe6b62a73540ff010317fc7518b007206707d6)
Signed-off-by: HyukjinKwon 
---
 python/pyspark/sql/session.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 233f492..e9486a3 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -265,7 +265,10 @@ class SparkSession(SparkConversionMixin):
 @since(3.0)
 def getActiveSession(cls):
 """
-Returns the active SparkSession for the current thread, returned by 
the builder.
+Returns the active SparkSession for the current thread, returned by 
the builder
+
+:return: :class:`SparkSession` if an active session exists for the 
current thread
+
 >>> s = SparkSession.getActiveSession()
 >>> l = [('Alice', 1)]
 >>> rdd = s.sparkContext.parallelize(l)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7fda184 -> 45fe6b6)

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7fda184  [SPARK-32121][SHUFFLE] Support Windows OS in ExecutorDiskUtils
 add 45fe6b6  [MINOR][DOCS] Pyspark getActiveSession docstring

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/session.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated (bc1acfe -> 7f4d452)

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bc1acfe  [SPARK-32089][R][BUILD] Upgrade R version to 4.0.2 in the release DockerFiile
 add 7f4d452  [SPARK-31935][2.4][SQL][FOLLOWUP] Hadoop file system config should be effective in data source options

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/execution/CacheManager.scala  | 17 
 .../InsertIntoHadoopFsRelationCommand.scala|  2 +-
 .../datasources/parquet/ParquetFileFormat.scala|  6 ++--
 .../spark/sql/FileBasedDataSourceSuite.scala   | 32 +-
 .../apache/spark/sql/hive/orc/OrcFileFormat.scala  |  2 +-
 5 files changed, 36 insertions(+), 23 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org




[spark] branch branch-3.0 updated: [SPARK-32121][SHUFFLE] Support Windows OS in ExecutorDiskUtils

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new b6bb158  [SPARK-32121][SHUFFLE] Support Windows OS in ExecutorDiskUtils
b6bb158 is described below

commit b6bb158d3c1191aac6f4d27d84d7c791c5063b24
Author: pancheng <379377...@qq.com>
AuthorDate: Thu Jul 2 19:21:11 2020 +0900

[SPARK-32121][SHUFFLE] Support Windows OS in ExecutorDiskUtils

### What changes were proposed in this pull request?
Correct file separator use in `ExecutorDiskUtils.createNormalizedInternedPathname` on Windows.

### Why are the changes needed?
`ExternalShuffleBlockResolverSuite` failed on Windows; see details at:
https://issues.apache.org/jira/browse/SPARK-32121

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
The existing test suite.

Closes #28940 from pan3793/SPARK-32121.

Lead-authored-by: pancheng <379377...@qq.com>
Co-authored-by: chengpan 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 7fda184f0fc39613fb68e912c189c54b93c638e6)
Signed-off-by: HyukjinKwon 
---
 .../spark/network/shuffle/ExecutorDiskUtils.java   | 18 +++---
 .../shuffle/ExternalShuffleBlockResolverSuite.java | 22 +++---
 2 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExecutorDiskUtils.java b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExecutorDiskUtils.java
index 13f6046..6549cac 100644
--- a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExecutorDiskUtils.java
+++ b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExecutorDiskUtils.java
@@ -23,11 +23,19 @@ import java.util.regex.Pattern;
 
 import com.google.common.annotations.VisibleForTesting;
 
+import org.apache.commons.lang3.SystemUtils;
 import org.apache.spark.network.util.JavaUtils;
 
 public class ExecutorDiskUtils {
 
-  private static final Pattern MULTIPLE_SEPARATORS = Pattern.compile(File.separator + "{2,}");
+  private static final Pattern MULTIPLE_SEPARATORS;
+  static {
+    if (SystemUtils.IS_OS_WINDOWS) {
+      MULTIPLE_SEPARATORS = Pattern.compile("[/]+");
+    } else {
+      MULTIPLE_SEPARATORS = Pattern.compile("/{2,}");
+    }
+  }
 
   /**
    * Hashes a filename into the corresponding local directory, in a manner consistent with
@@ -50,14 +58,18 @@ public class ExecutorDiskUtils {
    * the internal code in java.io.File would normalize it later, creating a new "foo/bar"
    * String copy. Unfortunately, we cannot just reuse the normalization code that java.io.File
    * uses, since it is in the package-private class java.io.FileSystem.
+   *
+   * On Windows, separator "\" is used instead of "/".
+   *
+   * "\\" is a legal character in path name on Unix-like OS, but illegal on Windows.
    */
   @VisibleForTesting
   static String createNormalizedInternedPathname(String dir1, String dir2, String fname) {
     String pathname = dir1 + File.separator + dir2 + File.separator + fname;
     Matcher m = MULTIPLE_SEPARATORS.matcher(pathname);
-    pathname = m.replaceAll("/");
+    pathname = m.replaceAll(Matcher.quoteReplacement(File.separator));
     // A single trailing slash needs to be taken care of separately
-    if (pathname.length() > 1 && pathname.endsWith("/")) {
+    if (pathname.length() > 1 && pathname.charAt(pathname.length() - 1) == File.separatorChar) {
       pathname = pathname.substring(0, pathname.length() - 1);
     }
     return pathname.intern();
diff --git a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolverSuite.java b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolverSuite.java
index 09b3143..6515b6c 100644
--- a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolverSuite.java
+++ b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolverSuite.java
@@ -25,6 +25,7 @@ import java.nio.charset.StandardCharsets;
 
 import com.fasterxml.jackson.databind.ObjectMapper;
 import com.google.common.io.CharStreams;
+import org.apache.commons.lang3.SystemUtils;
 import org.apache.spark.network.shuffle.protocol.ExecutorShuffleInfo;
 import org.apache.spark.network.util.MapConfigProvider;
 import org.apache.spark.network.util.TransportConf;
@@ -146,12 +147,19 @@ public class ExternalShuffleBlockResolverSuite {
 
   @Test
   public void testNormalizeAndInternPathname() {
-    assertPathsMatch("/foo", "bar", "baz", "/foo/bar/baz");
-    assertPathsMatch("//foo/", "bar/", "//baz", "/foo/bar/baz");
-    assertPathsMatch("foo", "bar", "baz///", "foo/bar/baz");
-
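The patched normalization logic can be transliterated into a short Python sketch (illustration only, not Spark code; the Windows branch here assumes runs of either separator should collapse to the platform separator, matching the intent described in the commit message):

```python
import re
import sys

def create_normalized_pathname(dir1, dir2, fname, sep="/"):
    """Sketch of ExecutorDiskUtils.createNormalizedInternedPathname in Python."""
    # On Windows any run of separators collapses to one backslash; on
    # Unix-like systems only runs of two or more "/" do, since a single
    # separator is already normal.
    if sep == "\\":
        pattern = re.compile(r"[/\\]+")
    else:
        pattern = re.compile("/{2,}")
    pathname = dir1 + sep + dir2 + sep + fname
    # A callable replacement avoids backslash-escaping pitfalls in the
    # replacement string -- the analogue of Java's Matcher.quoteReplacement.
    pathname = pattern.sub(lambda _m: sep, pathname)
    # A single trailing separator needs to be taken care of separately.
    if len(pathname) > 1 and pathname.endswith(sep):
        pathname = pathname[:-1]
    # Interning mirrors pathname.intern() in the Java original.
    return sys.intern(pathname)
```

The test vectors mirror the `assertPathsMatch` cases in the suite above, plus one hypothetical Windows-style case.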

[spark] branch master updated (3f7780d -> 7fda184)

2020-07-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3f7780d  [SPARK-32136][SQL] NormalizeFloatingNumbers should work on null struct
 add 7fda184  [SPARK-32121][SHUFFLE] Support Windows OS in ExecutorDiskUtils

No new revisions were added by this update.

Summary of changes:
 .../spark/network/shuffle/ExecutorDiskUtils.java   | 18 +++---
 .../shuffle/ExternalShuffleBlockResolverSuite.java | 22 +++---
 2 files changed, 30 insertions(+), 10 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


