[GitHub] [spark-website] viirya commented on a change in pull request #289: Release 3.0.1

2020-09-08 Thread GitBox


viirya commented on a change in pull request #289:
URL: https://github.com/apache/spark-website/pull/289#discussion_r485342723



##
File path: site/developer-tools.html
##
@@ -505,8 +505,8 @@ Solving a binary incompatibility
 
 For the problem described above, we might add the following:
 
-// [SPARK-zz][CORE] Fix an issue
-ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.SomeClass.this")
+// [SPARK-zz][CORE] Fix an issue
+ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.SomeClass.this")

Review comment:
   I tried it locally. Just running `jekyll build` reproduces these changes.
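
   For context, the snippet under review documents MiMa binary-compatibility exclusions.
   A minimal sketch of how such an exclusion is registered (the object name and surrounding
   structure here are illustrative assumptions, not the exact Spark entry):

```scala
// Sketch only: a MiMa exclusion entry of the kind shown in the reviewed snippet.
import com.typesafe.tools.mima.core._

object ExampleExcludes {
  // [SPARK-zz][CORE] Fix an issue
  lazy val excludes = Seq(
    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.SomeClass.this")
  )
}
```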





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (de0dc52 -> 794b48c)

2020-09-08 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from de0dc52  [SPARK-32813][SQL] Get default config of ParquetSource 
vectorized reader if no active SparkSession
 add 794b48c  [SPARK-32204][SPARK-32182][DOCS][FOLLOW-UP] Use IPython 
instead of ipython to check if installed in dev/lint-python

No new revisions were added by this update.

Summary of changes:
 dev/lint-python | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32813][SQL] Get default config of ParquetSource vectorized reader if no active SparkSession

2020-09-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 4c0f9d8  [SPARK-32813][SQL] Get default config of ParquetSource 
vectorized reader if no active SparkSession
4c0f9d8 is described below

commit 4c0f9d8b44f63a3d1faaeece8b1d6b47c3bfe75f
Author: Liang-Chi Hsieh 
AuthorDate: Wed Sep 9 12:23:05 2020 +0900

[SPARK-32813][SQL] Get default config of ParquetSource vectorized reader if 
no active SparkSession

### What changes were proposed in this pull request?

If no active SparkSession is available, let `FileSourceScanExec.needsUnsafeRowConversion` 
look at the default SQL config for the ParquetSource vectorized reader instead of failing 
the query execution.
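
A minimal sketch of the failure mode and the spirit of the fix (this is not the actual
patch, which appears in the diff below; the default value shown is the documented default
of spark.sql.parquet.enableVectorizedReader):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: decide whether the Parquet vectorized reader is enabled without assuming
// an active session exists on the current thread.
def parquetVectorizedReaderEnabled(): Boolean = {
  // Before the fix: SparkSession.getActiveSession.get throws NoSuchElementException
  // when the scan runs on a thread with no active session.
  SparkSession.getActiveSession
    .map(_.conf.get("spark.sql.parquet.enableVectorizedReader").toBoolean)
    .getOrElse(true) // fall back to the default instead of failing the query
}
```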

### Why are the changes needed?

Fixes a bug where a file-based data source scan for the Parquet source throws an 
exception if no active SparkSession is available.

### Does this PR introduce _any_ user-facing change?

Yes, this change fixes the bug.

### How was this patch tested?

Unit test.

Closes #29667 from viirya/SPARK-32813.

Authored-by: Liang-Chi Hsieh 
Signed-off-by: HyukjinKwon 
(cherry picked from commit de0dc52a842bf4374c1ae4f9546dd95b3f35c4f1)
Signed-off-by: HyukjinKwon 
---
 .../spark/sql/execution/DataSourceScanExec.scala   |  2 +-
 .../spark/sql/execution/SQLExecutionSuite.scala| 40 +-
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
index 447e0a6..0fcb0dd 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
@@ -175,7 +175,7 @@ case class FileSourceScanExec(
 
   private lazy val needsUnsafeRowConversion: Boolean = {
 if (relation.fileFormat.isInstanceOf[ParquetSource]) {
-  SparkSession.getActiveSession.get.sessionState.conf.parquetVectorizedReaderEnabled
+  sqlContext.conf.parquetVectorizedReaderEnabled
 } else {
   false
 }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/SQLExecutionSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/SQLExecutionSuite.scala
index 8bf7fe6..81e6920 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/SQLExecutionSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/SQLExecutionSuite.scala
@@ -17,11 +17,17 @@
 
 package org.apache.spark.sql.execution
 
+import java.util.concurrent.Executors
+
 import scala.collection.parallel.immutable.ParRange
+import scala.concurrent.{ExecutionContext, Future}
+import scala.concurrent.duration._
 
 import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite}
 import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
-import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.types._
+import org.apache.spark.util.ThreadUtils
 
 class SQLExecutionSuite extends SparkFunSuite {
 
@@ -119,6 +125,38 @@ class SQLExecutionSuite extends SparkFunSuite {
 
 spark.stop()
   }
+
+  test("SPARK-32813: Table scan should work in different thread") {
+val executor1 = Executors.newSingleThreadExecutor()
+val executor2 = Executors.newSingleThreadExecutor()
+var session: SparkSession = null
+SparkSession.cleanupAnyExistingSession()
+
+withTempDir { tempDir =>
+  try {
+val tablePath = tempDir.toString + "/table"
+val df = ThreadUtils.awaitResult(Future {
+  session = SparkSession.builder().appName("test").master("local[*]").getOrCreate()
+
+  session.createDataFrame(
+session.sparkContext.parallelize(Row(Array(1, 2, 3)) :: Nil),
+StructType(Seq(
+  StructField("a", ArrayType(IntegerType, containsNull = false), nullable = false))))
+.write.parquet(tablePath)
+
+  session.read.parquet(tablePath)
+}(ExecutionContext.fromExecutorService(executor1)), 1.minute)
+
+ThreadUtils.awaitResult(Future {
+  assert(df.rdd.collect()(0) === Row(Seq(1, 2, 3)))
+}(ExecutionContext.fromExecutorService(executor2)), 1.minute)
+  } finally {
+executor1.shutdown()
+executor2.shutdown()
+session.stop()
+  }
+}
+  }
 }
 
 object SQLExecutionSuite {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (514bf56 -> de0dc52)

2020-09-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 514bf56  [SPARK-32823][WEB UI] Fix the master ui resources reporting
 add de0dc52  [SPARK-32813][SQL] Get default config of ParquetSource 
vectorized reader if no active SparkSession

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/DataSourceScanExec.scala   |  2 +-
 .../spark/sql/execution/SQLExecutionSuite.scala| 40 +-
 2 files changed, 40 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] viirya commented on a change in pull request #289: Release 3.0.1

2020-09-08 Thread GitBox


viirya commented on a change in pull request #289:
URL: https://github.com/apache/spark-website/pull/289#discussion_r485312315



##
File path: site/developer-tools.html
##
@@ -505,8 +505,8 @@ Solving a binary incompatibility
 
 For the problem described above, we might add the following:
 
-// [SPARK-zz][CORE] Fix an issue
-ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.SomeClass.this")
+// [SPARK-zz][CORE] Fix an issue
+ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.SomeClass.this")

Review comment:
   Oh, OK. Then it should be fine. I don't see a change in developer-tools related to this; it is probably related to jekyll.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] zhengruifeng commented on a change in pull request #289: Release 3.0.1

2020-09-08 Thread GitBox


zhengruifeng commented on a change in pull request #289:
URL: https://github.com/apache/spark-website/pull/289#discussion_r485309360



##
File path: site/developer-tools.html
##
@@ -505,8 +505,8 @@ Solving a binary incompatibility
 
 For the problem described above, we might add the following:
 
-// [SPARK-zz][CORE] Fix an issue
-ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.SomeClass.this")
+// [SPARK-zz][CORE] Fix an issue
+ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.SomeClass.this")

Review comment:
   @viirya those changes in the second commit (3b416ecb027c9262a28ad778d50634d6bd582f81) were made by running `jekyll build` in the `spark-website` directory.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] viirya commented on a change in pull request #289: Release 3.0.1

2020-09-08 Thread GitBox


viirya commented on a change in pull request #289:
URL: https://github.com/apache/spark-website/pull/289#discussion_r485302593



##
File path: site/developer-tools.html
##
@@ -505,8 +505,8 @@ Solving a binary incompatibility
 
 For the problem described above, we might add the following:
 
-// [SPARK-zz][CORE] Fix an issue
-ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.SomeClass.this")
+// [SPARK-zz][CORE] Fix an issue
+ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.SomeClass.this")

Review comment:
   By the way, do we have `class="py"`? I don't see `py` in the CSS of the current site, unless I missed it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] viirya commented on a change in pull request #289: Release 3.0.1

2020-09-08 Thread GitBox


viirya commented on a change in pull request #289:
URL: https://github.com/apache/spark-website/pull/289#discussion_r485301988



##
File path: site/developer-tools.html
##
@@ -505,8 +505,8 @@ Solving a binary incompatibility
 
 For the problem described above, we might add the following:
 
-// [SPARK-zz][CORE] Fix an issue
-ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.SomeClass.this")
+// [SPARK-zz][CORE] Fix an issue
+ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.SomeClass.this")

Review comment:
   why `nc` -> `nv`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32823][WEB UI] Fix the master ui resources reporting

2020-09-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 86b9dd9  [SPARK-32823][WEB UI] Fix the master ui resources reporting
86b9dd9 is described below

commit 86b9dd9b414cab9489f00ef7acec5c3264fa312d
Author: Thomas Graves 
AuthorDate: Wed Sep 9 10:33:21 2020 +0900

[SPARK-32823][WEB UI] Fix the master ui resources reporting

### What changes were proposed in this pull request?

Fixes the master UI so it properly sums the resource totals across multiple workers, 
e.g. the field:
Resources in use: 0 / 8 gpu

The bug here is that it was creating MutableResourceInfo and then reducing using the 
+ operator. The + operator in MutableResourceInfo simply adds the addresses from one 
to the addresses of the other. But it uses a HashSet, so if the addresses are the same 
you lose the correct amount: e.g. if worker1 has GPU addresses 0,1,2,3 and worker2 also 
has addresses 0,1,2,3, you only see 4 total GPUs when there are 8.
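
A small illustration of that counting bug using plain Scala sets (a sketch only, not
Spark's MutableResourceInfo):

```scala
val worker1Gpus = Set("0", "1", "2", "3")
val worker2Gpus = Set("0", "1", "2", "3")

// Buggy aggregation: unioning address sets collapses identical addresses across
// workers, so two 4-GPU workers report only 4 GPUs in total.
val unionedCount = (worker1Gpus ++ worker2Gpus).size   // 4

// Fixed aggregation: sum the per-worker address counts instead.
val summedCount = worker1Gpus.size + worker2Gpus.size  // 8
```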

In this case we don't really need to create the MutableResourceInfo at all, because we 
just want the sums for used and total, so this change removes that use. The other uses 
are per worker, so those should be fine.

### Why are the changes needed?

fix UI

### Does this PR introduce _any_ user-facing change?

UI

### How was this patch tested?

tested manually on standalone cluster with multiple workers and multiple 
GPUs and multiple fpgas

Closes #29683 from tgravescs/SPARK-32823.

Lead-authored-by: Thomas Graves 
Co-authored-by: Thomas Graves 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 514bf563a7fa1f470b2ab088c0838317500a9aab)
Signed-off-by: HyukjinKwon 
---
 .../org/apache/spark/deploy/StandaloneResourceUtils.scala  | 10 +-
 .../scala/org/apache/spark/deploy/master/ui/MasterPage.scala   | 10 --
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/StandaloneResourceUtils.scala b/core/src/main/scala/org/apache/spark/deploy/StandaloneResourceUtils.scala
index e08709e..c7c31a8 100644
--- a/core/src/main/scala/org/apache/spark/deploy/StandaloneResourceUtils.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/StandaloneResourceUtils.scala
@@ -149,11 +149,11 @@ private[spark] object StandaloneResourceUtils extends Logging {
 
   // used for UI
   def formatResourcesUsed(
-  resourcesTotal: Map[String, ResourceInformation],
-  resourcesUsed: Map[String, ResourceInformation]): String = {
-resourcesTotal.map { case (rName, rInfo) =>
-  val used = resourcesUsed(rName).addresses.length
-  val total = rInfo.addresses.length
+  resourcesTotal: Map[String, Int],
+  resourcesUsed: Map[String, Int]): String = {
+resourcesTotal.map { case (rName, totalSize) =>
+  val used = resourcesUsed(rName)
+  val total = totalSize
   s"$used / $total $rName"
 }.mkString(", ")
   }
diff --git a/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala b/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala
index f64b449..fcbeba9 100644
--- a/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala
@@ -76,19 +76,17 @@ private[ui] class MasterPage(parent: MasterWebUI) extends WebUIPage("") {
 
   private def formatMasterResourcesInUse(aliveWorkers: Array[WorkerInfo]): String = {
 val totalInfo = aliveWorkers.map(_.resourcesInfo)
-  .map(resources => toMutable(resources))
   .flatMap(_.toIterator)
   .groupBy(_._1) // group by resource name
   .map { case (rName, rInfoArr) =>
-rName -> rInfoArr.map(_._2).reduce(_ + _)
-  }.map { case (k, v) => (k, v.toResourceInformation) }
+  rName -> rInfoArr.map(_._2.addresses.size).sum
+}
 val usedInfo = aliveWorkers.map(_.resourcesInfoUsed)
-  .map (resources => toMutable(resources))
   .flatMap(_.toIterator)
   .groupBy(_._1) // group by resource name
   .map { case (rName, rInfoArr) =>
-  rName -> rInfoArr.map(_._2).reduce(_ + _)
-}.map { case (k, v) => (k, v.toResourceInformation) }
+  rName -> rInfoArr.map(_._2.addresses.size).sum
+}
 formatResourcesUsed(totalInfo, usedInfo)
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (adc8d68 -> 514bf56)

2020-09-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from adc8d68  [SPARK-32810][SQL][TESTS][FOLLOWUP] Check path globbing in 
JSON/CSV datasources v1 and v2
 add 514bf56  [SPARK-32823][WEB UI] Fix the master ui resources reporting

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/deploy/StandaloneResourceUtils.scala  | 10 +-
 .../scala/org/apache/spark/deploy/master/ui/MasterPage.scala   | 10 --
 2 files changed, 9 insertions(+), 11 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (e8634d8 -> adc8d68)

2020-09-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e8634d8  [SPARK-32824][CORE] Improve the error message when the user 
forgets the .amount in a resource config
 add adc8d68  [SPARK-32810][SQL][TESTS][FOLLOWUP] Check path globbing in 
JSON/CSV datasources v1 and v2

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/datasources/csv/CSVSuite.scala   | 13 
 .../sql/execution/datasources/json/JsonSuite.scala | 13 
 .../sql/test/DataFrameReaderWriterSuite.scala  | 23 --
 3 files changed, 26 insertions(+), 23 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32824][CORE] Improve the error message when the user forgets the .amount in a resource config

2020-09-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new e86d90b  [SPARK-32824][CORE] Improve the error message when the user 
forgets the .amount in a resource config
e86d90b is described below

commit e86d90b21d4b3d6658b3cb6dd30daafb32b0c1bd
Author: Thomas Graves 
AuthorDate: Wed Sep 9 10:28:40 2020 +0900

[SPARK-32824][CORE] Improve the error message when the user forgets the 
.amount in a resource config

### What changes were proposed in this pull request?

If the user forgets to specify .amount on a resource config like 
spark.executor.resource.gpu, the error message thrown is very confusing:

```
ERROR SparkContext: Error initializing SparkContext.
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
  at java.lang.String.substring(String.java:1967) at
```

This change makes Spark throw a readable error instead.
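
A small sketch of why the original code failed (illustrative only, not the exact Spark
code path): with spark.executor.resource.gpu set but no ".amount", the key left after
stripping the resource prefix is just "gpu", which contains no '.'.

```scala
val key = "gpu"               // would be "gpu.amount" if the config were complete
val index = key.indexOf('.')  // -1 because ".amount" was forgotten
// key.substring(0, index)    // throws StringIndexOutOfBoundsException: -1
// The fix checks for index < 0 first and throws a SparkException that names the
// offending config, which is much easier for the user to act on.
```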

### Why are the changes needed?

confusing error for users
### Does this PR introduce _any_ user-facing change?

just error message

### How was this patch tested?

Tested manually on standalone cluster

Closes #29685 from tgravescs/SPARK-32824.

Authored-by: Thomas Graves 
Signed-off-by: HyukjinKwon 
(cherry picked from commit e8634d8f6f8548852a284a32c1b7da24bedd8ff7)
Signed-off-by: HyukjinKwon 
---
 core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala b/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
index 994b363..16fe897 100644
--- a/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
+++ b/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
@@ -148,7 +148,12 @@ private[spark] object ResourceUtils extends Logging {
 
   def listResourceIds(sparkConf: SparkConf, componentName: String): Seq[ResourceID] = {
 sparkConf.getAllWithPrefix(s"$componentName.$RESOURCE_PREFIX.").map { case (key, _) =>
-  key.substring(0, key.indexOf('.'))
+  val index = key.indexOf('.')
+  if (index < 0) {
+throw new SparkException(s"You must specify an amount config for resource: $key " +
+  s"config: $componentName.$RESOURCE_PREFIX.$key")
+  }
+  key.substring(0, index)
 }.distinct.map(name => new ResourceID(componentName, name))
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (96ff87dc -> e8634d8)

2020-09-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 96ff87dc [SPARK-32753][SQL][FOLLOWUP] Fix indentation and clean up 
view in test
 add e8634d8  [SPARK-32824][CORE] Improve the error message when the user 
forgets the .amount in a resource config

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32823][WEB UI] Fix the master ui resources reporting

2020-09-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 86b9dd9  [SPARK-32823][WEB UI] Fix the master ui resources reporting
86b9dd9 is described below

commit 86b9dd9b414cab9489f00ef7acec5c3264fa312d
Author: Thomas Graves 
AuthorDate: Wed Sep 9 10:33:21 2020 +0900

[SPARK-32823][WEB UI] Fix the master ui resources reporting

### What changes were proposed in this pull request?

Fixes the master UI for properly summing the resources total across 
multiple workers.
field:
Resources in use: 0 / 8 gpu

The bug here is that it was creating MutableResourceInfo and then reducing 
using the + operator.  the + operator in MutableResourceInfo simple adds the 
address from one to the addresses of the other.  But its using a HashSet so if 
the addresses are the same then you lose the correct amount.  ie worker1 has 
gpu addresses 0,1,2,3 and worker2 has addresses 0,1,2,3 then you only see 4 
total GPUs when there are 8.

In this case we don't really need to create the MutableResourceInfo at all 
because we just want the sums for used and total so just remove the use of it.  
The other uses of it are per Worker so those should be ok.

### Why are the changes needed?

fix UI

### Does this PR introduce _any_ user-facing change?

UI

### How was this patch tested?

tested manually on standalone cluster with multiple workers and multiple 
GPUs and multiple fpgas

Closes #29683 from tgravescs/SPARK-32823.

Lead-authored-by: Thomas Graves 
Co-authored-by: Thomas Graves 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 514bf563a7fa1f470b2ab088c0838317500a9aab)
Signed-off-by: HyukjinKwon 
---
 .../org/apache/spark/deploy/StandaloneResourceUtils.scala  | 10 +-
 .../scala/org/apache/spark/deploy/master/ui/MasterPage.scala   | 10 --
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/deploy/StandaloneResourceUtils.scala 
b/core/src/main/scala/org/apache/spark/deploy/StandaloneResourceUtils.scala
index e08709e..c7c31a8 100644
--- a/core/src/main/scala/org/apache/spark/deploy/StandaloneResourceUtils.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/StandaloneResourceUtils.scala
@@ -149,11 +149,11 @@ private[spark] object StandaloneResourceUtils extends 
Logging {
 
   // used for UI
   def formatResourcesUsed(
-  resourcesTotal: Map[String, ResourceInformation],
-  resourcesUsed: Map[String, ResourceInformation]): String = {
-resourcesTotal.map { case (rName, rInfo) =>
-  val used = resourcesUsed(rName).addresses.length
-  val total = rInfo.addresses.length
+  resourcesTotal: Map[String, Int],
+  resourcesUsed: Map[String, Int]): String = {
+resourcesTotal.map { case (rName, totalSize) =>
+  val used = resourcesUsed(rName)
+  val total = totalSize
   s"$used / $total $rName"
 }.mkString(", ")
   }
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala 
b/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala
index f64b449..fcbeba9 100644
--- a/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala
@@ -76,19 +76,17 @@ private[ui] class MasterPage(parent: MasterWebUI) extends 
WebUIPage("") {
 
   private def formatMasterResourcesInUse(aliveWorkers: Array[WorkerInfo]): 
String = {
 val totalInfo = aliveWorkers.map(_.resourcesInfo)
-  .map(resources => toMutable(resources))
   .flatMap(_.toIterator)
   .groupBy(_._1) // group by resource name
   .map { case (rName, rInfoArr) =>
-rName -> rInfoArr.map(_._2).reduce(_ + _)
-  }.map { case (k, v) => (k, v.toResourceInformation) }
+  rName -> rInfoArr.map(_._2.addresses.size).sum
+}
 val usedInfo = aliveWorkers.map(_.resourcesInfoUsed)
-  .map (resources => toMutable(resources))
   .flatMap(_.toIterator)
   .groupBy(_._1) // group by resource name
   .map { case (rName, rInfoArr) =>
-  rName -> rInfoArr.map(_._2).reduce(_ + _)
-}.map { case (k, v) => (k, v.toResourceInformation) }
+  rName -> rInfoArr.map(_._2.addresses.size).sum
+}
 formatResourcesUsed(totalInfo, usedInfo)
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (adc8d68 -> 514bf56)

2020-09-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from adc8d68  [SPARK-32810][SQL][TESTS][FOLLOWUP] Check path globbing in 
JSON/CSV datasources v1 and v2
 add 514bf56  [SPARK-32823][WEB UI] Fix the master ui resources reporting

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/deploy/StandaloneResourceUtils.scala  | 10 +-
 .../scala/org/apache/spark/deploy/master/ui/MasterPage.scala   | 10 --
 2 files changed, 9 insertions(+), 11 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (e8634d8 -> adc8d68)

2020-09-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e8634d8  [SPARK-32824][CORE] Improve the error message when the user 
forgets the .amount in a resource config
 add adc8d68  [SPARK-32810][SQL][TESTS][FOLLOWUP] Check path globbing in 
JSON/CSV datasources v1 and v2

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/datasources/csv/CSVSuite.scala   | 13 
 .../sql/execution/datasources/json/JsonSuite.scala | 13 
 .../sql/test/DataFrameReaderWriterSuite.scala  | 23 --
 3 files changed, 26 insertions(+), 23 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32824][CORE] Improve the error message when the user forgets the .amount in a resource config

2020-09-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new e86d90b  [SPARK-32824][CORE] Improve the error message when the user 
forgets the .amount in a resource config
e86d90b is described below

commit e86d90b21d4b3d6658b3cb6dd30daafb32b0c1bd
Author: Thomas Graves 
AuthorDate: Wed Sep 9 10:28:40 2020 +0900

[SPARK-32824][CORE] Improve the error message when the user forgets the 
.amount in a resource config

### What changes were proposed in this pull request?

If the user forgets to specify .amount on a resource config like 
spark.executor.resource.gpu, the error message thrown is very confusing:

```
ERROR SparkContext: Error initializing 
SparkContext.java.lang.StringIndexOutOfBoundsException: String index out of 
range:
-1 at java.lang.String.substring(String.java:1967) at
```

This change makes Spark throw a readable error instead.
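
For context (a standalone sketch, not the actual Spark code path): the resource
name is parsed from the key suffix after the resource prefix, e.g. "gpu.amount"
for spark.executor.resource.gpu.amount. With a key of just "gpu" there is no '.'
and the old substring call blows up, while the patched code checks the index
first. The following can be pasted into a Scala REPL to see both behaviours:

```
// Sketch of the old vs. new parsing of the resource key suffix.
def resourceNameOld(key: String): String =
  key.substring(0, key.indexOf('.'))  // StringIndexOutOfBoundsException when no '.'

def resourceNameNew(key: String): String = {
  val index = key.indexOf('.')
  if (index < 0) {
    // The real patch throws SparkException and includes the full config name.
    throw new IllegalArgumentException(
      s"You must specify an amount config for resource: $key")
  }
  key.substring(0, index)
}

resourceNameNew("gpu.amount")  // "gpu"
resourceNameOld("gpu")         // throws StringIndexOutOfBoundsException
```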

### Why are the changes needed?

The current error is confusing for users.
### Does this PR introduce _any_ user-facing change?

Only the error message changes.

### How was this patch tested?

Tested manually on a standalone cluster.

Closes #29685 from tgravescs/SPARK-32824.

Authored-by: Thomas Graves 
Signed-off-by: HyukjinKwon 
(cherry picked from commit e8634d8f6f8548852a284a32c1b7da24bedd8ff7)
Signed-off-by: HyukjinKwon 
---
 core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala 
b/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
index 994b363..16fe897 100644
--- a/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
+++ b/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
@@ -148,7 +148,12 @@ private[spark] object ResourceUtils extends Logging {
 
   def listResourceIds(sparkConf: SparkConf, componentName: String): 
Seq[ResourceID] = {
 sparkConf.getAllWithPrefix(s"$componentName.$RESOURCE_PREFIX.").map { case 
(key, _) =>
-  key.substring(0, key.indexOf('.'))
+  val index = key.indexOf('.')
+  if (index < 0) {
+throw new SparkException(s"You must specify an amount config for 
resource: $key " +
+  s"config: $componentName.$RESOURCE_PREFIX.$key")
+  }
+  key.substring(0, index)
 }.distinct.map(name => new ResourceID(componentName, name))
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (96ff87dc -> e8634d8)

2020-09-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 96ff87dc [SPARK-32753][SQL][FOLLOWUP] Fix indentation and clean up 
view in test
 add e8634d8  [SPARK-32824][CORE] Improve the error message when the user 
forgets the .amount in a resource config

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] zhengruifeng commented on a change in pull request #289: Release 3.0.1

2020-09-08 Thread GitBox


zhengruifeng commented on a change in pull request #289:
URL: https://github.com/apache/spark-website/pull/289#discussion_r485289490



##
File path: releases/_posts/2020-09-08-spark-release-3-0-1.md
##
@@ -0,0 +1,52 @@
+---
+layout: post
+title: Spark Release 3.0.1
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+
+Spark 3.0.1 is a maintenance release containing stability fixes. This release 
is based on the branch-3.0 maintenance branch of Spark. We strongly recommend 
all 3.0 users to upgrade to this stable release.
+
+### Notable changes
+  - [[SPARK-26905]](https://issues.apache.org/jira/browse/SPARK-26905): 
Revisit reserved/non-reserved keywords based on the ANSI SQL standard
+  - [[SPARK-31220]](https://issues.apache.org/jira/browse/SPARK-31220): 
repartition obeys spark.sql.adaptive.coalescePartitions.initialPartitionNum 
when spark.sql.adaptive.enabled
+  - [[SPARK-31703]](https://issues.apache.org/jira/browse/SPARK-31703): 
Changes made by SPARK-26985 break reading parquet files correctly in BigEndian 
architectures (AIX + LinuxPPC64)
+  - [[SPARK-31915]](https://issues.apache.org/jira/browse/SPARK-31915): 
Resolve the grouping column properly per the case sensitivity in grouped and 
cogrouped pandas UDFs
+  - [[SPARK-31923]](https://issues.apache.org/jira/browse/SPARK-31923): Event 
log cannot be generated when some internal accumulators use unexpected types
+  - [[SPARK-31935]](https://issues.apache.org/jira/browse/SPARK-31935): Hadoop 
file system config should be effective in data source options 
+  - [[SPARK-31968]](https://issues.apache.org/jira/browse/SPARK-31968): 
write.partitionBy() creates duplicate subdirectories when user provides 
duplicate columns
+  - [[SPARK-31983]](https://issues.apache.org/jira/browse/SPARK-31983): Tables 
of structured streaming tab show wrong result for duration column
+  - [[SPARK-31990]](https://issues.apache.org/jira/browse/SPARK-31990): 
Streaming's state store compatibility is broken
+  - [[SPARK-32003]](https://issues.apache.org/jira/browse/SPARK-32003): 
Shuffle files for lost executor are not unregistered if fetch failure occurs 
after executor is lost
+  - [[SPARK-32038]](https://issues.apache.org/jira/browse/SPARK-32038): 
Regression in handling NaN values in COUNT(DISTINCT)
+  - [[SPARK-32073]](https://issues.apache.org/jira/browse/SPARK-32073): Drop R 
< 3.5 support
+  - [[SPARK-32092]](https://issues.apache.org/jira/browse/SPARK-32092): 
CrossvalidatorModel does not save all submodels (it saves only 3)
+  - [[SPARK-32136]](https://issues.apache.org/jira/browse/SPARK-32136): Spark 
producing incorrect groupBy results when key is a struct with nullable 
properties
+  - [[SPARK-32148]](https://issues.apache.org/jira/browse/SPARK-32148): LEFT 
JOIN generating non-deterministic and unexpected result (regression in Spark 
3.0)
+  - [[SPARK-32220]](https://issues.apache.org/jira/browse/SPARK-32220): 
Cartesian Product Hint cause data error
+  - [[SPARK-32310]](https://issues.apache.org/jira/browse/SPARK-32310): ML 
params default value parity
+  - [[SPARK-32339]](https://issues.apache.org/jira/browse/SPARK-32339): 
Improve MLlib BLAS native acceleration docs
+  - [[SPARK-32424]](https://issues.apache.org/jira/browse/SPARK-32424): Fix 
silent data change for timestamp parsing if overflow happens
+  - [[SPARK-32451]](https://issues.apache.org/jira/browse/SPARK-32451): 
Support Apache Arrow 1.0.0 in SparkR
+  - [[SPARK-32456]](https://issues.apache.org/jira/browse/SPARK-32456): Check 
the Distinct by assuming it as Aggregate for Structured Streaming
+  - [[SPARK-32608]](https://issues.apache.org/jira/browse/SPARK-32608): Script 
Transform DELIMIT value should be formatted
+  - [[SPARK-32646]](https://issues.apache.org/jira/browse/SPARK-32646): ORC 
predicate pushdown should work with case-insensitive analysis
+  - [[SPARK-32658]](https://issues.apache.org/jira/browse/SPARK-32658): 
Partition length number overflow in PartitionWriterStream
+  - [[SPARK-32676]](https://issues.apache.org/jira/browse/SPARK-32676): Fix 
double caching in KMeans/BiKMeans
+
+
+### Known issues
+  - [[SPARK-31511]](https://issues.apache.org/jira/browse/SPARK-31511): Make 
BytesToBytesMap iterator() thread-safe

Review comment:
   The four known issues here were marked as fixed in the tickets, so I added this phrase for them all.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (aa87b0a -> 96ff87dc)

2020-09-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from aa87b0a  [SPARK-32815][ML] Fix LibSVM data source loading error on 
file paths with glob metacharacters
 add 96ff87dc [SPARK-32753][SQL][FOLLOWUP] Fix indentation and clean up 
view in test

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveQueryExecSuite.scala| 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32638][SQL][3.0] Corrects references when adding aliases in WidenSetOperationTypes

2020-09-08 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 3f20f14  [SPARK-32638][SQL][3.0] Corrects references when adding 
aliases in WidenSetOperationTypes
3f20f14 is described below

commit 3f20f1413ccb1f685b96de96e5e705db6a70e058
Author: Wenchen Fan 
AuthorDate: Wed Sep 9 08:21:59 2020 +0900

[SPARK-32638][SQL][3.0] Corrects references when adding aliases in 
WidenSetOperationTypes

### What changes were proposed in this pull request?

This PR intends to fix a bug where references can be missing when adding 
aliases to widen data types in `WidenSetOperationTypes`. For example,
```
CREATE OR REPLACE TEMPORARY VIEW t3 AS VALUES (decimal(1)) tbl(v);
SELECT t.v FROM (
  SELECT v FROM t3
  UNION ALL
  SELECT v + v AS v FROM t3
) t;

org.apache.spark.sql.AnalysisException: Resolved attribute(s) v#1 missing 
from v#3 in operator !Project [v#1]. Attribute(s) with the same name appear in 
the operation: v. Please check if the right attribute(s) are used.;;
!Project [v#1]  <-- the reference got missing
+- SubqueryAlias t
   +- Union
  :- Project [cast(v#1 as decimal(11,0)) AS v#3]
  :  +- Project [v#1]
  : +- SubqueryAlias t3
  :+- SubqueryAlias tbl
  :   +- LocalRelation [v#1]
  +- Project [v#2]
 +- Project [CheckOverflow((promote_precision(cast(v#1 as 
decimal(11,0))) + promote_precision(cast(v#1 as decimal(11,0, 
DecimalType(11,0), true) AS v#2]
+- SubqueryAlias t3
   +- SubqueryAlias tbl
  +- LocalRelation [v#1]
```
In this case, `WidenSetOperationTypes` added the alias `cast(v#1 as
decimal(11,0)) AS v#3`, and the reference in the top `Project` then went
missing. This PR corrects the reference (the `exprId` and the widened
`dataType`) after adding the aliases in the rule.
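
Conceptually (a toy model for illustration only; this is not the Catalyst
implementation, and `Attr` and the ids/types below are invented), the rewrite
carries a mapping from the old attribute to its newly aliased replacement and
updates any parent operator that still refers to the old exprId:

```
case class Attr(name: String, exprId: Long, dataType: String)

// Rewrite a parent projection so stale references pick up the new exprId and widened type.
def rewriteRefs(projectList: Seq[Attr], mapping: Map[Long, Attr]): Seq[Attr] =
  projectList.map(attr => mapping.getOrElse(attr.exprId, attr))

val oldV = Attr("v", exprId = 1L, dataType = "decimal(1,0)")    // v#1 from the union branch
val newV = Attr("v", exprId = 3L, dataType = "decimal(11,0)")   // cast(v#1 ...) AS v#3

rewriteRefs(Seq(oldV), Map(oldV.exprId -> newV))                // List(Attr(v,3,decimal(11,0)))
```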

This backport for 3.0 comes from #29485 and #29643

### Why are the changes needed?

bugfixes

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added unit tests

Closes #29680 from maropu/SPARK-32638-BRANCH3.0.

Lead-authored-by: Wenchen Fan 
Co-authored-by: Takeshi Yamamuro 
Signed-off-by: Takeshi Yamamuro 
---
 .../spark/sql/catalyst/analysis/Analyzer.scala | 109 ++---
 .../spark/sql/catalyst/analysis/TypeCoercion.scala |  59 +++
 .../spark/sql/catalyst/plans/QueryPlan.scala   |  85 
 .../catalyst/plans/logical/AnalysisHelper.scala|  15 ++-
 .../sql/catalyst/analysis/TypeCoercionSuite.scala  |  18 +++-
 .../src/test/resources/sql-tests/inputs/except.sql |  19 
 .../resources/sql-tests/inputs/intersect-all.sql   |  15 +++
 .../src/test/resources/sql-tests/inputs/union.sql  |  14 +++
 .../resources/sql-tests/results/except.sql.out |  58 ++-
 .../sql-tests/results/intersect-all.sql.out|  42 +++-
 .../test/resources/sql-tests/results/union.sql.out |  43 +++-
 11 files changed, 348 insertions(+), 129 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 89454c2..729d316 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -1239,108 +1239,13 @@ class Analyzer(
   if (conflictPlans.isEmpty) {
 right
   } else {
-rewritePlan(right, conflictPlans.toMap)._1
-  }
-}
-
-private def rewritePlan(plan: LogicalPlan, conflictPlanMap: 
Map[LogicalPlan, LogicalPlan])
-  : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
-  if (conflictPlanMap.contains(plan)) {
-// If the plan is the one that conflict the with left one, we'd
-// just replace it with the new plan and collect the rewrite
-// attributes for the parent node.
-val newRelation = conflictPlanMap(plan)
-newRelation -> plan.output.zip(newRelation.output)
-  } else {
-val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
-val newPlan = plan.mapChildren { child =>
-  // If not, we'd rewrite child plan recursively until we find the
-  // conflict node or reach the leaf node.
-  val (newChild, childAttrMapping) = rewritePlan(child, 
conflictPlanMap)
-  attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
-// `attrMapping` is not only used to replace the attributes of the 
current `plan`,
-// but also to be propagated to the parent plans of the current 
`

[spark] branch branch-2.4 updated: [SPARK-32815][ML][2.4] Fix LibSVM data source loading error on file paths with glob metacharacters

2020-09-08 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new ef24542  [SPARK-32815][ML][2.4] Fix LibSVM data source loading error 
on file paths with glob metacharacters
ef24542 is described below

commit ef24542721bd1107339c6efef8f14a970e429384
Author: Max Gekk 
AuthorDate: Tue Sep 8 14:17:18 2020 +

[SPARK-32815][ML][2.4] Fix LibSVM data source loading error on file paths 
with glob metacharacters

### What changes were proposed in this pull request?
In the PR, I propose to fix an issue with LibSVM datasource when both of 
the following are true:
* no user specified schema
* some file paths contain escaped glob metacharacters, such as `[`, `]`, `{`, `}`, `*`, etc.

The fix is a backport of https://github.com/apache/spark/pull/29675, and it 
is based on another bug fix for CSV/JSON datasources 
https://github.com/apache/spark/pull/29663.

### Why are the changes needed?
To fix the issue when the following query tries to read from a path containing `[abc]`:
```scala
spark.read.format("libsvm").load("""/tmp/\[abc\].csv""").show
```
but would end up hitting an exception:
```
Path does not exist: 
file:/private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-6ef0ae5e-ff9f-4c4f-9ff4-0db3ee1f6a82/[abc]/part-0-26406ab9-4e56-45fd-a25a-491c18a05e76-c000.libsvm;
org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-6ef0ae5e-ff9f-4c4f-9ff4-0db3ee1f6a82/[abc]/part-0-26406ab9-4e56-45fd-a25a-491c18a05e76-c000.libsvm;
at 
org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$3(DataSource.scala:770)
at 
org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:373)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
```
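
For context, a self-contained round-trip sketch of the scenario (assuming a 
local SparkSession with MLlib on the classpath; the directory names below are 
illustrative and not taken from this patch):

```scala
import java.nio.file.{Files, Paths}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("SPARK-32815-sketch").getOrCreate()
import spark.implicits._

val base = Files.createTempDirectory("spark-32815").toString
val dir  = Paths.get(base, "data-[abc]").toString // literal glob metacharacters in the path

// Write a tiny LibSVM dataset under the bracketed directory.
Seq((1.0, Vectors.dense(1.0, 0.0, 3.0)), (0.0, Vectors.sparse(3, Seq((1, 2.0)))))
  .toDF("label", "features")
  .write.format("libsvm").save(dir)

// No user-specified schema, so numFeatures is inferred by re-reading the files.
// Before this fix that inference step re-globbed the literal "[abc]" and failed
// with "Path does not exist"; with the fix the escaped load below succeeds.
val escaped = dir.replace("[", "\\[").replace("]", "\\]")
spark.read.format("libsvm").load(escaped).show()
```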

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Added UT to `LibSVMRelationSuite`.

Closes #29678 from MaxGekk/globbing-paths-when-inferring-schema-ml-2.4.

Authored-by: Max Gekk 
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/ml/source/libsvm/LibSVMRelation.scala |  2 +-
 .../scala/org/apache/spark/mllib/util/MLUtils.scala|  3 ++-
 .../spark/ml/source/libsvm/LibSVMRelationSuite.scala   | 18 ++
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala 
b/mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala
index 39dcd91..5795812 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala
@@ -99,7 +99,7 @@ private[libsvm] class LibSVMFileFormat
 "though the input. If you know the number in advance, please specify 
it via " +
 "'numFeatures' option to avoid the extra scan.")
 
-  val paths = files.map(_.getPath.toUri.toString)
+  val paths = files.map(_.getPath.toString)
   val parsed = MLUtils.parseLibSVMFile(sparkSession, paths)
   MLUtils.computeNumFeatures(parsed)
 }
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
index 14af8b5..c8550cd 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
@@ -110,7 +110,8 @@ object MLUtils extends Logging {
   DataSource.apply(
 sparkSession,
 paths = paths,
-className = classOf[TextFileFormat].getName
+className = classOf[TextFileFormat].getName,
+options = Map(DataSource.GLOB_PATHS_KEY -> "false")
   ).resolveRelation(checkFilesExist = false))
   .select("value")
 
diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
 
b/mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
index 3eabff4..28c770c 100644
--- 
a/mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
+++ 
b/mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
@@ -184,4 +184,22 @@ class LibSVMRelationSuite extends SparkFunSuite with 
MLlibTestSparkContext {
   spark.sql("DROP TABLE IF EXISTS libsvmTable")
 }
   }
+
+  test("SPARK-32815: Test LibSVM data source on file paths with glob 
metacharacters") {
+val basePath = Utils.createDirectory(tempDir.getCanonicalPath, "globbing")
+// test libsvm writer / reader without specifying schema
+

[spark] branch branch-3.0 updated (9b39e4b -> 8c0b9cb)

2020-09-08 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9b39e4b  [SPARK-32753][SQL][3.0] Only copy tags to node with no tags
 add 8c0b9cb  [SPARK-32815][ML][3.0] Fix LibSVM data source loading error 
on file paths with glob metacharacters

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/source/libsvm/LibSVMRelation.scala  |  2 +-
 .../scala/org/apache/spark/mllib/util/MLUtils.scala  |  3 ++-
 .../spark/ml/source/libsvm/LibSVMRelationSuite.scala | 20 
 3 files changed, 23 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32815][ML][3.0] Fix LibSVM data source loading error on file paths with glob metacharacters

2020-09-08 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 8c0b9cb  [SPARK-32815][ML][3.0] Fix LibSVM data source loading error 
on file paths with glob metacharacters
8c0b9cb is described below

commit 8c0b9cbf68693db22314637a75f28e5aa954aff8
Author: Max Gekk 
AuthorDate: Tue Sep 8 14:16:13 2020 +

[SPARK-32815][ML][3.0] Fix LibSVM data source loading error on file paths 
with glob metacharacters

### What changes were proposed in this pull request?
In the PR, I propose to fix an issue with LibSVM datasource when both of 
the following are true:
* no user specified schema
* some file paths contain escaped glob metacharacters, such as `[`, `]`, `{`, `}`, `*`, etc.

The fix is a backport of https://github.com/apache/spark/pull/29670, and it 
is based on another bug fix for CSV/JSON datasources 
https://github.com/apache/spark/pull/29659.

### Why are the changes needed?
To fix the issue when the following query tries to read from a path containing `[abc]`:
```scala
spark.read.format("libsvm").load("""/tmp/\[abc\].csv""").show
```
but would end up hitting an exception:
```
Path does not exist: 
file:/private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-6ef0ae5e-ff9f-4c4f-9ff4-0db3ee1f6a82/[abc]/part-0-26406ab9-4e56-45fd-a25a-491c18a05e76-c000.libsvm;
org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-6ef0ae5e-ff9f-4c4f-9ff4-0db3ee1f6a82/[abc]/part-0-26406ab9-4e56-45fd-a25a-491c18a05e76-c000.libsvm;
at 
org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$3(DataSource.scala:770)
at 
org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:373)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
```

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Added UT to `LibSVMRelationSuite`.

Closes #29675 from MaxGekk/globbing-paths-when-inferring-schema-ml-3.0.

Authored-by: Max Gekk 
Signed-off-by: Wenchen Fan 
---
 .../spark/ml/source/libsvm/LibSVMRelation.scala  |  2 +-
 .../scala/org/apache/spark/mllib/util/MLUtils.scala  |  3 ++-
 .../spark/ml/source/libsvm/LibSVMRelationSuite.scala | 20 
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala 
b/mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala
index da8f3a24f..11be1d8 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala
@@ -100,7 +100,7 @@ private[libsvm] class LibSVMFileFormat
 "though the input. If you know the number in advance, please specify 
it via " +
 "'numFeatures' option to avoid the extra scan.")
 
-  val paths = files.map(_.getPath.toUri.toString)
+  val paths = files.map(_.getPath.toString)
   val parsed = MLUtils.parseLibSVMFile(sparkSession, paths)
   MLUtils.computeNumFeatures(parsed)
 }
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
index 9198334..2411300 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
@@ -110,7 +110,8 @@ object MLUtils extends Logging {
   DataSource.apply(
 sparkSession,
 paths = paths,
-className = classOf[TextFileFormat].getName
+className = classOf[TextFileFormat].getName,
+options = Map(DataSource.GLOB_PATHS_KEY -> "false")
   ).resolveRelation(checkFilesExist = false))
   .select("value")
 
diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
 
b/mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
index 263ad26..0999892 100644
--- 
a/mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
+++ 
b/mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
@@ -191,4 +191,24 @@ class LibSVMRelationSuite extends SparkFunSuite with 
MLlibTestSparkContext {
   spark.sql("DROP TABLE IF EXISTS libsvmTable")
 }
   }
+
+  test("SPARK-32815: Test LibSVM data source on file paths with glob 
metacharacters") {
+withTempDir { dir =>
+  val basePath = dir.getCanonicalPath
+  // test libsvm writer / reader without specifying schema
+  val svm

[spark] branch master updated (e7d9a245 -> aa87b0a)

2020-09-08 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e7d9a245 [SPARK-32817][SQL] DPP throws error when broadcast side is 
empty
 add aa87b0a  [SPARK-32815][ML] Fix LibSVM data source loading error on 
file paths with glob metacharacters

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/source/libsvm/LibSVMRelation.scala  |  2 +-
 .../scala/org/apache/spark/mllib/util/MLUtils.scala  |  3 ++-
 .../spark/ml/source/libsvm/LibSVMRelationSuite.scala | 20 
 3 files changed, 23 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated (4656ee5 -> 9b39e4b)

2020-09-08 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4656ee5  [SPARK-31511][FOLLOW-UP][TEST][SQL] Make BytesToBytesMap 
iterators thread-safe
 add 9b39e4b  [SPARK-32753][SQL][3.0] Only copy tags to node with no tags

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/trees/TreeNode.scala   |  7 ++-
 .../sql/execution/adaptive/AdaptiveQueryExecSuite.scala  | 16 +++-
 2 files changed, 21 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (bd3dc2f5 -> e7d9a245)

2020-09-08 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bd3dc2f5 [SPARK-31511][FOLLOW-UP][TEST][SQL] Make BytesToBytesMap 
iterators thread-safe
 add e7d9a245 [SPARK-32817][SQL] DPP throws error when broadcast side is 
empty

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/joins/HashedRelation.scala  |  2 +-
 .../apache/spark/sql/DynamicPartitionPruningSuite.scala | 17 +
 .../spark/sql/execution/joins/HashedRelationSuite.scala |  6 +-
 3 files changed, 23 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-32817][SQL] DPP throws error when broadcast side is empty

2020-09-08 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e7d9a245 [SPARK-32817][SQL] DPP throws error when broadcast side is 
empty
e7d9a245 is described below

commit e7d9a245656655e7bb1df3e04df30eb3cc9e23ad
Author: Zhenhua Wang 
AuthorDate: Tue Sep 8 21:36:21 2020 +0900

[SPARK-32817][SQL] DPP throws error when broadcast side is empty

### What changes were proposed in this pull request?

In `SubqueryBroadcastExec.relationFuture`, if the `broadcastRelation` is an 
`EmptyHashedRelation`, then `broadcastRelation.keys()` will throw 
`UnsupportedOperationException`.
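
As a simplified, hypothetical model of the failure mode (the trait and object names below are illustrative and not Spark's real `HashedRelation` API): a caller that only wants to iterate the keys of an empty relation should get an empty iterator rather than an exception.

```
trait KeyedRelation {
  def keys(): Iterator[String]
}

// Old behavior in spirit: asking an empty relation for its keys blows up.
object ThrowingEmptyRelation extends KeyedRelation {
  override def keys(): Iterator[String] =
    throw new UnsupportedOperationException("empty relation has no keys")
}

// Fixed behavior in spirit: an empty relation simply yields no keys.
object EmptyRelation extends KeyedRelation {
  override def keys(): Iterator[String] = Iterator.empty
}

object DppSketch extends App {
  def collectPruningValues(r: KeyedRelation): Seq[String] = r.keys().toSeq

  println(collectPruningValues(EmptyRelation)) // List(): the pruning filter is simply empty
  // collectPruningValues(ThrowingEmptyRelation) would throw at runtime instead
}
```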

### Why are the changes needed?

To fix a bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added a new test.

Closes #29671 from wzhfy/dpp_empty_broadcast.

Authored-by: Zhenhua Wang 
Signed-off-by: Takeshi Yamamuro 
---
 .../spark/sql/execution/joins/HashedRelation.scala  |  2 +-
 .../apache/spark/sql/DynamicPartitionPruningSuite.scala | 17 +
 .../spark/sql/execution/joins/HashedRelationSuite.scala |  6 +-
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
index 89836f6..3c5ed40 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
@@ -1091,7 +1091,7 @@ case object EmptyHashedRelation extends HashedRelation {
   override def keyIsUnique: Boolean = true
 
   override def keys(): Iterator[InternalRow] = {
-    throw new UnsupportedOperationException
+    Iterator.empty
   }
 
   override def close(): Unit = {}
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
index ba61be5..55437aa 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
@@ -1344,6 +1344,23 @@ abstract class DynamicPartitionPruningSuiteBase
       }
     }
   }
+
+  test("SPARK-32817: DPP throws error when the broadcast side is empty") {
+    withSQLConf(
+      SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true",
+      SQLConf.DYNAMIC_PARTITION_PRUNING_REUSE_BROADCAST_ONLY.key -> "true") {
+      val df = sql(
+        """
+          |SELECT * FROM fact_sk f
+          |JOIN dim_store s
+          |ON f.store_id = s.store_id WHERE s.country = 'XYZ'
+        """.stripMargin)
+
+      checkPartitionPruningPredicate(df, false, true)
+
+      checkAnswer(df, Nil)
+    }
+  }
 }
 
 class DynamicPartitionPruningSuiteAEOff extends DynamicPartitionPruningSuiteBase {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala
index caa7bdf..84f6299 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala
@@ -621,7 +621,7 @@ class HashedRelationSuite extends SharedSparkSession {
     }
   }
 
-  test("EmptyHashedRelation return null in get / getValue") {
+  test("EmptyHashedRelation override methods behavior test") {
     val buildKey = Seq(BoundReference(0, LongType, false))
     val hashed = HashedRelation(Seq.empty[InternalRow].toIterator, buildKey, 1, mm)
     assert(hashed == EmptyHashedRelation)
@@ -631,6 +631,10 @@ class HashedRelationSuite extends SharedSparkSession {
     assert(hashed.get(key) == null)
     assert(hashed.getValue(0L) == null)
     assert(hashed.getValue(key) == null)
+
+    assert(hashed.keys().isEmpty)
+    assert(hashed.keyIsUnique)
+    assert(hashed.estimatedSize == 0)
   }
 
   test("SPARK-32399: test methods related to key index") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated (3b32ddf -> 4656ee5)

2020-09-08 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3b32ddf  [SPARK-32785][SQL][3.0] Interval with dangling parts should 
not result null
 add 4656ee5  [SPARK-31511][FOLLOW-UP][TEST][SQL] Make BytesToBytesMap 
iterators thread-safe

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/joins/HashedRelationSuite.scala  | 39 ++
 1 file changed, 39 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (55d38a4 -> bd3dc2f5)

2020-09-08 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 55d38a4  [SPARK-32748][SQL] Revert "Support local property propagation 
in SubqueryBroadcastExec"
 add bd3dc2f5 [SPARK-31511][FOLLOW-UP][TEST][SQL] Make BytesToBytesMap 
iterators thread-safe

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/joins/HashedRelationSuite.scala  | 39 ++
 1 file changed, 39 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31511][FOLLOW-UP][TEST][SQL] Make BytesToBytesMap iterators thread-safe

2020-09-08 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new bd3dc2f5 [SPARK-31511][FOLLOW-UP][TEST][SQL] Make BytesToBytesMap 
iterators thread-safe
bd3dc2f5 is described below

commit bd3dc2f54d871d152331612c53f586181f4e87fc
Author: sychen 
AuthorDate: Tue Sep 8 11:54:04 2020 +

[SPARK-31511][FOLLOW-UP][TEST][SQL] Make BytesToBytesMap iterators 
thread-safe

### What changes were proposed in this pull request?
Before SPARK-31511 was fixed, `BytesToBytesMap` iterator() was not thread-safe
and could return inaccurate data. We need to add a unit test for this.
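
The shape of such a regression test, as a hedged standalone sketch in plain Scala (none of Spark's internals; `ConcurrentWriteSketch` and the data below are illustrative): write the same shared data from two threads concurrently and assert that both writers produce identical bytes.

```
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object ConcurrentWriteSketch extends App {
  val shared: Vector[(Long, Int)] = (0 until 1000).map(i => (i.toLong, i + 1)).toVector

  def serialize(data: Vector[(Long, Int)]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bos)
    data.foreach(t => out.writeObject(t))
    out.flush()
    bos.toByteArray
  }

  // Two threads serialize the same shared structure at the same time.
  val results = new Array[Array[Byte]](2)
  val threads = (0 until 2).map { i =>
    new Thread(() => { results(i) = serialize(shared) })
  }
  threads.foreach(_.start())
  threads.foreach(_.join())

  // If iteration over the shared data were not thread-safe, the two byte streams could diverge.
  assert(java.util.Arrays.equals(results(0), results(1)))
  println("concurrent serialization produced identical bytes")
}
```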

### Why are the changes needed?
Increase test coverage to ensure that iterator() is thread-safe.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added a unit test.

Closes #29669 from cxzl25/SPARK-31511-test.

Authored-by: sychen 
Signed-off-by: Wenchen Fan 
---
 .../sql/execution/joins/HashedRelationSuite.scala  | 39 ++
 1 file changed, 39 insertions(+)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala
index 72e921d..caa7bdf 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala
@@ -360,6 +360,45 @@ class HashedRelationSuite extends SharedSparkSession {
     assert(java.util.Arrays.equals(os.toByteArray, os2.toByteArray))
   }
 
+  test("SPARK-31511: Make BytesToBytesMap iterators thread-safe") {
+    val ser = sparkContext.env.serializer.newInstance()
+    val key = Seq(BoundReference(0, LongType, false))
+
+    val unsafeProj = UnsafeProjection.create(
+      Seq(BoundReference(0, LongType, false), BoundReference(1, IntegerType, true)))
+    val rows = (0 until 1).map(i => unsafeProj(InternalRow(Int.int2long(i), i + 1)).copy())
+    val unsafeHashed = UnsafeHashedRelation(rows.iterator, key, 1, mm)
+
+    val os = new ByteArrayOutputStream()
+    val thread1 = new Thread {
+      override def run(): Unit = {
+        val out = new ObjectOutputStream(os)
+        unsafeHashed.asInstanceOf[UnsafeHashedRelation].writeExternal(out)
+        out.flush()
+      }
+    }
+
+    val thread2 = new Thread {
+      override def run(): Unit = {
+        val threadOut = new ObjectOutputStream(new ByteArrayOutputStream())
+        unsafeHashed.asInstanceOf[UnsafeHashedRelation].writeExternal(threadOut)
+        threadOut.flush()
+      }
+    }
+
+    thread1.start()
+    thread2.start()
+    thread1.join()
+    thread2.join()
+
+    val unsafeHashed2 = ser.deserialize[UnsafeHashedRelation](ser.serialize(unsafeHashed))
+    val os2 = new ByteArrayOutputStream()
+    val out2 = new ObjectOutputStream(os2)
+    unsafeHashed2.writeExternal(out2)
+    out2.flush()
+    assert(java.util.Arrays.equals(os.toByteArray, os2.toByteArray))
+  }
+
   // This test require 4G heap to run, should run it manually
   ignore("build HashedRelation that is larger than 1G") {
     val unsafeProj = UnsafeProjection.create(


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] maropu commented on a change in pull request #289: Release 3.0.1

2020-09-08 Thread GitBox


maropu commented on a change in pull request #289:
URL: https://github.com/apache/spark-website/pull/289#discussion_r484854079



##
File path: releases/_posts/2020-09-08-spark-release-3-0-1.md
##
@@ -0,0 +1,52 @@
+---
+layout: post
+title: Spark Release 3.0.1
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+
+Spark 3.0.1 is a maintenance release containing stability fixes. This release 
is based on the branch-3.0 maintenance branch of Spark. We strongly recommend 
all 3.0 users to upgrade to this stable release.
+
+### Notable changes
+  - [[SPARK-26905]](https://issues.apache.org/jira/browse/SPARK-26905): 
Revisit reserved/non-reserved keywords based on the ANSI SQL standard
+  - [[SPARK-31220]](https://issues.apache.org/jira/browse/SPARK-31220): 
repartition obeys spark.sql.adaptive.coalescePartitions.initialPartitionNum 
when spark.sql.adaptive.enabled

Review comment:
   Ah, ok.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] cloud-fan commented on a change in pull request #289: Release 3.0.1

2020-09-08 Thread GitBox


cloud-fan commented on a change in pull request #289:
URL: https://github.com/apache/spark-website/pull/289#discussion_r484851836



##
File path: releases/_posts/2020-09-08-spark-release-3-0-1.md
##
@@ -0,0 +1,52 @@
+---
+layout: post
+title: Spark Release 3.0.1
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+
+Spark 3.0.1 is a maintenance release containing stability fixes. This release 
is based on the branch-3.0 maintenance branch of Spark. We strongly recommend 
all 3.0 users to upgrade to this stable release.
+
+### Notable changes
+  - [[SPARK-26905]](https://issues.apache.org/jira/browse/SPARK-26905): 
Revisit reserved/non-reserved keywords based on the ANSI SQL standard
+  - [[SPARK-31220]](https://issues.apache.org/jira/browse/SPARK-31220): 
repartition obeys spark.sql.adaptive.coalescePartitions.initialPartitionNum 
when spark.sql.adaptive.enabled

Review comment:
   seems like we don't have sub-sections in other patch releases either.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (125cbe3 -> 55d38a4)

2020-09-08 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 125cbe3  [SPARK-32736][CORE] Avoid caching the removed decommissioned 
executors in TaskSchedulerImpl
 add 55d38a4  [SPARK-32748][SQL] Revert "Support local property propagation 
in SubqueryBroadcastExec"

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/SubqueryBroadcastExec.scala  | 16 ++
 .../sql/internal/ExecutorSideSQLConfSuite.scala| 63 +-
 2 files changed, 7 insertions(+), 72 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] maropu commented on a change in pull request #289: Release 3.0.1

2020-09-08 Thread GitBox


maropu commented on a change in pull request #289:
URL: https://github.com/apache/spark-website/pull/289#discussion_r484800121



##
File path: releases/_posts/2020-09-08-spark-release-3-0-1.md
##
@@ -0,0 +1,52 @@
+---
+layout: post
+title: Spark Release 3.0.1
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+
+Spark 3.0.1 is a maintenance release containing stability fixes. This release 
is based on the branch-3.0 maintenance branch of Spark. We strongly recommend 
all 3.0 users to upgrade to this stable release.
+
+### Notable changes
+  - [[SPARK-26905]](https://issues.apache.org/jira/browse/SPARK-26905): 
Revisit reserved/non-reserved keywords based on the ANSI SQL standard
+  - [[SPARK-31220]](https://issues.apache.org/jira/browse/SPARK-31220): 
repartition obeys spark.sql.adaptive.coalescePartitions.initialPartitionNum 
when spark.sql.adaptive.enabled
+  - [[SPARK-31703]](https://issues.apache.org/jira/browse/SPARK-31703): 
Changes made by SPARK-26985 break reading parquet files correctly in BigEndian 
architectures (AIX + LinuxPPC64)
+  - [[SPARK-31915]](https://issues.apache.org/jira/browse/SPARK-31915): 
Resolve the grouping column properly per the case sensitivity in grouped and 
cogrouped pandas UDFs
+  - [[SPARK-31923]](https://issues.apache.org/jira/browse/SPARK-31923): Event 
log cannot be generated when some internal accumulators use unexpected types
+  - [[SPARK-31935]](https://issues.apache.org/jira/browse/SPARK-31935): Hadoop 
file system config should be effective in data source options 
+  - [[SPARK-31968]](https://issues.apache.org/jira/browse/SPARK-31968): 
write.partitionBy() creates duplicate subdirectories when user provides 
duplicate columns
+  - [[SPARK-31983]](https://issues.apache.org/jira/browse/SPARK-31983): Tables 
of structured streaming tab show wrong result for duration column
+  - [[SPARK-31990]](https://issues.apache.org/jira/browse/SPARK-31990): 
Streaming's state store compatibility is broken
+  - [[SPARK-32003]](https://issues.apache.org/jira/browse/SPARK-32003): 
Shuffle files for lost executor are not unregistered if fetch failure occurs 
after executor is lost
+  - [[SPARK-32038]](https://issues.apache.org/jira/browse/SPARK-32038): 
Regression in handling NaN values in COUNT(DISTINCT)
+  - [[SPARK-32073]](https://issues.apache.org/jira/browse/SPARK-32073): Drop R 
< 3.5 support
+  - [[SPARK-32092]](https://issues.apache.org/jira/browse/SPARK-32092): 
CrossvalidatorModel does not save all submodels (it saves only 3)
+  - [[SPARK-32136]](https://issues.apache.org/jira/browse/SPARK-32136): Spark 
producing incorrect groupBy results when key is a struct with nullable 
properties
+  - [[SPARK-32148]](https://issues.apache.org/jira/browse/SPARK-32148): LEFT 
JOIN generating non-deterministic and unexpected result (regression in Spark 
3.0)
+  - [[SPARK-32220]](https://issues.apache.org/jira/browse/SPARK-32220): 
Cartesian Product Hint cause data error
+  - [[SPARK-32310]](https://issues.apache.org/jira/browse/SPARK-32310): ML 
params default value parity
+  - [[SPARK-32339]](https://issues.apache.org/jira/browse/SPARK-32339): 
Improve MLlib BLAS native acceleration docs
+  - [[SPARK-32424]](https://issues.apache.org/jira/browse/SPARK-32424): Fix 
silent data change for timestamp parsing if overflow happens
+  - [[SPARK-32451]](https://issues.apache.org/jira/browse/SPARK-32451): 
Support Apache Arrow 1.0.0 in SparkR
+  - [[SPARK-32456]](https://issues.apache.org/jira/browse/SPARK-32456): Check 
the Distinct by assuming it as Aggregate for Structured Streaming
+  - [[SPARK-32608]](https://issues.apache.org/jira/browse/SPARK-32608): Script 
Transform DELIMIT value should be formatted
+  - [[SPARK-32646]](https://issues.apache.org/jira/browse/SPARK-32646): ORC 
predicate pushdown should work with case-insensitive analysis
+  - [[SPARK-32658]](https://issues.apache.org/jira/browse/SPARK-32658): 
Partition length number overflow in PartitionWriterStream
+  - [[SPARK-32676]](https://issues.apache.org/jira/browse/SPARK-32676): Fix 
double caching in KMeans/BiKMeans
+
+
+### Known issues
+  - [[SPARK-31511]](https://issues.apache.org/jira/browse/SPARK-31511): Make 
BytesToBytesMap iterator() thread-safe

Review comment:
   To follow [the 3.0.0 release 
note](https://spark.apache.org/releases/spark-release-3-0-0.html), how about 
adding a phrase `This will be fixed in Spark 3.0.2.` for already-fixed issues? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[GitHub] [spark-website] maropu commented on a change in pull request #289: Release 3.0.1

2020-09-08 Thread GitBox


maropu commented on a change in pull request #289:
URL: https://github.com/apache/spark-website/pull/289#discussion_r484795335



##
File path: releases/_posts/2020-09-08-spark-release-3-0-1.md
##
@@ -0,0 +1,52 @@
+---
+layout: post
+title: Spark Release 3.0.1
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+
+Spark 3.0.1 is a maintenance release containing stability fixes. This release 
is based on the branch-3.0 maintenance branch of Spark. We strongly recommend 
all 3.0 users to upgrade to this stable release.
+
+### Notable changes
+  - [[SPARK-26905]](https://issues.apache.org/jira/browse/SPARK-26905): 
Revisit reserved/non-reserved keywords based on the ANSI SQL standard
+  - [[SPARK-31220]](https://issues.apache.org/jira/browse/SPARK-31220): 
repartition obeys spark.sql.adaptive.coalescePartitions.initialPartitionNum 
when spark.sql.adaptive.enabled

Review comment:
   Could we split this `Notable changes` section into sub-sections like 
`SQL`, `ML`, ...?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


