[spark] branch branch-2.4 updated: Revert "[SPARK-28912][STREAMING] Fixed MatchError in getCheckpointFiles()"

2019-09-06 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 0a4b356  Revert "[SPARK-28912][STREAMING] Fixed MatchError in getCheckpointFiles()"
0a4b356 is described below

commit 0a4b35642ffa3020ec0fcae2cca59376e2095636
Author: Xiao Li 
AuthorDate: Fri Sep 6 23:37:36 2019 -0700

Revert "[SPARK-28912][STREAMING] Fixed MatchError in getCheckpointFiles()"

This reverts commit 2654c33fd6a7a09e2b2fa9fc1c2ea6224ab292e6.
---
 .../scala/org/apache/spark/streaming/Checkpoint.scala   |  4 ++--
 .../org/apache/spark/streaming/CheckpointSuite.scala| 17 -
 2 files changed, 2 insertions(+), 19 deletions(-)

diff --git a/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala b/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
index b081287..a882558 100644
--- a/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
+++ b/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
@@ -128,8 +128,8 @@ object Checkpoint extends Logging {
 try {
   val statuses = fs.listStatus(path)
   if (statuses != null) {
-val paths = statuses.filterNot(_.isDirectory).map(_.getPath)
-val filtered = paths.filter(p => REGEX.findFirstIn(p.getName).nonEmpty)
+val paths = statuses.map(_.getPath)
+val filtered = paths.filter(p => REGEX.findFirstIn(p.toString).nonEmpty)
 filtered.sortWith(sortFunc)
   } else {
 logWarning(s"Listing $path returned null")
diff --git a/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala b/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
index 43e3cdd..19b621f 100644
--- a/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
+++ b/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
@@ -846,23 +846,6 @@ class CheckpointSuite extends TestSuiteBase with DStreamCheckpointTester
 checkpointWriter.stop()
   }
 
-  test("SPARK-28912: Fix MatchError in getCheckpointFiles") {
-withTempDir { tempDir =>
-  val fs = FileSystem.get(tempDir.toURI, new Configuration)
-  val checkpointDir = tempDir.getAbsolutePath + "/checkpoint-01"
-
-  assert(Checkpoint.getCheckpointFiles(checkpointDir, Some(fs)).length === 0)
-
-  // Ignore files whose parent path match.
-  fs.create(new Path(checkpointDir, "this-is-matched-before-due-to-parent-path")).close()
-  assert(Checkpoint.getCheckpointFiles(checkpointDir, Some(fs)).length === 0)
-
-  // Ignore directories whose names match.
-  fs.mkdirs(new Path(checkpointDir, "checkpoint-10"))
-  assert(Checkpoint.getCheckpointFiles(checkpointDir, Some(fs)).length === 0)
-}
-  }
-
  test("SPARK-6847: stack overflow when updateStateByKey is followed by a checkpointed dstream") {
// In this test, there are two updateStateByKey operators. The RDD DAG is as follows:
//
 //


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-28912][STREAMING] Fixed MatchError in getCheckpointFiles()

2019-09-06 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 2654c33  [SPARK-28912][STREAMING] Fixed MatchError in getCheckpointFiles()
2654c33 is described below

commit 2654c33fd6a7a09e2b2fa9fc1c2ea6224ab292e6
Author: avk 
AuthorDate: Fri Sep 6 17:55:09 2019 -0700

[SPARK-28912][STREAMING] Fixed MatchError in getCheckpointFiles()

### What changes were proposed in this pull request?

This change fixes issue SPARK-28912.

### Why are the changes needed?

If the checkpoint directory is set to a name that matches the regex pattern used for 
checkpoint files, the logs are flooded with MatchError exceptions and old 
checkpoint files are not removed.
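For context, here is a minimal standalone sketch of the failure mode. This is not the actual Spark code: `CheckpointFilterSketch` is a hypothetical object, the regex only approximates `Checkpoint.REGEX`, and plain strings stand in for Hadoop `Path` objects. It shows why applying the pattern to the full path string keeps unrelated files that merely live under a directory named like a checkpoint file:

```scala
import scala.util.matching.Regex

// Hypothetical sketch; the regex approximates Spark's Checkpoint.REGEX and
// plain strings stand in for Hadoop Path objects.
object CheckpointFilterSketch {
  val REGEX: Regex = """checkpoint-([\d]+)([\w\.]*)""".r

  private def name(path: String): String =
    path.substring(path.lastIndexOf('/') + 1)

  // Pre-fix behaviour: the pattern is matched against the full path string,
  // so a parent directory named "checkpoint-01" makes every file under it match.
  def filteredByPath(paths: Seq[String]): Seq[String] =
    paths.filter(p => REGEX.findFirstIn(p).nonEmpty)

  // Post-fix behaviour: the pattern is matched against the file name only.
  def filteredByName(paths: Seq[String]): Seq[String] =
    paths.filter(p => REGEX.findFirstIn(name(p)).nonEmpty)

  def main(args: Array[String]): Unit = {
    val paths = Seq(
      "/checkpoint-01/this-is-matched-before-due-to-parent-path",
      "/checkpoint-01/checkpoint-1000")
    println(filteredByPath(paths)) // keeps both entries
    println(filteredByName(paths)) // keeps only .../checkpoint-1000
  }
}
```

Entries that survive the old full-path filter but whose own file name does not match the pattern are exactly the ones that later fail the pattern match on the file name inside the sort function, producing the MatchError reported here.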

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually.

1. Start Hadoop in a pseudo-distributed mode.

2. In another terminal run command  nc -lk 

3. In the Spark shell execute the following statements:

```scala
val ssc = new StreamingContext(sc, Seconds(30))
ssc.checkpoint("hdfs://localhost:9000/checkpoint-01")
val lines = ssc.socketTextStream("localhost", )
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination()
```

Closes #25654 from avkgh/SPARK-28912.

Authored-by: avk 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 723faadf80da91a6e5514fc16b7af3ca4900eda8)
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/streaming/Checkpoint.scala   |  4 ++--
 .../org/apache/spark/streaming/CheckpointSuite.scala| 17 +
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala b/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
index a882558..b081287 100644
--- a/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
+++ b/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
@@ -128,8 +128,8 @@ object Checkpoint extends Logging {
 try {
   val statuses = fs.listStatus(path)
   if (statuses != null) {
-val paths = statuses.map(_.getPath)
-val filtered = paths.filter(p => REGEX.findFirstIn(p.toString).nonEmpty)
+val paths = statuses.filterNot(_.isDirectory).map(_.getPath)
+val filtered = paths.filter(p => REGEX.findFirstIn(p.getName).nonEmpty)
 filtered.sortWith(sortFunc)
   } else {
 logWarning(s"Listing $path returned null")
diff --git a/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala b/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
index 19b621f..43e3cdd 100644
--- a/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
+++ b/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
@@ -846,6 +846,23 @@ class CheckpointSuite extends TestSuiteBase with DStreamCheckpointTester
 checkpointWriter.stop()
   }
 
+  test("SPARK-28912: Fix MatchError in getCheckpointFiles") {
+withTempDir { tempDir =>
+  val fs = FileSystem.get(tempDir.toURI, new Configuration)
+  val checkpointDir = tempDir.getAbsolutePath + "/checkpoint-01"
+
+  assert(Checkpoint.getCheckpointFiles(checkpointDir, Some(fs)).length === 0)
+
+  // Ignore files whose parent path match.
+  fs.create(new Path(checkpointDir, "this-is-matched-before-due-to-parent-path")).close()
+  assert(Checkpoint.getCheckpointFiles(checkpointDir, Some(fs)).length === 0)
+
+  // Ignore directories whose names match.
+  fs.mkdirs(new Path(checkpointDir, "checkpoint-10"))
+  assert(Checkpoint.getCheckpointFiles(checkpointDir, Some(fs)).length === 0)
+}
+  }
+
  test("SPARK-6847: stack overflow when updateStateByKey is followed by a checkpointed dstream") {
// In this test, there are two updateStateByKey operators. The RDD DAG is as follows:
//
 //





[spark] branch master updated: [SPARK-28912][STREAMING] Fixed MatchError in getCheckpointFiles()

2019-09-06 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 723faad  [SPARK-28912][STREAMING] Fixed MatchError in getCheckpointFiles()
723faad is described below

commit 723faadf80da91a6e5514fc16b7af3ca4900eda8
Author: avk 
AuthorDate: Fri Sep 6 17:55:09 2019 -0700

[SPARK-28912][STREAMING] Fixed MatchError in getCheckpointFiles()

### What changes were proposed in this pull request?

This change fixes issue SPARK-28912.

### Why are the changes needed?

If the checkpoint directory is set to a name that matches the regex pattern used for 
checkpoint files, the logs are flooded with MatchError exceptions and old 
checkpoint files are not removed.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually.

1. Start Hadoop in a pseudo-distributed mode.

2. In another terminal run command  nc -lk 

3. In the Spark shell execute the following statements:

```scala
val ssc = new StreamingContext(sc, Seconds(30))
ssc.checkpoint("hdfs://localhost:9000/checkpoint-01")
val lines = ssc.socketTextStream("localhost", )
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination()
```

Closes #25654 from avkgh/SPARK-28912.

Authored-by: avk 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/streaming/Checkpoint.scala   |  4 ++--
 .../org/apache/spark/streaming/CheckpointSuite.scala| 17 +
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala b/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
index 54f91ff..c66ba2c 100644
--- a/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
+++ b/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
@@ -131,8 +131,8 @@ object Checkpoint extends Logging {
 try {
   val statuses = fs.listStatus(path)
   if (statuses != null) {
-val paths = statuses.map(_.getPath)
-val filtered = paths.filter(p => REGEX.findFirstIn(p.toString).nonEmpty)
+val paths = statuses.filterNot(_.isDirectory).map(_.getPath)
+val filtered = paths.filter(p => REGEX.findFirstIn(p.getName).nonEmpty)
 filtered.sortWith(sortFunc)
   } else {
 logWarning(s"Listing $path returned null")
diff --git a/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala b/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
index 55fdd4c..ff5e3ff 100644
--- a/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
+++ b/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
@@ -847,6 +847,23 @@ class CheckpointSuite extends TestSuiteBase with DStreamCheckpointTester
 checkpointWriter.stop()
   }
 
+  test("SPARK-28912: Fix MatchError in getCheckpointFiles") {
+withTempDir { tempDir =>
+  val fs = FileSystem.get(tempDir.toURI, new Configuration)
+  val checkpointDir = tempDir.getAbsolutePath + "/checkpoint-01"
+
+  assert(Checkpoint.getCheckpointFiles(checkpointDir, Some(fs)).length === 0)
+
+  // Ignore files whose parent path match.
+  fs.create(new Path(checkpointDir, "this-is-matched-before-due-to-parent-path")).close()
+  assert(Checkpoint.getCheckpointFiles(checkpointDir, Some(fs)).length === 0)
+
+  // Ignore directories whose names match.
+  fs.mkdirs(new Path(checkpointDir, "checkpoint-10"))
+  assert(Checkpoint.getCheckpointFiles(checkpointDir, Some(fs)).length === 0)
+}
+  }
+
  test("SPARK-6847: stack overflow when updateStateByKey is followed by a checkpointed dstream") {
// In this test, there are two updateStateByKey operators. The RDD DAG is as follows:
//
 //





[spark] branch master updated (89aba69 -> 6fb5ef1)

2019-09-06 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 89aba69  [SPARK-28935][SQL][DOCS] Document SQL metrics for Details for Query Plan
 add 6fb5ef1  [SPARK-29011][BUILD] Update netty-all from 4.1.30-Final to 4.1.39-Final

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7 | 2 +-
 dev/deps/spark-deps-hadoop-3.2 | 2 +-
 pom.xml| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)





[spark] branch master updated (ff5fa58 -> 89aba69)

2019-09-06 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ff5fa58  [SPARK-21870][SQL][FOLLOW-UP] Clean up string template formats for generated code in HashAggregateExec
 add 89aba69  [SPARK-28935][SQL][DOCS] Document SQL metrics for Details for Query Plan

No new revisions were added by this update.

Summary of changes:
 docs/web-ui.md | 35 +++
 1 file changed, 35 insertions(+)





[spark] branch master updated (b2f0660 -> ff5fa58)

2019-09-06 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b2f0660  [SPARK-29002][SQL] Avoid changing SMJ to BHJ if the build side has a high ratio of empty partitions
 add ff5fa58  [SPARK-21870][SQL][FOLLOW-UP] Clean up string template formats for generated code in HashAggregateExec

No new revisions were added by this update.

Summary of changes:
 .../execution/aggregate/HashAggregateExec.scala| 102 ++---
 1 file changed, 49 insertions(+), 53 deletions(-)





[spark] branch master updated (67b4329 -> b2f0660)

2019-09-06 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 67b4329  [SPARK-28690][SQL] Add `date_part` function for timestamps/dates
 add b2f0660  [SPARK-29002][SQL] Avoid changing SMJ to BHJ if the build side has a high ratio of empty partitions

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/plans/logical/hints.scala   |  8 +++
 .../org/apache/spark/sql/internal/SQLConf.scala| 12 +
 .../spark/sql/execution/SparkStrategies.scala  |  4 +-
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  4 +-
 .../adaptive/DemoteBroadcastHashJoin.scala | 60 ++
 .../adaptive/AdaptiveQueryExecSuite.scala  | 29 ++-
 6 files changed, 114 insertions(+), 3 deletions(-)
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DemoteBroadcastHashJoin.scala





[spark] branch master updated (905b7f7 -> 67b4329)

2019-09-06 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 905b7f7  [SPARK-28967][CORE] Include cloned version of "properties" to avoid ConcurrentModificationException
 add 67b4329  [SPARK-28690][SQL] Add `date_part` function for timestamps/dates

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/FunctionRegistry.scala   |   1 +
 .../catalyst/expressions/datetimeExpressions.scala |  87 +
 .../spark/sql/catalyst/parser/AstBuilder.scala |  48 +--
 .../test/resources/sql-tests/inputs/date_part.sql  |  68 
 .../resources/sql-tests/inputs/pgSQL/timestamp.sql |  31 +-
 .../resources/sql-tests/results/date_part.sql.out  | 412 +
 .../resources/sql-tests/results/extract.sql.out| 126 +++
 .../resources/sql-tests/results/pgSQL/date.sql.out |  52 +--
 .../sql-tests/results/pgSQL/timestamp.sql.out  |  55 ++-
 9 files changed, 727 insertions(+), 153 deletions(-)
 create mode 100644 sql/core/src/test/resources/sql-tests/inputs/date_part.sql
 create mode 100644 sql/core/src/test/resources/sql-tests/results/date_part.sql.out





[spark] branch master updated (4664a08 -> 905b7f7)

2019-09-06 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4664a08  [SPARK-28968][ML] Add HasNumFeatures in the scala side
 add 905b7f7  [SPARK-28967][CORE] Include cloned version of "properties" to avoid ConcurrentModificationException

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/scheduler/DAGScheduler.scala  |  2 +-
 .../apache/spark/scheduler/DAGSchedulerSuite.scala | 35 --
 2 files changed, 34 insertions(+), 3 deletions(-)

