[spark] branch branch-3.0-preview updated: [SPARK-29349][SQL] Support FETCH_PRIOR in Thriftserver fetch request

2019-10-15 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch branch-3.0-preview
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0-preview by this 
push:
 new 8640b90  [SPARK-29349][SQL] Support FETCH_PRIOR in Thriftserver fetch 
request
8640b90 is described below

commit 8640b90a3e69c61b93afdee8a71180810ccf25e5
Author: Juliusz Sompolski 
AuthorDate: Tue Oct 15 23:22:19 2019 -0700

[SPARK-29349][SQL] Support FETCH_PRIOR in Thriftserver fetch request

### What changes were proposed in this pull request?

Support FETCH_PRIOR fetching in Thriftserver, and report the correct fetch 
start offset in TFetchResultsResp.results.startRowOffset.

The semantics of FETCH_PRIOR are as follows, assuming the previous fetch 
returned a block of rows from offsets [10, 20) (a sketch of this cursor 
arithmetic follows the list):
* calling FETCH_PRIOR(maxRows=5) will scroll back and return rows [5, 10)
* calling FETCH_PRIOR(maxRows=10) again will scroll back, but can't go 
earlier than 0. It will nevertheless return 10 rows, returning rows [0, 10) 
(overlapping with the previous fetch)
* calling FETCH_PRIOR(maxRows=4) again will return rows starting from 
offset 0, i.e. [0, 4)
* calling FETCH_NEXT(maxRows=6) after that will move the cursor forward and 
return rows [4, 10)
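
A minimal sketch of that cursor arithmetic (hypothetical `fetchPrior`/`fetchNext` 
helpers, not the Thriftserver internals; capping by the total result size is 
omitted):

```scala
// Sketch of the FETCH_PRIOR / FETCH_NEXT offset arithmetic; names are illustrative.
case class FetchWindow(start: Long, end: Long) // rows [start, end)

def fetchPrior(prev: FetchWindow, maxRows: Long): FetchWindow = {
  val start = math.max(0L, prev.start - maxRows) // can't scroll back past offset 0
  FetchWindow(start, start + maxRows)            // may overlap the previous block
}

def fetchNext(prev: FetchWindow, maxRows: Long): FetchWindow =
  FetchWindow(prev.end, prev.end + maxRows)

// Reproduces the example above, starting from a previous fetch of rows [10, 20):
val w0 = FetchWindow(10, 20)
val w1 = fetchPrior(w0, maxRows = 5)   // [5, 10)
val w2 = fetchPrior(w1, maxRows = 10)  // [0, 10), overlaps the previous fetch
val w3 = fetchPrior(w2, maxRows = 4)   // [0, 4)
val w4 = fetchNext(w3, maxRows = 6)    // [4, 10)
```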

#### Client/server backwards/forwards compatibility

Old driver with new server:
* Drivers that don't support FETCH_PRIOR will not attempt to use it
* The field TFetchResultsResp.results.startRowOffset was not set before, so 
old drivers don't depend on it.

New driver with old server:
* Using an older thriftserver with FETCH_PRIOR will make the thriftserver 
return an unsupported operation error. The driver can then recognize that it's 
an old server.
* An older thriftserver will return 
TFetchResultsResp.results.startRowOffset=0. If the client driver receives 0, it 
knows that it cannot rely on it as a correct offset. If the client driver 
intentionally wants to fetch from 0, it can use FETCH_FIRST.

### Why are the changes needed?

It's intended to be used to recover after connection errors. If a client 
lost the connection during fetching (e.g. of rows [10, 20)) and wants to 
reconnect and continue, it could not know whether the request got lost before 
reaching the server or on the response back. When it issued another 
FETCH_NEXT(10) request after reconnecting, because 
TFetchResultsResp.results.startRowOffset was not set, it could not know if the 
server will return rows [10, 20) (because the previous request didn't [...]

The driver should always use FETCH_PRIOR after a broken connection (a sketch 
of this recovery flow follows the list):
* If the Thriftserver returns an unsupported operation error, the driver knows 
that it's an old server that doesn't support it. The driver then must fail the 
query, as such a server will also not return the correct startRowOffset, so the 
driver cannot reliably guarantee that it has not lost any rows on the fetch 
cursor.
* If the driver gets a response to FETCH_PRIOR, it should also have a 
correctly set startRowOffset, which the driver can use to position itself back 
where it left off before the connection broke.
* If FETCH_NEXT was used on the first fetch after a broken connection and 
returned startRowOffset=0, the client driver can't know whether it's 0 because 
it's an older server version or genuinely 0. It is better to call FETCH_PRIOR, 
as scrolling back may be required anyway after a broken connection.
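
A hedged sketch of that recovery flow, with purely illustrative client-side 
names (`Rows`, `UnsupportedOperation`, `lastConsumedOffset` are assumptions, 
not an actual driver API):

```scala
// Hypothetical client-side recovery after a broken connection; names are illustrative.
sealed trait FetchOutcome
case class Rows(startRowOffset: Long, rows: Seq[Seq[Any]]) extends FetchOutcome
case object UnsupportedOperation extends FetchOutcome

// After reconnecting, issue FETCH_PRIOR instead of FETCH_NEXT and use the returned
// startRowOffset to drop rows the client had already consumed before the disconnect.
def recoverAfterReconnect(
    fetchPrior: Long => FetchOutcome, // issues FETCH_PRIOR(maxRows) on the new connection
    lastConsumedOffset: Long,
    maxRows: Long): Seq[Seq[Any]] =
  fetchPrior(maxRows) match {
    case UnsupportedOperation =>
      // Old server: no FETCH_PRIOR and no reliable startRowOffset, so fail the query
      // rather than risk silently losing rows.
      throw new IllegalStateException("Server does not support FETCH_PRIOR; cannot recover")
    case Rows(startRowOffset, rows) =>
      // New server: startRowOffset says where this block begins, so drop the rows
      // that were already consumed before the connection broke.
      val alreadySeen = math.max(0L, lastConsumedOffset - startRowOffset).toInt
      rows.drop(alreadySeen)
  }
```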

It is implemented this way to stay backwards/forwards compatible and to avoid 
bumping the protocol version. FETCH_ABSOLUTE might have been better, but that 
would require a bigger protocol change, as there is currently no field to 
specify the requested absolute offset.

### Does this PR introduce any user-facing change?

ODBC/JDBC drivers connecting to Thriftserver may now use the FETCH_PRIOR 
fetch orientation to scroll back in query results, and check 
TFetchResultsResp.results.startRowOffset to verify that their cursor position 
is consistent after connection errors.

### How was this patch tested?

Added tests to HiveThriftServer2Suites

Closes #26014 from juliuszsompolski/SPARK-29349.

Authored-by: Juliusz Sompolski 
Signed-off-by: Yuming Wang 
---
 .../SparkExecuteStatementOperation.scala   | 37 -
 .../thriftserver/HiveThriftServer2Suites.scala | 89 +-
 sql/hive-thriftserver/v1.2.1/if/TCLIService.thrift |  5 +-
 .../hive/service/cli/operation/Operation.java  |  5 +-
 sql/hive-thriftserver/v2.3.5/if/TCLIService.thrift |  5 +-
 .../hive/service/cli/operation/Operation.java  |  5 +-
 6 files changed, 134 insertions(+), 12 deletions(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 
b/sql/hive

[spark] branch master updated (57edb42 -> eb8c420)

2019-10-15 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 57edb42  [SPARK-27259][CORE] Allow setting -1 as length for FileBlock
 add eb8c420  [SPARK-29349][SQL] Support FETCH_PRIOR in Thriftserver fetch 
request

No new revisions were added by this update.

Summary of changes:
 .../SparkExecuteStatementOperation.scala   | 37 -
 .../thriftserver/HiveThriftServer2Suites.scala | 89 +-
 sql/hive-thriftserver/v1.2.1/if/TCLIService.thrift |  5 +-
 .../hive/service/cli/operation/Operation.java  |  5 +-
 sql/hive-thriftserver/v2.3.5/if/TCLIService.thrift |  5 +-
 .../hive/service/cli/operation/Operation.java  |  5 +-
 6 files changed, 134 insertions(+), 12 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-27259][CORE] Allow setting -1 as length for FileBlock

2019-10-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 90139f6  [SPARK-27259][CORE] Allow setting -1 as length for FileBlock
90139f6 is described below

commit 90139f678860ce74b934a919b5bcd0635df348f4
Author: prasha2 
AuthorDate: Tue Oct 15 22:22:37 2019 -0700

[SPARK-27259][CORE] Allow setting -1 as length for FileBlock

### What changes were proposed in this pull request?

This PR aims to update the validation check on `length` from `length >= 0` 
to `length >= -1` in order to allow setting `-1` to keep the default value.

### Why are the changes needed?

In Apache Spark 2.2.0, 
[SPARK-18702](https://github.com/apache/spark/pull/16133/files#diff-2c5519b1cf4308d77d6f12212971544fR27-R38)
 added `class FileBlock` with a default `length` value of `-1`.

There is no way to set `filePath` only while keeping `length` at `-1`.
```scala
  def set(filePath: String, startOffset: Long, length: Long): Unit = {
    require(filePath != null, "filePath cannot be null")
    require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
    require(length >= 0, s"length ($length) cannot be negative")
    inputBlock.set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
  }
```

For compressed files (like GZ), the split size can be set to -1. This was 
allowed until Spark 2.1 but regressed starting with Spark 2.2.x. Note that a 
split length of -1 also means the length is unknown - a valid scenario. Thus, 
a split length of -1 should be acceptable, as it was before Spark 2.2.
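
A minimal standalone sketch of the relaxed check (simplified, not the actual 
`InputFileBlockHolder` internals): only the `length` requirement changes, so 
`-1`, meaning "length unknown", is now accepted while other negative values are 
still rejected.

```scala
// Standalone sketch mirroring the relaxed validation; not the Spark-internal object.
object FileBlockSketch {
  def set(filePath: String, startOffset: Long, length: Long): Unit = {
    require(filePath != null, "filePath cannot be null")
    require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
    require(length >= -1, s"length ($length) cannot be smaller than -1")
    println(s"recorded block: $filePath offset=$startOffset length=$length")
  }
}

// For a compressed (e.g. GZ) split the length is unknown, so -1 is now legal:
FileBlockSketch.set("/data/logs.gz", startOffset = 0L, length = -1L)
// FileBlockSketch.set("/data/logs.gz", 0L, -2L) // still rejected (IllegalArgumentException)
```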

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

This updates a corner case of the requirement check. Manually checked the 
code.

Closes #26123 from praneetsharma/fix-SPARK-27259.

Authored-by: prasha2 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 57edb4258254fa582f8aae6bfd8bed1069e8155c)
Signed-off-by: Dongjoon Hyun 
---
 core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala 
b/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
index bfe8152..1beb085 100644
--- a/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
@@ -76,7 +76,7 @@ private[spark] object InputFileBlockHolder {
   def set(filePath: String, startOffset: Long, length: Long): Unit = {
 require(filePath != null, "filePath cannot be null")
 require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
-require(length >= 0, s"length ($length) cannot be negative")
+require(length >= -1, s"length ($length) cannot be smaller than -1")
 inputBlock.get().set(new FileBlock(UTF8String.fromString(filePath), 
startOffset, length))
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0-preview updated: [SPARK-27259][CORE] Allow setting -1 as length for FileBlock

2019-10-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0-preview
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0-preview by this 
push:
 new 931cc1b  [SPARK-27259][CORE] Allow setting -1 as length for FileBlock
931cc1b is described below

commit 931cc1ba068f64a835264c1c8fc3431ecd4e31a0
Author: prasha2 
AuthorDate: Tue Oct 15 22:22:37 2019 -0700

[SPARK-27259][CORE] Allow setting -1 as length for FileBlock

### What changes were proposed in this pull request?

This PR aims to update the validation check on `length` from `length >= 0` 
to `length >= -1` in order to allow setting `-1` to keep the default value.

### Why are the changes needed?

In Apache Spark 2.2.0, 
[SPARK-18702](https://github.com/apache/spark/pull/16133/files#diff-2c5519b1cf4308d77d6f12212971544fR27-R38)
 added `class FileBlock` with a default `length` value of `-1`.

There is no way to set `filePath` only while keeping `length` at `-1`.
```scala
  def set(filePath: String, startOffset: Long, length: Long): Unit = {
    require(filePath != null, "filePath cannot be null")
    require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
    require(length >= 0, s"length ($length) cannot be negative")
    inputBlock.set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
  }
```

For compressed files (like GZ), the split size can be set to -1. This was 
allowed until Spark 2.1 but regressed starting with Spark 2.2.x. Note that a 
split length of -1 also means the length is unknown - a valid scenario. Thus, 
a split length of -1 should be acceptable, as it was before Spark 2.2.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

This updates a corner case of the requirement check. Manually checked the 
code.

Closes #26123 from praneetsharma/fix-SPARK-27259.

Authored-by: prasha2 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 57edb4258254fa582f8aae6bfd8bed1069e8155c)
Signed-off-by: Dongjoon Hyun 
---
 core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala 
b/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
index bfe8152..1beb085 100644
--- a/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
@@ -76,7 +76,7 @@ private[spark] object InputFileBlockHolder {
   def set(filePath: String, startOffset: Long, length: Long): Unit = {
 require(filePath != null, "filePath cannot be null")
 require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
-require(length >= 0, s"length ($length) cannot be negative")
+require(length >= -1, s"length ($length) cannot be smaller than -1")
 inputBlock.get().set(new FileBlock(UTF8String.fromString(filePath), 
startOffset, length))
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-27259][CORE] Allow setting -1 as length for FileBlock

2019-10-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 57edb42  [SPARK-27259][CORE] Allow setting -1 as length for FileBlock
57edb42 is described below

commit 57edb4258254fa582f8aae6bfd8bed1069e8155c
Author: prasha2 
AuthorDate: Tue Oct 15 22:22:37 2019 -0700

[SPARK-27259][CORE] Allow setting -1 as length for FileBlock

### What changes were proposed in this pull request?

This PR aims to update the validation check on `length` from `length >= 0` 
to `length >= -1` in order to allow setting `-1` to keep the default value.

### Why are the changes needed?

In Apache Spark 2.2.0, 
[SPARK-18702](https://github.com/apache/spark/pull/16133/files#diff-2c5519b1cf4308d77d6f12212971544fR27-R38)
 added `class FileBlock` with a default `length` value of `-1`.

There is no way to set `filePath` only while keeping `length` at `-1`.
```scala
  def set(filePath: String, startOffset: Long, length: Long): Unit = {
    require(filePath != null, "filePath cannot be null")
    require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
    require(length >= 0, s"length ($length) cannot be negative")
    inputBlock.set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
  }
```

For compressed files (like GZ), the split size can be set to -1. This was 
allowed until Spark 2.1 but regressed starting with Spark 2.2.x. Note that a 
split length of -1 also means the length is unknown - a valid scenario. Thus, 
a split length of -1 should be acceptable, as it was before Spark 2.2.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

This updates a corner case of the requirement check. Manually checked the 
code.

Closes #26123 from praneetsharma/fix-SPARK-27259.

Authored-by: prasha2 
Signed-off-by: Dongjoon Hyun 
---
 core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala 
b/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
index bfe8152..1beb085 100644
--- a/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
@@ -76,7 +76,7 @@ private[spark] object InputFileBlockHolder {
   def set(filePath: String, startOffset: Long, length: Long): Unit = {
 require(filePath != null, "filePath cannot be null")
 require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
-require(length >= 0, s"length ($length) cannot be negative")
+require(length >= -1, s"length ($length) cannot be smaller than -1")
 inputBlock.get().set(new FileBlock(UTF8String.fromString(filePath), 
startOffset, length))
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (e00344e -> 93e71e6)

2019-10-15 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e00344e  [SPARK-29423][SS] lazily initialize StreamingQueryManager in 
SessionState
 add 93e71e6  [SPARK-29469][SHUFFLE] Avoid retries by RetryingBlockFetcher 
when ExternalBlockStoreClient is closed

No new revisions were added by this update.

Summary of changes:
 .../spark/network/shuffle/ExternalBlockStoreClient.java | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (51f10ed -> e00344e)

2019-10-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 51f10ed  [SPARK-28560][SQL][FOLLOWUP] code cleanup for local shuffle 
reader
 add e00344e  [SPARK-29423][SS] lazily initialize StreamingQueryManager in 
SessionState

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/BaseSessionStateBuilder.scala  | 2 +-
 .../main/scala/org/apache/spark/sql/internal/SessionState.scala  | 9 +++--
 2 files changed, 8 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (95de93b -> 51f10ed)

2019-10-15 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 95de93b  [SPARK-24540][SQL] Support for multiple character delimiter 
in Spark CSV read
 add 51f10ed  [SPARK-28560][SQL][FOLLOWUP] code cleanup for local shuffle 
reader

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/MapOutputTracker.scala  | 34 +++---
 .../spark/shuffle/BlockStoreShuffleReader.scala| 22 ++
 .../org/apache/spark/shuffle/ShuffleManager.scala  | 11 ---
 .../spark/shuffle/sort/SortShuffleManager.scala| 25 ++--
 .../shuffle/BlockStoreShuffleReaderSuite.scala |  8 ++---
 .../execution/adaptive/LocalShuffledRowRDD.scala   | 14 -
 6 files changed, 45 insertions(+), 69 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (02c5b4f -> 95de93b)

2019-10-15 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 02c5b4f  [SPARK-28947][K8S] Status logging not happens at an interval 
for liveness
 add 95de93b  [SPARK-24540][SQL] Support for multiple character delimiter 
in Spark CSV read

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7 |  2 +-
 dev/deps/spark-deps-hadoop-3.2 |  2 +-
 pom.xml|  5 +++
 python/pyspark/sql/readwriter.py   |  6 +--
 python/pyspark/sql/streaming.py|  4 +-
 sql/catalyst/pom.xml   |  1 -
 .../spark/sql/catalyst/csv/CSVExprUtils.scala  | 46 ++
 .../apache/spark/sql/catalyst/csv/CSVOptions.scala |  2 +-
 .../spark/sql/catalyst/csv/CSVExprUtilsSuite.scala | 38 ++
 sql/core/pom.xml   |  1 -
 .../org/apache/spark/sql/DataFrameReader.scala |  4 +-
 .../test-data/cars-multichar-delim-crazy.csv   |  4 ++
 .../resources/test-data/cars-multichar-delim.csv   |  4 ++
 .../sql/execution/datasources/csv/CSVSuite.scala   | 45 +
 14 files changed, 152 insertions(+), 12 deletions(-)
 create mode 100644 
sql/core/src/test/resources/test-data/cars-multichar-delim-crazy.csv
 create mode 100644 
sql/core/src/test/resources/test-data/cars-multichar-delim.csv


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0-preview created (now 02c5b4f)

2019-10-15 Thread jiangxb1987
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch branch-3.0-preview
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 02c5b4f  [SPARK-28947][K8S] Status logging not happens at an interval 
for liveness

No new revisions were added by this update.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-28947][K8S] Status logging not happens at an interval for liveness

2019-10-15 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 02c5b4f  [SPARK-28947][K8S] Status logging not happens at an interval 
for liveness
02c5b4f is described below

commit 02c5b4f76337cc3901b8741887292bb4478931f3
Author: Kent Yao 
AuthorDate: Tue Oct 15 12:34:39 2019 -0700

[SPARK-28947][K8S] Status logging not happens at an interval for liveness

### What changes were proposed in this pull request?

This PR invokes the start method of `LoggingPodStatusWatcherImpl` for status 
logging at intervals.
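
A hedged sketch of the interval-logging pattern (illustrative names only, not 
the actual `LoggingPodStatusWatcherImpl`): a scheduled task periodically logs 
the latest application status until it is stopped.

```scala
// Illustrative sketch of interval-based status logging; not the real watcher class.
import java.util.concurrent.{Executors, TimeUnit}

class PodStatusLoggerSketch(intervalSeconds: Long, currentStatus: () => String) {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  // Schedule a periodic log of the current status (analogous to the watcher's
  // start method that this PR now actually invokes).
  def start(): Unit =
    scheduler.scheduleWithFixedDelay(
      () => println(s"Application status: ${currentStatus()}"),
      0L, intervalSeconds, TimeUnit.SECONDS)

  def stop(): Unit = scheduler.shutdown()
}
```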

### Why are the changes needed?

The start method of `LoggingPodStatusWatcherImpl` was declared but never 
called.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

Manually tested.

Closes #25648 from yaooqinn/SPARK-28947.

Authored-by: Kent Yao 
Signed-off-by: Marcelo Vanzin 
---
 .../k8s/submit/KubernetesClientApplication.scala   | 25 ++---
 .../k8s/submit/LoggingPodStatusWatcher.scala   | 61 ++
 .../spark/deploy/k8s/submit/ClientSuite.scala  |  5 +-
 3 files changed, 33 insertions(+), 58 deletions(-)

diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
index 11bbad9..8e5532d 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
@@ -86,15 +86,12 @@ private[spark] object ClientArguments {
  * @param builder Responsible for building the base driver pod based on a 
composition of
  *implemented features.
  * @param kubernetesClient the client to talk to the Kubernetes API server
- * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
- * to complete
  * @param watcher a watcher that monitors and logs the application status
  */
 private[spark] class Client(
 conf: KubernetesDriverConf,
 builder: KubernetesDriverBuilder,
 kubernetesClient: KubernetesClient,
-waitForAppCompletion: Boolean,
 watcher: LoggingPodStatusWatcher) extends Logging {
 
   def run(): Unit = {
@@ -124,10 +121,11 @@ private[spark] class Client(
   .endVolume()
 .endSpec()
   .build()
+val driverPodName = resolvedDriverPod.getMetadata.getName
 Utils.tryWithResource(
   kubernetesClient
 .pods()
-.withName(resolvedDriverPod.getMetadata.getName)
+.withName(driverPodName)
 .watch(watcher)) { _ =>
   val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
   try {
@@ -141,16 +139,8 @@ private[spark] class Client(
   throw e
   }
 
-  val sId = s"${Option(conf.namespace).map(_ + ":").getOrElse("")}" +
-s"${resolvedDriverPod.getMetadata.getName}"
-  if (waitForAppCompletion) {
-logInfo(s"Waiting for application ${conf.appName} with submission ID 
${sId} to finish...")
-watcher.awaitCompletion()
-logInfo(s"Application ${conf.appName} with submission ID ${sId} 
finished.")
-  } else {
-logInfo(s"Deployed Spark application ${conf.appName} with " +
-  s"submission ID ${sId} into Kubernetes.")
-  }
+  val sId = Seq(conf.namespace, driverPodName).mkString(":")
+  watcher.watchOrStop(sId)
 }
   }
 
@@ -199,13 +189,11 @@ private[spark] class KubernetesClientApplication extends 
SparkApplication {
   }
 
   private def run(clientArguments: ClientArguments, sparkConf: SparkConf): 
Unit = {
-val appName = sparkConf.getOption("spark.app.name").getOrElse("spark")
 // For constructing the app ID, we can't use the Spark application name, 
as the app ID is going
 // to be added as a label to group resources belonging to the same 
application. Label values are
 // considerably restrictive, e.g. must be no longer than 63 characters in 
length. So we generate
 // a unique app ID (captured by spark.app.id) in the format below.
 val kubernetesAppId = s"spark-${UUID.randomUUID().toString.replaceAll("-", 
"")}"
-val waitForAppCompletion = sparkConf.get(WAIT_FOR_APP_COMPLETION)
 val kubernetesConf = KubernetesConf.createDriverConf(
   sparkConf,
   kubernetesAppId,
@@ -215,9 +203,7 @@ private[spark] class KubernetesClientApplication extends 
SparkApplication {
 // The master URL has been checked for validity already in SparkSubmit.
 // We just need to get rid of the "k8s://" prefix here.

[spark] branch master updated (4ecbdbb -> 39d53d3)

2019-10-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4ecbdbb  [SPARK-29182][CORE] Cache preferred locations of checkpointed 
RDD
 add 39d53d3  [SPARK-29470][BUILD] Update plugins to latest versions

No new revisions were added by this update.

Summary of changes:
 .../main/resources/org/apache/spark/log4j-defaults.properties| 3 ++-
 dev/checkstyle.xml   | 9 +
 pom.xml  | 8 
 project/plugins.sbt  | 2 +-
 4 files changed, 12 insertions(+), 10 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (322ec0b -> 4ecbdbb)

2019-10-15 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 322ec0b  [SPARK-28885][SQL] Follow ANSI store assignment rules in 
table insertion by default
 add 4ecbdbb  [SPARK-29182][CORE] Cache preferred locations of checkpointed 
RDD

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/internal/config/package.scala | 11 +++
 .../apache/spark/rdd/ReliableCheckpointRDD.scala   | 36 +++---
 .../scala/org/apache/spark/CheckpointSuite.scala   | 26 +++-
 3 files changed, 67 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (2e28622 -> 322ec0b)

2019-10-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2e28622  [SPARK-28211][CORE][SHUFFLE] Propose Shuffle Driver 
Components API
 add 322ec0b  [SPARK-28885][SQL] Follow ANSI store assignment rules in 
table insertion by default

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md|  1 +
 .../spark/sql/catalyst/analysis/Analyzer.scala | 18 
 .../catalyst/analysis/TableOutputResolver.scala| 11 --
 .../spark/sql/catalyst/expressions/Cast.scala  |  1 +
 .../org/apache/spark/sql/internal/SQLConf.scala|  6 +++---
 .../org/apache/spark/sql/types/DataType.scala  |  2 ++
 .../types/DataTypeWriteCompatibilitySuite.scala| 24 ++
 .../spark/sql/execution/datasources/rules.scala| 10 ++---
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   | 14 -
 .../spark/sql/execution/command/DDLSuite.scala |  4 ++--
 .../execution/datasources/orc/OrcSourceSuite.scala |  4 +++-
 .../datasources/parquet/ParquetQuerySuite.scala| 16 +++
 .../org/apache/spark/sql/sources/InsertSuite.scala |  5 -
 .../thriftserver/ThriftServerQueryTestSuite.scala  | 11 ++
 .../hive/execution/HiveCompatibilitySuite.scala|  3 +++
 .../spark/sql/hive/client/VersionsSuite.scala  |  2 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala|  4 ++--
 .../hive/execution/HiveSerDeReadWriteSuite.scala   |  8 +---
 .../spark/sql/hive/orc/HiveOrcQuerySuite.scala |  5 -
 19 files changed, 92 insertions(+), 57 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-28211][CORE][SHUFFLE] Propose Shuffle Driver Components API

2019-10-15 Thread irashid
This is an automated email from the ASF dual-hosted git repository.

irashid pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2e28622  [SPARK-28211][CORE][SHUFFLE] Propose Shuffle Driver 
Components API
2e28622 is described below

commit 2e28622d8aeb9ce2460e803bb7d994196bcc0253
Author: Yifei Huang 
AuthorDate: Tue Oct 15 12:26:49 2019 -0500

[SPARK-28211][CORE][SHUFFLE] Propose Shuffle Driver Components API

### What changes were proposed in this pull request?

This is the next step of the SPARK-25299 work of proposing a new shuffle 
storage API. This patch includes the components of the plugin that hook into 
the driver, including driver shuffle initialization, application cleanup, and 
shuffle cleanup.
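
A hedged sketch of the driver-side lifecycle this describes (a self-contained 
stand-in; `MyShuffleDriverComponents` and the config key are assumptions, and 
the real `ShuffleDriverComponents` interface in the diff below may declare more 
hooks than shown): `initializeApplication` runs once on the driver and returns 
extra configuration for executors, while `cleanupApplication` runs once at 
application end.

```scala
// Self-contained stand-in for the driver-side plugin hooks; not the actual API surface.
import java.util.{HashMap => JHashMap, Map => JMap}

class MyShuffleDriverComponents {
  // Called once on the driver before executor requests are submitted; the returned
  // entries would be merged into SparkConf so executors can locate external shuffle
  // storage (the key name here is purely illustrative).
  def initializeApplication(): JMap[String, String] = {
    val extraConf = new JHashMap[String, String]()
    extraConf.put("spark.myshuffle.storage.endpoint", "shuffle-store:7788")
    extraConf
  }

  // Called once at the end of the application to remove leftover shuffle state,
  // e.g. dropping tables or deleting files registered during initializeApplication().
  def cleanupApplication(): Unit = ()
}
```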

### How was this patch tested?
Existing unit tests, plus an additional test for the interactions between 
driver and executor initialization.

Closes #25823 from yifeih/yh/upstream/driver-lifecycle.

Lead-authored-by: Yifei Huang 
Co-authored-by: mccheah 
Signed-off-by: Imran Rashid 
---
 .../apache/spark/shuffle/api/ShuffleDataIO.java|  6 ++
 .../spark/shuffle/api/ShuffleDriverComponents.java | 64 +++
 .../shuffle/api/ShuffleExecutorComponents.java | 12 ++-
 .../shuffle/sort/io/LocalDiskShuffleDataIO.java|  8 +-
 java => LocalDiskShuffleDriverComponents.java} | 35 +---
 .../io/LocalDiskShuffleExecutorComponents.java |  7 +-
 .../scala/org/apache/spark/ContextCleaner.scala|  8 +-
 .../main/scala/org/apache/spark/Dependency.scala   |  1 +
 .../main/scala/org/apache/spark/SparkContext.scala | 17 +++-
 .../apache/spark/shuffle/ShuffleDataIOUtils.scala  | 42 ++
 .../spark/shuffle/sort/SortShuffleManager.scala| 15 ++--
 .../apache/spark/InternalAccumulatorSuite.scala|  3 +-
 .../shuffle/ShuffleDriverComponentsSuite.scala | 94 ++
 13 files changed, 281 insertions(+), 31 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDataIO.java 
b/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDataIO.java
index e9e50ec..e4554bd 100644
--- a/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDataIO.java
+++ b/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDataIO.java
@@ -46,4 +46,10 @@ public interface ShuffleDataIO {
* are only invoked on the executors.
*/
   ShuffleExecutorComponents executor();
+
+  /**
+   * Called once on driver process to bootstrap the shuffle metadata modules 
that
+   * are maintained by the driver.
+   */
+  ShuffleDriverComponents driver();
 }
diff --git 
a/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDriverComponents.java 
b/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDriverComponents.java
new file mode 100644
index 000..b4cec17
--- /dev/null
+++ 
b/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDriverComponents.java
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle.api;
+
+import java.util.Map;
+
+import org.apache.spark.annotation.Private;
+
+/**
+ * :: Private ::
+ * An interface for building shuffle support modules for the Driver.
+ */
+@Private
+public interface ShuffleDriverComponents {
+
+  /**
+   * Called once in the driver to bootstrap this module that is specific to 
this application.
+   * This method is called before submitting executor requests to the cluster 
manager.
+   *
+   * This method should prepare the module with its shuffle components i.e. 
registering against
+   * an external file servers or shuffle services, or creating tables in a 
shuffle
+   * storage data database.
+   *
+   * @return additional SparkConf settings necessary for initializing the 
executor components.
+   * This would include configurations that cannot be statically set on the 
application, like
+   * the host:port of external services for shuffle storage.
+   */
+  Map initializeApplication();
+
+  /**
+   * Called once at the end of the Spark application to clean up any existing 
shuffle state.
+   */
+  void cleanupApplication();
+
+  /

[spark] branch master updated (8915966 -> 9ac4b2d)

2019-10-15 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8915966  [SPARK-29473][SQL] move statement logical plans to a new file
 add 9ac4b2d  [SPARK-28560][SQL] Optimize shuffle reader to local shuffle 
reader when smj converted to bhj in adaptive execution

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/MapOutputTracker.scala  |  91 +-
 .../spark/shuffle/BlockStoreShuffleReader.scala|  19 ++-
 .../org/apache/spark/shuffle/ShuffleManager.scala  |  13 ++
 .../spark/shuffle/sort/SortShuffleManager.scala|  21 
 .../org/apache/spark/sql/internal/SQLConf.scala|   8 ++
 .../apache/spark/sql/execution/SparkPlanInfo.scala |   3 +-
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |   1 +
 .../adaptive/AdaptiveSparkPlanHelper.scala |   1 +
 .../execution/adaptive/LocalShuffledRowRDD.scala   |  98 +++
 .../adaptive/OptimizeLocalShuffleReader.scala  | 132 +
 .../execution/exchange/ShuffleExchangeExec.scala   |   5 +
 .../adaptive/AdaptiveQueryExecSuite.scala  |  46 +--
 12 files changed, 424 insertions(+), 14 deletions(-)
 create mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/LocalShuffledRowRDD.scala
 create mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeLocalShuffleReader.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (a988aaf -> 8915966)

2019-10-15 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a988aaf  [SPARK-29454][SQL] Reduce unsafeProjection times when read 
Parquet file use non-vectorized mode
 add 8915966  [SPARK-29473][SQL] move statement logical plans to a new file

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala |   1 -
 .../sql/catalyst/analysis/CheckAnalysis.scala  |   1 -
 .../sql/catalyst/analysis/ResolveCatalogs.scala|   1 -
 .../apache/spark/sql/catalyst/dsl/package.scala|   1 -
 .../spark/sql/catalyst/parser/AstBuilder.scala |   1 -
 .../plans/logical/sql/AlterTableStatements.scala   |  78 --
 .../plans/logical/sql/AlterViewStatements.scala|  33 ---
 .../plans/logical/sql/CreateTableStatement.scala   |  58 
 .../plans/logical/sql/DeleteFromStatement.scala|  27 --
 .../logical/sql/DescribeColumnStatement.scala  |  23 --
 .../plans/logical/sql/DescribeTableStatement.scala |  25 --
 .../plans/logical/sql/DropTableStatement.scala |  34 ---
 .../plans/logical/sql/DropViewStatement.scala  |  33 ---
 .../plans/logical/sql/InsertIntoStatement.scala|  50 
 .../plans/logical/sql/ParsedStatement.scala|  49 
 .../plans/logical/sql/ReplaceTableStatement.scala  |  60 -
 .../logical/sql/ShowNamespacesStatement.scala  |  24 --
 .../plans/logical/sql/ShowTablesStatement.scala|  24 --
 .../plans/logical/sql/UpdateTableStatement.scala   |  27 --
 .../catalyst/plans/logical/sql/UseStatement.scala  |  23 --
 .../sql/catalyst/plans/logical/statements.scala| 294 +
 .../spark/sql/catalyst/parser/DDLParserSuite.scala |   3 +-
 .../sql/catalyst/parser/PlanParserSuite.scala  |   1 -
 .../org/apache/spark/sql/DataFrameWriter.scala |   3 +-
 .../catalyst/analysis/ResolveSessionCatalog.scala  |   5 +-
 .../execution/datasources/DataSourceStrategy.scala |   3 +-
 .../datasources/FallBackFileSourceV2.scala |   3 +-
 .../spark/sql/execution/datasources/rules.scala|   1 -
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |   1 -
 .../spark/sql/execution/SparkSqlParserSuite.scala  |   3 +-
 .../spark/sql/util/DataFrameCallbackSuite.scala|   3 +-
 .../org/apache/spark/sql/hive/HiveStrategies.scala |   3 +-
 .../sql/hive/execution/HiveComparisonTest.scala|   1 -
 33 files changed, 303 insertions(+), 594 deletions(-)
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/AlterTableStatements.scala
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/AlterViewStatements.scala
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/CreateTableStatement.scala
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/DeleteFromStatement.scala
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/DescribeColumnStatement.scala
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/DescribeTableStatement.scala
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/DropTableStatement.scala
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/DropViewStatement.scala
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/InsertIntoStatement.scala
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ParsedStatement.scala
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ReplaceTableStatement.scala
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ShowNamespacesStatement.scala
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ShowTablesStatement.scala
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/UpdateTableStatement.scala
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/UseStatement.scala
 create mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org