[spark] branch branch-3.0-preview updated: [SPARK-29349][SQL] Support FETCH_PRIOR in Thriftserver fetch request
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch branch-3.0-preview
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0-preview by this push:
     new 8640b90  [SPARK-29349][SQL] Support FETCH_PRIOR in Thriftserver fetch request
8640b90 is described below

commit 8640b90a3e69c61b93afdee8a71180810ccf25e5
Author: Juliusz Sompolski
AuthorDate: Tue Oct 15 23:22:19 2019 -0700

[SPARK-29349][SQL] Support FETCH_PRIOR in Thriftserver fetch request

### What changes were proposed in this pull request?

Support FETCH_PRIOR fetching in Thriftserver, and report the correct fetch start offset in TFetchResultsResp.results.startRowOffset.

The semantics of FETCH_PRIOR are as follows, assuming the previous fetch returned a block of rows from offsets [10, 20):
* calling FETCH_PRIOR(maxRows=5) will scroll back and return rows [5, 10)
* calling FETCH_PRIOR(maxRows=10) again will scroll back, but can't go earlier than 0. It will nevertheless return 10 rows, returning rows [0, 10) (overlapping with the previous fetch)
* calling FETCH_PRIOR(maxRows=4) again will again return rows starting from offset 0 - [0, 4)
* calling FETCH_NEXT(maxRows=6) after that will move the cursor forward and return rows [4, 10)

Client/server backwards/forwards compatibility:

Old driver with new server:
* Drivers that don't support FETCH_PRIOR will not attempt to use it.
* Field TFetchResultsResp.results.startRowOffset was not set before; old drivers don't depend on it.

New driver with old server:
* Using an older thriftserver with FETCH_PRIOR will make the thriftserver return an unsupported operation error. The driver can then recognize that it's an old server.
* An older thriftserver will return TFetchResultsResp.results.startRowOffset=0. If the client driver receives 0, it knows that it cannot rely on it as the correct offset. If the client driver intentionally wants to fetch from 0, it can use FETCH_FIRST.

### Why are the changes needed?

It's intended to be used to recover after connection errors. If a client lost connection during fetching (e.g. of rows [10, 20)) and wants to reconnect and continue, it could not know whether the request got lost before reaching the server, or on the response back. When it issued another FETCH_NEXT(10) request after reconnecting, because TFetchResultsResp.results.startRowOffset was not set, it could not know if the server will return rows [10, 20) (because the previous request didn't [...]

A driver should always use FETCH_PRIOR after a broken connection.
* If the Thriftserver returns an unsupported operation error, the driver knows that it's an old server that doesn't support it. The driver then must fail the query, as the old server will also not return the correct startRowOffset, so the driver cannot reliably guarantee that it hasn't lost any rows on the fetch cursor.
* If the driver gets a response to FETCH_PRIOR, it will also have a correctly set startRowOffset, which the driver can use to position itself back where it left off before the connection broke.
* If FETCH_NEXT was used on the first fetch after a broken connection and returned startRowOffset=0, the client driver can't know whether it's 0 because the server is an older version, or whether it's genuinely 0. Better to call FETCH_PRIOR, as scrolling back may be required after a broken connection anyway.

This way the feature is implemented in a backwards/forwards compatible way, and doesn't require bumping the protocol version.
FETCH_ABSOLUTE might have been better, but that would require a bigger protocol change, as there is currently no field to specify the requested absolute offset.

### Does this PR introduce any user-facing change?

ODBC/JDBC drivers connecting to Thriftserver may now use the FETCH_PRIOR fetch orientation to scroll back in query results, and check TFetchResultsResp.results.startRowOffset to verify that their cursor position is consistent after connection errors.

### How was this patch tested?

Added tests to HiveThriftServer2Suites.

Closes #26014 from juliuszsompolski/SPARK-29349.

Authored-by: Juliusz Sompolski
Signed-off-by: Yuming Wang
---
 .../SparkExecuteStatementOperation.scala           | 37 -
 .../thriftserver/HiveThriftServer2Suites.scala     | 89 +-
 sql/hive-thriftserver/v1.2.1/if/TCLIService.thrift |  5 +-
 .../hive/service/cli/operation/Operation.java      |  5 +-
 sql/hive-thriftserver/v2.3.5/if/TCLIService.thrift |  5 +-
 .../hive/service/cli/operation/Operation.java      |  5 +-
 6 files changed, 134 insertions(+), 12 deletions(-)

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive
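As a rough illustration of the cursor semantics described in this commit message, here is a minimal sketch of the offset arithmetic. The names `fetchStart`/`fetchEnd` are hypothetical, not the actual SparkExecuteStatementOperation state, and a real implementation would also bound the end offset by the rows actually available:

```scala
// Sketch of the FETCH_PRIOR / FETCH_NEXT offset arithmetic described above.
// The cursor tracks the half-open interval [fetchStart, fetchEnd) returned
// by the previous fetch.
object FetchCursor {
  var fetchStart: Long = 10L
  var fetchEnd: Long = 20L

  def fetchPrior(maxRows: Long): (Long, Long) = {
    // Scroll back by maxRows, but never before offset 0; still return up to
    // maxRows rows, possibly overlapping the previous fetch.
    fetchStart = math.max(0L, fetchStart - maxRows)
    fetchEnd = fetchStart + maxRows
    (fetchStart, fetchEnd)
  }

  def fetchNext(maxRows: Long): (Long, Long) = {
    // Move the cursor forward from the end of the previous fetch.
    fetchStart = fetchEnd
    fetchEnd = fetchStart + maxRows
    (fetchStart, fetchEnd)
  }
}

// Reproduces the example in the commit message:
// FetchCursor.fetchPrior(5)  == (5, 10)
// FetchCursor.fetchPrior(10) == (0, 10)
// FetchCursor.fetchPrior(4)  == (0, 4)
// FetchCursor.fetchNext(6)   == (4, 10)
```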
[spark] branch master updated (57edb42 -> eb8c420)
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 57edb42  [SPARK-27259][CORE] Allow setting -1 as length for FileBlock
  add eb8c420  [SPARK-29349][SQL] Support FETCH_PRIOR in Thriftserver fetch request

No new revisions were added by this update.

Summary of changes:
 .../SparkExecuteStatementOperation.scala           | 37 -
 .../thriftserver/HiveThriftServer2Suites.scala     | 89 +-
 sql/hive-thriftserver/v1.2.1/if/TCLIService.thrift |  5 +-
 .../hive/service/cli/operation/Operation.java      |  5 +-
 sql/hive-thriftserver/v2.3.5/if/TCLIService.thrift |  5 +-
 .../hive/service/cli/operation/Operation.java      |  5 +-
 6 files changed, 134 insertions(+), 12 deletions(-)
[spark] branch branch-2.4 updated: [SPARK-27259][CORE] Allow setting -1 as length for FileBlock
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 90139f6  [SPARK-27259][CORE] Allow setting -1 as length for FileBlock
90139f6 is described below

commit 90139f678860ce74b934a919b5bcd0635df348f4
Author: prasha2
AuthorDate: Tue Oct 15 22:22:37 2019 -0700

[SPARK-27259][CORE] Allow setting -1 as length for FileBlock

### What changes were proposed in this pull request?

This PR aims to update the validation check on `length` from `length >= 0` to `length >= -1`, in order to allow setting `-1` to keep the default value.

### Why are the changes needed?

In Apache Spark 2.2.0, [SPARK-18702](https://github.com/apache/spark/pull/16133/files#diff-2c5519b1cf4308d77d6f12212971544fR27-R38) added `class FileBlock` with `-1` as the initial default `length` value. There is no way to set `filePath` only while keeping `length` as `-1`.

```scala
def set(filePath: String, startOffset: Long, length: Long): Unit = {
  require(filePath != null, "filePath cannot be null")
  require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
  require(length >= 0, s"length ($length) cannot be negative")
  inputBlock.set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
}
```

For compressed files (like GZ), the size of the split can be set to -1. This was allowed until Spark 2.1 but regressed starting with Spark 2.2.x. Please note that a split length of -1 also means the length was unknown - a valid scenario. Thus, a split length of -1 should be acceptable, as it was before Spark 2.2.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

This updates a corner case of the requirement check. Manually checked the code.

Closes #26123 from praneetsharma/fix-SPARK-27259.

Authored-by: prasha2
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 57edb4258254fa582f8aae6bfd8bed1069e8155c)
Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala b/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
index bfe8152..1beb085 100644
--- a/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
@@ -76,7 +76,7 @@ private[spark] object InputFileBlockHolder {
   def set(filePath: String, startOffset: Long, length: Long): Unit = {
     require(filePath != null, "filePath cannot be null")
     require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
-    require(length >= 0, s"length ($length) cannot be negative")
+    require(length >= -1, s"length ($length) cannot be smaller than -1")
     inputBlock.get().set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
   }
[spark] branch branch-3.0-preview updated: [SPARK-27259][CORE] Allow setting -1 as length for FileBlock
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0-preview
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0-preview by this push:
     new 931cc1b  [SPARK-27259][CORE] Allow setting -1 as length for FileBlock
931cc1b is described below

commit 931cc1ba068f64a835264c1c8fc3431ecd4e31a0
Author: prasha2
AuthorDate: Tue Oct 15 22:22:37 2019 -0700

[SPARK-27259][CORE] Allow setting -1 as length for FileBlock

### What changes were proposed in this pull request?

This PR aims to update the validation check on `length` from `length >= 0` to `length >= -1`, in order to allow setting `-1` to keep the default value.

### Why are the changes needed?

In Apache Spark 2.2.0, [SPARK-18702](https://github.com/apache/spark/pull/16133/files#diff-2c5519b1cf4308d77d6f12212971544fR27-R38) added `class FileBlock` with `-1` as the initial default `length` value. There is no way to set `filePath` only while keeping `length` as `-1`.

```scala
def set(filePath: String, startOffset: Long, length: Long): Unit = {
  require(filePath != null, "filePath cannot be null")
  require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
  require(length >= 0, s"length ($length) cannot be negative")
  inputBlock.set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
}
```

For compressed files (like GZ), the size of the split can be set to -1. This was allowed until Spark 2.1 but regressed starting with Spark 2.2.x. Please note that a split length of -1 also means the length was unknown - a valid scenario. Thus, a split length of -1 should be acceptable, as it was before Spark 2.2.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

This updates a corner case of the requirement check. Manually checked the code.

Closes #26123 from praneetsharma/fix-SPARK-27259.

Authored-by: prasha2
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 57edb4258254fa582f8aae6bfd8bed1069e8155c)
Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala b/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
index bfe8152..1beb085 100644
--- a/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
@@ -76,7 +76,7 @@ private[spark] object InputFileBlockHolder {
   def set(filePath: String, startOffset: Long, length: Long): Unit = {
     require(filePath != null, "filePath cannot be null")
     require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
-    require(length >= 0, s"length ($length) cannot be negative")
+    require(length >= -1, s"length ($length) cannot be smaller than -1")
     inputBlock.get().set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
   }
[spark] branch master updated: [SPARK-27259][CORE] Allow setting -1 as length for FileBlock
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 57edb42  [SPARK-27259][CORE] Allow setting -1 as length for FileBlock
57edb42 is described below

commit 57edb4258254fa582f8aae6bfd8bed1069e8155c
Author: prasha2
AuthorDate: Tue Oct 15 22:22:37 2019 -0700

[SPARK-27259][CORE] Allow setting -1 as length for FileBlock

### What changes were proposed in this pull request?

This PR aims to update the validation check on `length` from `length >= 0` to `length >= -1`, in order to allow setting `-1` to keep the default value.

### Why are the changes needed?

In Apache Spark 2.2.0, [SPARK-18702](https://github.com/apache/spark/pull/16133/files#diff-2c5519b1cf4308d77d6f12212971544fR27-R38) added `class FileBlock` with `-1` as the initial default `length` value. There is no way to set `filePath` only while keeping `length` as `-1`.

```scala
def set(filePath: String, startOffset: Long, length: Long): Unit = {
  require(filePath != null, "filePath cannot be null")
  require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
  require(length >= 0, s"length ($length) cannot be negative")
  inputBlock.set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
}
```

For compressed files (like GZ), the size of the split can be set to -1. This was allowed until Spark 2.1 but regressed starting with Spark 2.2.x. Please note that a split length of -1 also means the length was unknown - a valid scenario. Thus, a split length of -1 should be acceptable, as it was before Spark 2.2.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

This updates a corner case of the requirement check. Manually checked the code.

Closes #26123 from praneetsharma/fix-SPARK-27259.

Authored-by: prasha2
Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala b/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
index bfe8152..1beb085 100644
--- a/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
@@ -76,7 +76,7 @@ private[spark] object InputFileBlockHolder {
   def set(filePath: String, startOffset: Long, length: Long): Unit = {
     require(filePath != null, "filePath cannot be null")
     require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
-    require(length >= 0, s"length ($length) cannot be negative")
+    require(length >= -1, s"length ($length) cannot be smaller than -1")
     inputBlock.get().set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
   }
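For illustration, here is a minimal standalone sketch of the relaxed check (the private `InputFileBlockHolder` API shown in the diff is not callable from user code, so only the rule itself is reproduced):

```scala
// Minimal sketch of the relaxed rule: -1 is a valid sentinel meaning
// "length unknown" (e.g. a split of a gzip-compressed file), while
// anything below -1 is still rejected.
def validateLength(length: Long): Unit =
  require(length >= -1, s"length ($length) cannot be smaller than -1")

validateLength(1024L) // ok: known length
validateLength(-1L)   // ok after this change: unknown length
// validateLength(-2L)  // would still throw IllegalArgumentException
```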
[spark] branch master updated (e00344e -> 93e71e6)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from e00344e  [SPARK-29423][SS] lazily initialize StreamingQueryManager in SessionState
  add 93e71e6  [SPARK-29469][SHUFFLE] Avoid retries by RetryingBlockFetcher when ExternalBlockStoreClient is closed

No new revisions were added by this update.

Summary of changes:
 .../spark/network/shuffle/ExternalBlockStoreClient.java | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)
[spark] branch master updated (51f10ed -> e00344e)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 51f10ed  [SPARK-28560][SQL][FOLLOWUP] code cleanup for local shuffle reader
  add e00344e  [SPARK-29423][SS] lazily initialize StreamingQueryManager in SessionState

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/BaseSessionStateBuilder.scala | 2 +-
 .../main/scala/org/apache/spark/sql/internal/SessionState.scala | 9 +++--
 2 files changed, 8 insertions(+), 3 deletions(-)
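As background for the change title: lazy initialization in Scala defers construction to first use. A generic sketch (illustrative only, not SessionState's actual fields or constructor):

```scala
// Illustrative only: defer building an expensive manager until first access.
class Session(managerBuilder: () => AnyRef) {
  // `lazy val` evaluates its initializer once, on first access, so merely
  // creating a Session no longer pays the manager's construction cost.
  lazy val streamingQueryManager: AnyRef = managerBuilder()
}

val s = new Session(() => { println("built once, on demand"); new Object })
s.streamingQueryManager // construction happens here, not at `new Session`
```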
[spark] branch master updated (95de93b -> 51f10ed)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 95de93b  [SPARK-24540][SQL] Support for multiple character delimiter in Spark CSV read
  add 51f10ed  [SPARK-28560][SQL][FOLLOWUP] code cleanup for local shuffle reader

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/MapOutputTracker.scala  | 34 +++---
 .../spark/shuffle/BlockStoreShuffleReader.scala    | 22 ++
 .../org/apache/spark/shuffle/ShuffleManager.scala  | 11 ---
 .../spark/shuffle/sort/SortShuffleManager.scala    | 25 ++--
 .../shuffle/BlockStoreShuffleReaderSuite.scala     |  8 ++---
 .../execution/adaptive/LocalShuffledRowRDD.scala   | 14 -
 6 files changed, 45 insertions(+), 69 deletions(-)
[spark] branch master updated (02c5b4f -> 95de93b)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 02c5b4f  [SPARK-28947][K8S] Status logging not happens at an interval for liveness
  add 95de93b  [SPARK-24540][SQL] Support for multiple character delimiter in Spark CSV read

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7                     |  2 +-
 dev/deps/spark-deps-hadoop-3.2                     |  2 +-
 pom.xml                                            |  5 +++
 python/pyspark/sql/readwriter.py                   |  6 +--
 python/pyspark/sql/streaming.py                    |  4 +-
 sql/catalyst/pom.xml                               |  1 -
 .../spark/sql/catalyst/csv/CSVExprUtils.scala      | 46 ++
 .../apache/spark/sql/catalyst/csv/CSVOptions.scala |  2 +-
 .../spark/sql/catalyst/csv/CSVExprUtilsSuite.scala | 38 ++
 sql/core/pom.xml                                   |  1 -
 .../org/apache/spark/sql/DataFrameReader.scala     |  4 +-
 .../test-data/cars-multichar-delim-crazy.csv       |  4 ++
 .../resources/test-data/cars-multichar-delim.csv   |  4 ++
 .../sql/execution/datasources/csv/CSVSuite.scala   | 45 +
 14 files changed, 152 insertions(+), 12 deletions(-)
 create mode 100644 sql/core/src/test/resources/test-data/cars-multichar-delim-crazy.csv
 create mode 100644 sql/core/src/test/resources/test-data/cars-multichar-delim.csv
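A usage sketch of the new capability (`sep` is the existing CSV delimiter option on DataFrameReader; the delimiter string and file path here are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("multichar-delim")
  .getOrCreate()

// With SPARK-24540, `sep` may now be longer than one character.
val df = spark.read
  .option("sep", "||")      // previously restricted to a single character
  .option("header", "true")
  .csv("/path/to/cars-multichar-delim.csv")
df.show()
```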
[spark] branch branch-3.0-preview created (now 02c5b4f)
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch branch-3.0-preview
in repository https://gitbox.apache.org/repos/asf/spark.git.

 at 02c5b4f  [SPARK-28947][K8S] Status logging not happens at an interval for liveness

No new revisions were added by this update.
[spark] branch master updated: [SPARK-28947][K8S] Status logging not happens at an interval for liveness
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 02c5b4f  [SPARK-28947][K8S] Status logging not happens at an interval for liveness
02c5b4f is described below

commit 02c5b4f76337cc3901b8741887292bb4478931f3
Author: Kent Yao
AuthorDate: Tue Oct 15 12:34:39 2019 -0700

[SPARK-28947][K8S] Status logging not happens at an interval for liveness

### What changes were proposed in this pull request?

This PR invokes the start method of `LoggingPodStatusWatcherImpl` for status logging at intervals.

### Why are the changes needed?

The start method of `LoggingPodStatusWatcherImpl` is declared but never called.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Manually tested.

Closes #25648 from yaooqinn/SPARK-28947.

Authored-by: Kent Yao
Signed-off-by: Marcelo Vanzin
---
 .../k8s/submit/KubernetesClientApplication.scala   | 25 ++---
 .../k8s/submit/LoggingPodStatusWatcher.scala       | 61 ++
 .../spark/deploy/k8s/submit/ClientSuite.scala      |  5 +-
 3 files changed, 33 insertions(+), 58 deletions(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
index 11bbad9..8e5532d 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
@@ -86,15 +86,12 @@ private[spark] object ClientArguments {
  * @param builder Responsible for building the base driver pod based on a composition of
  *                implemented features.
  * @param kubernetesClient the client to talk to the Kubernetes API server
- * @param waitForAppCompletion a flag indicating whether the client should wait for the application
- *                             to complete
  * @param watcher a watcher that monitors and logs the application status
  */
 private[spark] class Client(
     conf: KubernetesDriverConf,
     builder: KubernetesDriverBuilder,
     kubernetesClient: KubernetesClient,
-    waitForAppCompletion: Boolean,
     watcher: LoggingPodStatusWatcher) extends Logging {

   def run(): Unit = {
@@ -124,10 +121,11 @@ private[spark] class Client(
         .endVolume()
         .endSpec()
       .build()
+    val driverPodName = resolvedDriverPod.getMetadata.getName
     Utils.tryWithResource(
       kubernetesClient
         .pods()
-        .withName(resolvedDriverPod.getMetadata.getName)
+        .withName(driverPodName)
         .watch(watcher)) { _ =>
       val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
       try {
@@ -141,16 +139,8 @@ private[spark] class Client(
         throw e
       }

-      val sId = s"${Option(conf.namespace).map(_ + ":").getOrElse("")}" +
-        s"${resolvedDriverPod.getMetadata.getName}"
-      if (waitForAppCompletion) {
-        logInfo(s"Waiting for application ${conf.appName} with submission ID ${sId} to finish...")
-        watcher.awaitCompletion()
-        logInfo(s"Application ${conf.appName} with submission ID ${sId} finished.")
-      } else {
-        logInfo(s"Deployed Spark application ${conf.appName} with " +
-          s"submission ID ${sId} into Kubernetes.")
-      }
+      val sId = Seq(conf.namespace, driverPodName).mkString(":")
+      watcher.watchOrStop(sId)
     }
   }

@@ -199,13 +189,11 @@ private[spark] class KubernetesClientApplication extends SparkApplication {
   }

   private def run(clientArguments: ClientArguments, sparkConf: SparkConf): Unit = {
-    val appName = sparkConf.getOption("spark.app.name").getOrElse("spark")
     // For constructing the app ID, we can't use the Spark application name, as the app ID is going
     // to be added as a label to group resources belonging to the same application. Label values are
     // considerably restrictive, e.g. must be no longer than 63 characters in length. So we generate
     // a unique app ID (captured by spark.app.id) in the format below.
     val kubernetesAppId = s"spark-${UUID.randomUUID().toString.replaceAll("-", "")}"
-    val waitForAppCompletion = sparkConf.get(WAIT_FOR_APP_COMPLETION)
     val kubernetesConf = KubernetesConf.createDriverConf(
       sparkConf,
       kubernetesAppId,
@@ -215,9 +203,7 @@ private[spark] class KubernetesClientApplication extends SparkApplication {
     // The master URL has been checked for validity already in SparkSubmit.
     // We just need to get rid of the "k8s://" prefix here.
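The diff replaces the client-side `if (waitForAppCompletion)` branching with a single `watcher.watchOrStop(sId)` call. A rough sketch of that pattern (a hypothetical class, not the actual LoggingPodStatusWatcherImpl):

```scala
// Hypothetical sketch: the watcher, not the Client, decides whether to
// block until completion (logging status at an interval) or to return
// immediately after logging the submission ID.
class IntervalLoggingWatcher(waitForCompletion: Boolean, intervalMs: Long) {
  @volatile var podCompleted = false // set by a pod-event callback elsewhere

  def watchOrStop(sId: String): Unit = {
    if (waitForCompletion) {
      // Block, logging status at a fixed interval until the pod completes.
      while (!podCompleted) {
        println(s"Application status for submission ID $sId: running")
        Thread.sleep(intervalMs)
      }
      println(s"Application with submission ID $sId finished.")
    } else {
      println(s"Deployed Spark application with submission ID $sId into Kubernetes.")
    }
  }
}
```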
[spark] branch master updated (4ecbdbb -> 39d53d3)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 4ecbdbb  [SPARK-29182][CORE] Cache preferred locations of checkpointed RDD
  add 39d53d3  [SPARK-29470][BUILD] Update plugins to latest versions

No new revisions were added by this update.

Summary of changes:
 .../main/resources/org/apache/spark/log4j-defaults.properties | 3 ++-
 dev/checkstyle.xml                                            | 9 +
 pom.xml                                                       | 8
 project/plugins.sbt                                           | 2 +-
 4 files changed, 12 insertions(+), 10 deletions(-)
[spark] branch master updated (322ec0b -> 4ecbdbb)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 322ec0b  [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default
  add 4ecbdbb  [SPARK-29182][CORE] Cache preferred locations of checkpointed RDD

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/internal/config/package.scala | 11 +++
 .../apache/spark/rdd/ReliableCheckpointRDD.scala   | 36 +++---
 .../scala/org/apache/spark/CheckpointSuite.scala   | 26 +++-
 3 files changed, 67 insertions(+), 6 deletions(-)
[spark] branch master updated (2e28622 -> 322ec0b)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 2e28622  [SPARK-28211][CORE][SHUFFLE] Propose Shuffle Driver Components API
  add 322ec0b  [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md                        |  1 +
 .../spark/sql/catalyst/analysis/Analyzer.scala     | 18
 .../catalyst/analysis/TableOutputResolver.scala    | 11 --
 .../spark/sql/catalyst/expressions/Cast.scala      |  1 +
 .../org/apache/spark/sql/internal/SQLConf.scala    |  6 +++---
 .../org/apache/spark/sql/types/DataType.scala      |  2 ++
 .../types/DataTypeWriteCompatibilitySuite.scala    | 24 ++
 .../spark/sql/execution/datasources/rules.scala    | 10 ++---
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   | 14 -
 .../spark/sql/execution/command/DDLSuite.scala     |  4 ++--
 .../execution/datasources/orc/OrcSourceSuite.scala |  4 +++-
 .../datasources/parquet/ParquetQuerySuite.scala    | 16 +++
 .../org/apache/spark/sql/sources/InsertSuite.scala |  5 -
 .../thriftserver/ThriftServerQueryTestSuite.scala  | 11 ++
 .../hive/execution/HiveCompatibilitySuite.scala    |  3 +++
 .../spark/sql/hive/client/VersionsSuite.scala      |  2 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala    |  4 ++--
 .../hive/execution/HiveSerDeReadWriteSuite.scala   |  8 +---
 .../spark/sql/hive/orc/HiveOrcQuerySuite.scala     |  5 -
 19 files changed, 92 insertions(+), 57 deletions(-)
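As an illustration of what the new default means in practice, given a SparkSession `spark` (the config name `spark.sql.storeAssignmentPolicy` is assumed from the SQLConf change above; the table and values are made up):

```scala
// Under the ANSI store assignment policy (now the default), storing a
// string into an INT column is rejected at analysis time instead of being
// silently cast as under the old LEGACY policy, which yields null for
// non-numeric strings.
spark.sql("SET spark.sql.storeAssignmentPolicy=ANSI") // the new default
spark.sql("CREATE TABLE t (i INT) USING parquet")
spark.sql("INSERT INTO t VALUES ('not a number')")    // fails under ANSI rules
```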
[spark] branch master updated: [SPARK-28211][CORE][SHUFFLE] Propose Shuffle Driver Components API
This is an automated email from the ASF dual-hosted git repository.

irashid pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 2e28622  [SPARK-28211][CORE][SHUFFLE] Propose Shuffle Driver Components API
2e28622 is described below

commit 2e28622d8aeb9ce2460e803bb7d994196bcc0253
Author: Yifei Huang
AuthorDate: Tue Oct 15 12:26:49 2019 -0500

[SPARK-28211][CORE][SHUFFLE] Propose Shuffle Driver Components API

### What changes were proposed in this pull request?

This is the next step of the SPARK-25299 work of proposing a new shuffle storage API. This patch includes the components of the plugin that hook into the driver, including driver shuffle initialization, application cleanup, and shuffle cleanup.

### How was this patch tested?

Existing unit tests, plus an additional test for the interactions between driver and executor initialization.

Closes #25823 from yifeih/yh/upstream/driver-lifecycle.

Lead-authored-by: Yifei Huang
Co-authored-by: mccheah
Signed-off-by: Imran Rashid
---
 .../apache/spark/shuffle/api/ShuffleDataIO.java    |  6 ++
 .../spark/shuffle/api/ShuffleDriverComponents.java | 64 +++
 .../shuffle/api/ShuffleExecutorComponents.java     | 12 ++-
 .../shuffle/sort/io/LocalDiskShuffleDataIO.java    |  8 +-
 ...java => LocalDiskShuffleDriverComponents.java}  | 35 +---
 .../io/LocalDiskShuffleExecutorComponents.java     |  7 +-
 .../scala/org/apache/spark/ContextCleaner.scala    |  8 +-
 .../main/scala/org/apache/spark/Dependency.scala   |  1 +
 .../main/scala/org/apache/spark/SparkContext.scala | 17 +++-
 .../apache/spark/shuffle/ShuffleDataIOUtils.scala  | 42 ++
 .../spark/shuffle/sort/SortShuffleManager.scala    | 15 ++--
 .../apache/spark/InternalAccumulatorSuite.scala    |  3 +-
 .../shuffle/ShuffleDriverComponentsSuite.scala     | 94 ++
 13 files changed, 281 insertions(+), 31 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDataIO.java b/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDataIO.java
index e9e50ec..e4554bd 100644
--- a/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDataIO.java
+++ b/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDataIO.java
@@ -46,4 +46,10 @@ public interface ShuffleDataIO {
    * are only invoked on the executors.
    */
   ShuffleExecutorComponents executor();
+
+  /**
+   * Called once on the driver process to bootstrap the shuffle metadata modules
+   * that are maintained by the driver.
+   */
+  ShuffleDriverComponents driver();
 }
diff --git a/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDriverComponents.java b/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDriverComponents.java
new file mode 100644
index 0000000..b4cec17
--- /dev/null
+++ b/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDriverComponents.java
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle.api;
+
+import java.util.Map;
+
+import org.apache.spark.annotation.Private;
+
+/**
+ * :: Private ::
+ * An interface for building shuffle support modules for the Driver.
+ */
+@Private
+public interface ShuffleDriverComponents {
+
+  /**
+   * Called once in the driver to bootstrap this module that is specific to this application.
+   * This method is called before submitting executor requests to the cluster manager.
+   *
+   * This method should prepare the module with its shuffle components, i.e. registering against
+   * external file servers or shuffle services, or creating tables in a shuffle
+   * storage database.
+   *
+   * @return additional SparkConf settings necessary for initializing the executor components.
+   * This would include configurations that cannot be statically set on the application, like
+   * the host:port of external services for shuffle storage.
+   */
+  Map<String, String> initializeApplication();
+
+  /**
+   * Called once at the end of the Spark application to clean up any existing shuffle state.
+   */
+  void cleanupApplication();
+
+  /
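A hedged sketch of a plugin implementing the driver-side hooks shown above. Only the two methods visible in this diff are covered; the full interface may declare more (e.g. per-shuffle cleanup), in which case those would need overriding too, and the interface is `@Private`, so this is illustrative rather than a supported extension point:

```scala
import java.util.Collections
import org.apache.spark.shuffle.api.ShuffleDriverComponents

// A no-op driver component: nothing to bootstrap, nothing to clean up.
class NoOpShuffleDriverComponents extends ShuffleDriverComponents {

  // Return extra SparkConf entries the executors need; none here. A real
  // plugin might return e.g. the host:port of an external shuffle store.
  override def initializeApplication(): java.util.Map[String, String] =
    Collections.emptyMap[String, String]

  // Called once at application end; no external state to remove.
  override def cleanupApplication(): Unit = ()
}
```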
[spark] branch master updated (8915966 -> 9ac4b2d)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 8915966  [SPARK-29473][SQL] move statement logical plans to a new file
  add 9ac4b2d  [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/MapOutputTracker.scala  |  91 +-
 .../spark/shuffle/BlockStoreShuffleReader.scala    |  19 ++-
 .../org/apache/spark/shuffle/ShuffleManager.scala  |  13 ++
 .../spark/shuffle/sort/SortShuffleManager.scala    |  21
 .../org/apache/spark/sql/internal/SQLConf.scala    |   8 ++
 .../apache/spark/sql/execution/SparkPlanInfo.scala |   3 +-
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |   1 +
 .../adaptive/AdaptiveSparkPlanHelper.scala         |   1 +
 .../execution/adaptive/LocalShuffledRowRDD.scala   |  98 +++
 .../adaptive/OptimizeLocalShuffleReader.scala      | 132 +
 .../execution/exchange/ShuffleExchangeExec.scala   |   5 +
 .../adaptive/AdaptiveQueryExecSuite.scala          |  46 +--
 12 files changed, 424 insertions(+), 14 deletions(-)
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/LocalShuffledRowRDD.scala
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeLocalShuffleReader.scala
[spark] branch master updated (a988aaf -> 8915966)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from a988aaf  [SPARK-29454][SQL] Reduce unsafeProjection times when read Parquet file use non-vectorized mode
  add 8915966  [SPARK-29473][SQL] move statement logical plans to a new file

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala     |   1 -
 .../sql/catalyst/analysis/CheckAnalysis.scala      |   1 -
 .../sql/catalyst/analysis/ResolveCatalogs.scala    |   1 -
 .../apache/spark/sql/catalyst/dsl/package.scala    |   1 -
 .../spark/sql/catalyst/parser/AstBuilder.scala     |   1 -
 .../plans/logical/sql/AlterTableStatements.scala   |  78 --
 .../plans/logical/sql/AlterViewStatements.scala    |  33 ---
 .../plans/logical/sql/CreateTableStatement.scala   |  58
 .../plans/logical/sql/DeleteFromStatement.scala    |  27 --
 .../logical/sql/DescribeColumnStatement.scala      |  23 --
 .../plans/logical/sql/DescribeTableStatement.scala |  25 --
 .../plans/logical/sql/DropTableStatement.scala     |  34 ---
 .../plans/logical/sql/DropViewStatement.scala      |  33 ---
 .../plans/logical/sql/InsertIntoStatement.scala    |  50
 .../plans/logical/sql/ParsedStatement.scala        |  49
 .../plans/logical/sql/ReplaceTableStatement.scala  |  60 -
 .../logical/sql/ShowNamespacesStatement.scala      |  24 --
 .../plans/logical/sql/ShowTablesStatement.scala    |  24 --
 .../plans/logical/sql/UpdateTableStatement.scala   |  27 --
 .../catalyst/plans/logical/sql/UseStatement.scala  |  23 --
 .../sql/catalyst/plans/logical/statements.scala    | 294 +
 .../spark/sql/catalyst/parser/DDLParserSuite.scala |   3 +-
 .../sql/catalyst/parser/PlanParserSuite.scala      |   1 -
 .../org/apache/spark/sql/DataFrameWriter.scala     |   3 +-
 .../catalyst/analysis/ResolveSessionCatalog.scala  |   5 +-
 .../execution/datasources/DataSourceStrategy.scala |   3 +-
 .../datasources/FallBackFileSourceV2.scala         |   3 +-
 .../spark/sql/execution/datasources/rules.scala    |   1 -
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |   1 -
 .../spark/sql/execution/SparkSqlParserSuite.scala  |   3 +-
 .../spark/sql/util/DataFrameCallbackSuite.scala    |   3 +-
 .../org/apache/spark/sql/hive/HiveStrategies.scala |   3 +-
 .../sql/hive/execution/HiveComparisonTest.scala    |   1 -
 33 files changed, 303 insertions(+), 594 deletions(-)
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/AlterTableStatements.scala
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/AlterViewStatements.scala
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/CreateTableStatement.scala
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/DeleteFromStatement.scala
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/DescribeColumnStatement.scala
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/DescribeTableStatement.scala
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/DropTableStatement.scala
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/DropViewStatement.scala
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/InsertIntoStatement.scala
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ParsedStatement.scala
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ReplaceTableStatement.scala
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ShowNamespacesStatement.scala
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ShowTablesStatement.scala
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/UpdateTableStatement.scala
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/UseStatement.scala
 create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala