[spark] branch master updated (20e8843 -> d1371a2)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 20e8843 [MINOR][SQL] Skip warning if JOB_SUMMARY_LEVEL is set to NONE add d1371a2 [SPARK-27964][SQL] Move v2 catalog update methods to CatalogV2Util No new revisions were added by this update. Summary of changes: project/SparkBuild.scala | 1 + .../sql/catalog/v2/utils/CatalogV2Util.scala} | 102 +++-- .../spark/sql/catalog/v2/TestTableCatalog.scala| 125 + .../sql/sources/v2/TestInMemoryTableCatalog.scala | 7 +- 4 files changed, 26 insertions(+), 209 deletions(-) copy sql/catalyst/src/{test/scala/org/apache/spark/sql/catalog/v2/TestTableCatalog.scala => main/java/org/apache/spark/sql/catalog/v2/utils/CatalogV2Util.scala} (62%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [MINOR][SQL] Skip warning if JOB_SUMMARY_LEVEL is set to NONE
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 1a86eb3 [MINOR][SQL] Skip warning if JOB_SUMMARY_LEVEL is set to NONE 1a86eb3 is described below commit 1a86eb3c2484e61020f2c705973540add7d16053 Author: Jordan Sanders AuthorDate: Wed Jun 5 14:57:36 2019 -0700 [MINOR][SQL] Skip warning if JOB_SUMMARY_LEVEL is set to NONE ## What changes were proposed in this pull request? I believe the log message: `Committer $committerClass is not a ParquetOutputCommitter and cannot create job summaries. Set Parquet option ${ParquetOutputFormat.JOB_SUMMARY_LEVEL} to NONE.` is at odds with the `if` statement that logs the warning. Despite the instructions in the warning, users still encounter the warning if `JOB_SUMMARY_LEVEL` is already set to `NONE`. This pull request introduces a change to skip logging the warning if `JOB_SUMMARY_LEVEL` is set to `NONE`. ## How was this patch tested? I built to make sure everything still compiled and I ran the existing test suite. I didn't feel it was worth the overhead to add a test to make sure a log message does not get logged, but if reviewers feel differently, I can add one. Closes #24808 from jmsanders/master. Authored-by: Jordan Sanders Signed-off-by: Dongjoon Hyun --- .../spark/sql/execution/datasources/parquet/ParquetFileFormat.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala index ea4f159..16cd570 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala @@ -129,7 +129,7 @@ class ParquetFileFormat conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE) } -if (ParquetOutputFormat.getJobSummaryLevel(conf) == JobSummaryLevel.NONE +if (ParquetOutputFormat.getJobSummaryLevel(conf) != JobSummaryLevel.NONE && !classOf[ParquetOutputCommitter].isAssignableFrom(committerClass)) { // output summary is requested, but the class is not a Parquet Committer logWarning(s"Committer $committerClass is not a ParquetOutputCommitter and cannot" + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (5d6758c -> 20e8843)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5d6758c [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst add 20e8843 [MINOR][SQL] Skip warning if JOB_SUMMARY_LEVEL is set to NONE No new revisions were added by this update. Summary of changes: .../spark/sql/execution/datasources/parquet/ParquetFileFormat.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (6c28ef1 -> 5d6758c)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 6c28ef1 [SPARK-27933][SS] Extracting common purge behaviour to the parent StreamExecution add 5d6758c [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/parser/SqlBase.g4| 40 +++- .../spark/sql/catalyst/parser/AstBuilder.scala | 162 +++- .../plans/logical/sql/AlterTableStatements.scala | 78 ...ewStatement.scala => AlterViewStatements.scala} | 20 +- .../plans/logical/sql/CreateTableStatement.scala | 10 +- .../plans/logical/sql/ParsedStatement.scala| 5 + .../spark/sql/catalyst/parser/DDLParserSuite.scala | 207 - .../spark/sql/execution/SparkSqlParser.scala | 60 +- .../datasources/DataSourceResolution.scala | 42 - .../sql/execution/command/DDLParserSuite.scala | 63 +-- .../execution/command/PlanResolutionSuite.scala| 76 11 files changed, 612 insertions(+), 151 deletions(-) create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/AlterTableStatements.scala copy sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/{DropViewStatement.scala => AlterViewStatements.scala} (69%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-27933][SS] Extracting common purge behaviour to the parent StreamExecution
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6c28ef1 [SPARK-27933][SS] Extracting common purge behaviour to the parent StreamExecution 6c28ef1 is described below commit 6c28ef144d6c8c30ed986a90a9458e976ba08917 Author: Jacek Laskowski AuthorDate: Wed Jun 5 12:39:31 2019 -0500 [SPARK-27933][SS] Extracting common purge behaviour to the parent StreamExecution Extracting the common purge "behaviour" to the parent StreamExecution. ## How was this patch tested? No added behaviour so relying on existing tests. Closes #24781 from jaceklaskowski/StreamExecution-purge. Authored-by: Jacek Laskowski Signed-off-by: Sean Owen --- .../apache/spark/sql/execution/streaming/MicroBatchExecution.scala | 3 +-- .../org/apache/spark/sql/execution/streaming/StreamExecution.scala | 6 ++ .../sql/execution/streaming/continuous/ContinuousExecution.scala| 3 +-- 3 files changed, 8 insertions(+), 4 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala index f0bf7e7..fd2638f 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala @@ -426,8 +426,7 @@ class MicroBatchExecution( // It is now safe to discard the metadata beyond the minimum number to retain. // Note that purge is exclusive, i.e. it purges everything before the target ID. if (minLogEntriesToMaintain < currentBatchId) { - offsetLog.purge(currentBatchId - minLogEntriesToMaintain) - commitLog.purge(currentBatchId - minLogEntriesToMaintain) + purge(currentBatchId - minLogEntriesToMaintain) } } noNewData = false diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala index 4c08b3a..7c1f6ca 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala @@ -610,6 +610,12 @@ abstract class StreamExecution( } } } + + protected def purge(threshold: Long): Unit = { +logDebug(s"Purging metadata at threshold=$threshold") +offsetLog.purge(threshold) +commitLog.purge(threshold) + } } object StreamExecution { diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala index 82708a3..509b103 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala @@ -352,8 +352,7 @@ class ContinuousExecution( // number of batches that must be retained and made recoverable, so we should keep the // specified number of metadata that have been committed. if (minLogEntriesToMaintain <= epoch) { - offsetLog.purge(epoch + 1 - minLogEntriesToMaintain) - commitLog.purge(epoch + 1 - minLogEntriesToMaintain) + purge(epoch + 1 - minLogEntriesToMaintain) } awaitProgressLock.lock() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (3f102a8 -> 8b6232b)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 3f102a8 [SPARK-27749][SQL] hadoop-3.2 support hive-thriftserver add 8b6232b [SPARK-27521][SQL] Move data source v2 to catalyst module No new revisions were added by this update. Summary of changes: project/MimaExcludes.scala | 39 ++ sql/catalyst/pom.xml | 4 +++ .../spark/sql/sources/v2/SessionConfigSupport.java | 0 .../apache/spark/sql/sources/v2/SupportsRead.java | 0 .../apache/spark/sql/sources/v2/SupportsWrite.java | 0 .../apache/spark/sql/sources/v2/TableProvider.java | 9 + .../apache/spark/sql/sources/v2/reader/Batch.java | 0 .../sql/sources/v2/reader/InputPartition.java | 0 .../sql/sources/v2/reader/PartitionReader.java | 0 .../sources/v2/reader/PartitionReaderFactory.java | 0 .../apache/spark/sql/sources/v2/reader/Scan.java | 0 .../spark/sql/sources/v2/reader/ScanBuilder.java | 0 .../spark/sql/sources/v2/reader/Statistics.java| 0 .../sources/v2/reader/SupportsPushDownFilters.java | 0 .../v2/reader/SupportsPushDownRequiredColumns.java | 0 .../v2/reader/SupportsReportPartitioning.java | 0 .../v2/reader/SupportsReportStatistics.java| 0 .../reader/partitioning/ClusteredDistribution.java | 0 .../v2/reader/partitioning/Distribution.java | 0 .../v2/reader/partitioning/Partitioning.java | 0 .../streaming/ContinuousPartitionReader.java | 0 .../ContinuousPartitionReaderFactory.java | 0 .../v2/reader/streaming/ContinuousStream.java | 0 .../v2/reader/streaming/MicroBatchStream.java | 0 .../sql/sources/v2/reader/streaming/Offset.java| 0 .../v2/reader/streaming/PartitionOffset.java | 0 .../v2/reader/streaming/SparkDataStream.java | 0 .../spark/sql/sources/v2/writer/BatchWrite.java| 0 .../spark/sql/sources/v2/writer/DataWriter.java| 0 .../sql/sources/v2/writer/DataWriterFactory.java | 0 .../v2/writer/SupportsDynamicOverwrite.java| 0 .../sql/sources/v2/writer/SupportsOverwrite.java | 0 .../sql/sources/v2/writer/SupportsTruncate.java| 0 .../spark/sql/sources/v2/writer/WriteBuilder.java | 0 .../sql/sources/v2/writer/WriterCommitMessage.java | 0 .../streaming/StreamingDataWriterFactory.java | 0 .../v2/writer/streaming/StreamingWrite.java| 0 .../spark/sql/vectorized/ArrowColumnVector.java| 2 +- .../apache/spark/sql/vectorized/ColumnVector.java | 0 .../apache/spark/sql/vectorized/ColumnarArray.java | 0 .../apache/spark/sql/vectorized/ColumnarBatch.java | 0 .../apache/spark/sql/vectorized/ColumnarMap.java | 0 .../apache/spark/sql/vectorized/ColumnarRow.java | 0 .../org/apache/spark/sql/sources/filters.scala | 0 .../org/apache/spark/sql/util}/ArrowUtils.scala| 2 +- .../apache/spark/sql/util}/ArrowUtilsSuite.scala | 4 +-- sql/core/pom.xml | 4 --- .../sql/execution/arrow/ArrowConverters.scala | 1 + .../spark/sql/execution/arrow/ArrowWriter.scala| 1 + .../execution/python/AggregateInPandasExec.scala | 2 +- .../sql/execution/python/ArrowEvalPythonExec.scala | 2 +- .../sql/execution/python/ArrowPythonRunner.scala | 3 +- .../python/FlatMapGroupsInPandasExec.scala | 4 +-- .../sql/execution/python/WindowInPandasExec.scala | 2 +- .../spark/sql/execution/r/ArrowRRunner.scala | 3 +- .../sql/execution/arrow/ArrowConvertersSuite.scala | 1 + .../sources/RateStreamProviderSuite.scala | 2 +- .../streaming/sources/TextSocketStreamSuite.scala | 2 +- .../vectorized/ArrowColumnVectorSuite.scala| 2 +- .../execution/vectorized/ColumnarBatchSuite.scala | 4 +-- 60 files changed, 65 insertions(+), 28 deletions(-) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java (100%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/SupportsRead.java (100%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/SupportsWrite.java (100%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java (89%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/reader/Batch.java (100%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/reader/InputPartition.java (100%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/reader/PartitionReader.java (100%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/reader/PartitionReaderFactory.java (100%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/reader/Scan.java (100%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2
[spark] branch master updated (b312033 -> 18834e8)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b312033 [SPARK-20286][CORE] Improve logic for timing out executors in dynamic allocation. add 18834e8 [SPARK-27899][SQL] Refactor getTableOption() to extract a common method No new revisions were added by this update. Summary of changes: .../spark/sql/hive/client/HiveClientImpl.scala | 206 +++-- 1 file changed, 104 insertions(+), 102 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-20286][CORE] Improve logic for timing out executors in dynamic allocation.
This is an automated email from the ASF dual-hosted git repository. irashid pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b312033 [SPARK-20286][CORE] Improve logic for timing out executors in dynamic allocation. b312033 is described below commit b312033bd33cd6cbf1f166ccaa7a5df4e3421078 Author: Marcelo Vanzin AuthorDate: Wed Jun 5 08:09:44 2019 -0500 [SPARK-20286][CORE] Improve logic for timing out executors in dynamic allocation. This change refactors the portions of the ExecutorAllocationManager class that track executor state into a new class, to achieve a few goals: - make the code easier to understand - better separate concerns (task backlog vs. executor state) - less synchronization between event and allocation threads - less coupling between the allocation code and executor state tracking The executor tracking code was moved to a new class (ExecutorMonitor) that encapsulates all the logic of tracking what happens to executors and when they can be timed out. The logic to actually remove the executors remains in the EAM, since it still requires information that is not tracked by the new executor monitor code. In the executor monitor itself, of interest, specifically, is a change in how cached blocks are tracked; instead of polling the block manager, the monitor now uses events to track which executors have cached blocks, and is able to detect also unpersist events and adjust the time when the executor should be removed accordingly. (That's the bug mentioned in the PR title.) Because of the refactoring, a few tests in the old EAM test suite were removed, since they're now covered by the newly added test suite. The EAM suite was also changed a little bit to not instantiate a SparkContext every time. This allowed some cleanup, and the tests also run faster. Tested with new and updated unit tests, and with multiple TPC-DS workloads running with dynamic allocation on; also some manual tests for the caching behavior. Closes #24704 from vanzin/SPARK-20286. Authored-by: Marcelo Vanzin Signed-off-by: Imran Rashid --- .../apache/spark/ExecutorAllocationClient.scala| 9 +- .../apache/spark/ExecutorAllocationManager.scala | 239 +-- .../main/scala/org/apache/spark/SparkContext.scala | 3 +- .../org/apache/spark/internal/config/package.scala | 8 +- .../cluster/CoarseGrainedSchedulerBackend.scala| 4 + .../spark/scheduler/dynalloc/ExecutorMonitor.scala | 283 .../apache/spark/storage/BlockManagerMaster.scala | 9 - .../spark/storage/BlockManagerMasterEndpoint.scala | 31 - .../spark/storage/BlockManagerMessages.scala | 3 - ...g.apache.spark.scheduler.ExternalClusterManager | 1 - .../spark/ExecutorAllocationManagerSuite.scala | 708 ++--- .../deploy/ExternalShuffleServiceDbSuite.scala | 9 +- .../scheduler/dynalloc/ExecutorMonitorSuite.scala | 289 + .../spark/storage/BlockManagerInfoSuite.scala | 14 - .../scheduler/ExecutorAllocationManagerSuite.scala | 2 +- 15 files changed, 822 insertions(+), 790 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala b/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala index 63d87b4..cb965cb 100644 --- a/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala +++ b/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala @@ -23,11 +23,18 @@ package org.apache.spark */ private[spark] trait ExecutorAllocationClient { - /** Get the list of currently active executors */ private[spark] def getExecutorIds(): Seq[String] /** + * Whether an executor is active. An executor is active when it can be used to execute tasks + * for jobs submitted by the application. + * + * @return whether the executor with the given ID is currently active. + */ + def isExecutorActive(id: String): Boolean + + /** * Update the cluster manager on our scheduling needs. Three bits of information are included * to help it make decisions. * @param numExecutors The total number of executors we'd like to have. The cluster manager diff --git a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala index 1782027..63df7cc 100644 --- a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala +++ b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala @@ -30,7 +30,7 @@ import org.apache.spark.internal.config._ import org.apache.spark.internal.config.Tests.TEST_SCHEDULE_INTERVAL import org.apache.spark.metrics.source.Source import org.apache.spark.scheduler._ -import org.apache.spark.storage.BlockManagerMaster +i