[spark] branch master updated (20e8843 -> d1371a2)

2019-06-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 20e8843  [MINOR][SQL] Skip warning if JOB_SUMMARY_LEVEL is set to NONE
 add d1371a2  [SPARK-27964][SQL] Move v2 catalog update methods to CatalogV2Util

No new revisions were added by this update.

Summary of changes:
 project/SparkBuild.scala   |   1 +
 .../sql/catalog/v2/utils/CatalogV2Util.scala}  | 102 +++--
 .../spark/sql/catalog/v2/TestTableCatalog.scala| 125 +
 .../sql/sources/v2/TestInMemoryTableCatalog.scala  |   7 +-
 4 files changed, 26 insertions(+), 209 deletions(-)
 copy sql/catalyst/src/{test/scala/org/apache/spark/sql/catalog/v2/TestTableCatalog.scala => main/java/org/apache/spark/sql/catalog/v2/utils/CatalogV2Util.scala} (62%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [MINOR][SQL] Skip warning if JOB_SUMMARY_LEVEL is set to NONE

2019-06-05 Thread dongjoon

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 1a86eb3  [MINOR][SQL] Skip warning if JOB_SUMMARY_LEVEL is set to NONE
1a86eb3 is described below

commit 1a86eb3c2484e61020f2c705973540add7d16053
Author: Jordan Sanders 
AuthorDate: Wed Jun 5 14:57:36 2019 -0700

[MINOR][SQL] Skip warning if JOB_SUMMARY_LEVEL is set to NONE

## What changes were proposed in this pull request?

I believe the log message: `Committer $committerClass is not a 
ParquetOutputCommitter and cannot create job summaries. Set Parquet option 
${ParquetOutputFormat.JOB_SUMMARY_LEVEL} to NONE.` is at odds with the `if` 
statement that logs the warning. Despite the instructions in the warning, users 
still encounter the warning if `JOB_SUMMARY_LEVEL` is already set to `NONE`.

This pull request introduces a change to skip logging the warning if 
`JOB_SUMMARY_LEVEL` is set to `NONE`.
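
As a rough illustration of the intended condition, the check can be sketched as a small predicate. This is only a sketch: `SummaryWarningSketch`, `shouldWarn`, and the local `JobSummaryLevel` values are hypothetical stand-ins for illustration, not the Parquet or Spark classes.

```scala
object SummaryWarningSketch {
  // Hypothetical stand-in for Parquet's job summary level setting.
  sealed trait JobSummaryLevel
  case object NONE extends JobSummaryLevel
  case object COMMON_ONLY extends JobSummaryLevel
  case object ALL extends JobSummaryLevel

  // Warn only when a summary is actually requested (level is not NONE)
  // and the committer is not able to produce one.
  def shouldWarn(level: JobSummaryLevel, isParquetCommitter: Boolean): Boolean =
    level != NONE && !isParquetCommitter
}
```

With this predicate, a level of `NONE` never triggers the warning, regardless of the committer class.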

## How was this patch tested?

I built to make sure everything still compiled and I ran the existing test 
suite. I didn't feel it was worth the overhead to add a test to make sure a log 
message does not get logged, but if reviewers feel differently, I can add one.

Closes #24808 from jmsanders/master.

Authored-by: Jordan Sanders 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/execution/datasources/parquet/ParquetFileFormat.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
index ea4f159..16cd570 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
@@ -129,7 +129,7 @@ class ParquetFileFormat
   conf.setEnum(ParquetOutputFormat.JOB_SUMMARY_LEVEL, JobSummaryLevel.NONE)
 }
 
-if (ParquetOutputFormat.getJobSummaryLevel(conf) == JobSummaryLevel.NONE
+if (ParquetOutputFormat.getJobSummaryLevel(conf) != JobSummaryLevel.NONE
   && !classOf[ParquetOutputCommitter].isAssignableFrom(committerClass)) {
   // output summary is requested, but the class is not a Parquet Committer
   logWarning(s"Committer $committerClass is not a ParquetOutputCommitter and cannot" +





[spark] branch master updated (5d6758c -> 20e8843)

2019-06-05 Thread dongjoon

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5d6758c  [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst
 add 20e8843  [MINOR][SQL] Skip warning if JOB_SUMMARY_LEVEL is set to NONE

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/parquet/ParquetFileFormat.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





[spark] branch master updated (6c28ef1 -> 5d6758c)

2019-06-05 Thread lixiao

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6c28ef1  [SPARK-27933][SS] Extracting common purge behaviour to the parent StreamExecution
 add 5d6758c  [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/parser/SqlBase.g4|  40 +++-
 .../spark/sql/catalyst/parser/AstBuilder.scala | 162 +++-
 .../plans/logical/sql/AlterTableStatements.scala   |  78 
 ...ewStatement.scala => AlterViewStatements.scala} |  20 +-
 .../plans/logical/sql/CreateTableStatement.scala   |  10 +-
 .../plans/logical/sql/ParsedStatement.scala|   5 +
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 207 -
 .../spark/sql/execution/SparkSqlParser.scala   |  60 +-
 .../datasources/DataSourceResolution.scala |  42 -
 .../sql/execution/command/DDLParserSuite.scala |  63 +--
 .../execution/command/PlanResolutionSuite.scala|  76 
 11 files changed, 612 insertions(+), 151 deletions(-)
 create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/AlterTableStatements.scala
 copy sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/{DropViewStatement.scala => AlterViewStatements.scala} (69%)





[spark] branch master updated: [SPARK-27933][SS] Extracting common purge behaviour to the parent StreamExecution

2019-06-05 Thread srowen

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6c28ef1  [SPARK-27933][SS] Extracting common purge behaviour to the parent StreamExecution
6c28ef1 is described below

commit 6c28ef144d6c8c30ed986a90a9458e976ba08917
Author: Jacek Laskowski 
AuthorDate: Wed Jun 5 12:39:31 2019 -0500

[SPARK-27933][SS] Extracting common purge behaviour to the parent StreamExecution

Extracting the common purge "behaviour" to the parent StreamExecution.
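
The shape of this refactoring can be sketched in isolation. This is a minimal sketch under stated assumptions: `InMemoryLog`, `Execution`, and `MicroBatch` are hypothetical stand-ins, not the Spark classes, and the in-memory log only mimics the exclusive purge semantics (everything before the target ID is removed).

```scala
import scala.collection.mutable

object PurgeSketch {
  // Hypothetical in-memory stand-in for a metadata log keyed by batch ID.
  class InMemoryLog {
    private val entries = mutable.TreeMap.empty[Long, String]

    def add(id: Long, value: String): Unit = {
      entries.put(id, value)
      ()
    }

    // Exclusive purge: drops every entry whose ID is strictly below the threshold.
    def purge(threshold: Long): Unit = {
      val stale = entries.keys.filter(_ < threshold).toList
      stale.foreach(id => entries.remove(id))
    }

    def ids: List[Long] = entries.keys.toList
  }

  abstract class Execution {
    val offsetLog = new InMemoryLog
    val commitLog = new InMemoryLog

    // Shared helper in the parent, so subclasses no longer purge each log separately.
    protected def purge(threshold: Long): Unit = {
      offsetLog.purge(threshold)
      commitLog.purge(threshold)
    }
  }

  class MicroBatch(minLogEntriesToMaintain: Long) extends Execution {
    def finishBatch(currentBatchId: Long): Unit = {
      offsetLog.add(currentBatchId, s"offset-$currentBatchId")
      commitLog.add(currentBatchId, s"commit-$currentBatchId")
      if (minLogEntriesToMaintain < currentBatchId) {
        purge(currentBatchId - minLogEntriesToMaintain)
      }
    }
  }
}
```

Each subclass keeps its own threshold computation and only the two-log purge moves to the parent.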

## How was this patch tested?

No added behaviour so relying on existing tests.

Closes #24781 from jaceklaskowski/StreamExecution-purge.

Authored-by: Jacek Laskowski 
Signed-off-by: Sean Owen 
---
 .../apache/spark/sql/execution/streaming/MicroBatchExecution.scala  | 3 +--
 .../org/apache/spark/sql/execution/streaming/StreamExecution.scala  | 6 ++
 .../sql/execution/streaming/continuous/ContinuousExecution.scala| 3 +--
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
index f0bf7e7..fd2638f 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
@@ -426,8 +426,7 @@ class MicroBatchExecution(
 // It is now safe to discard the metadata beyond the minimum number to retain.
 // Note that purge is exclusive, i.e. it purges everything before the target ID.
 if (minLogEntriesToMaintain < currentBatchId) {
-  offsetLog.purge(currentBatchId - minLogEntriesToMaintain)
-  commitLog.purge(currentBatchId - minLogEntriesToMaintain)
+  purge(currentBatchId - minLogEntriesToMaintain)
 }
   }
   noNewData = false
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
index 4c08b3a..7c1f6ca 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
@@ -610,6 +610,12 @@ abstract class StreamExecution(
 }
 }
   }
+
+  protected def purge(threshold: Long): Unit = {
+logDebug(s"Purging metadata at threshold=$threshold")
+offsetLog.purge(threshold)
+commitLog.purge(threshold)
+  }
 }
 
 object StreamExecution {
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
index 82708a3..509b103 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
@@ -352,8 +352,7 @@ class ContinuousExecution(
 // number of batches that must be retained and made recoverable, so we should keep the
 // specified number of metadata that have been committed.
 if (minLogEntriesToMaintain <= epoch) {
-  offsetLog.purge(epoch + 1 - minLogEntriesToMaintain)
-  commitLog.purge(epoch + 1 - minLogEntriesToMaintain)
+  purge(epoch + 1 - minLogEntriesToMaintain)
 }
 
 awaitProgressLock.lock()





[spark] branch master updated (3f102a8 -> 8b6232b)

2019-06-05 Thread lixiao

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3f102a8  [SPARK-27749][SQL] hadoop-3.2 support hive-thriftserver
 add 8b6232b  [SPARK-27521][SQL] Move data source v2 to catalyst module

No new revisions were added by this update.

Summary of changes:
 project/MimaExcludes.scala | 39 ++
 sql/catalyst/pom.xml   |  4 +++
 .../spark/sql/sources/v2/SessionConfigSupport.java |  0
 .../apache/spark/sql/sources/v2/SupportsRead.java  |  0
 .../apache/spark/sql/sources/v2/SupportsWrite.java |  0
 .../apache/spark/sql/sources/v2/TableProvider.java |  9 +
 .../apache/spark/sql/sources/v2/reader/Batch.java  |  0
 .../sql/sources/v2/reader/InputPartition.java  |  0
 .../sql/sources/v2/reader/PartitionReader.java |  0
 .../sources/v2/reader/PartitionReaderFactory.java  |  0
 .../apache/spark/sql/sources/v2/reader/Scan.java   |  0
 .../spark/sql/sources/v2/reader/ScanBuilder.java   |  0
 .../spark/sql/sources/v2/reader/Statistics.java|  0
 .../sources/v2/reader/SupportsPushDownFilters.java |  0
 .../v2/reader/SupportsPushDownRequiredColumns.java |  0
 .../v2/reader/SupportsReportPartitioning.java  |  0
 .../v2/reader/SupportsReportStatistics.java|  0
 .../reader/partitioning/ClusteredDistribution.java |  0
 .../v2/reader/partitioning/Distribution.java   |  0
 .../v2/reader/partitioning/Partitioning.java   |  0
 .../streaming/ContinuousPartitionReader.java   |  0
 .../ContinuousPartitionReaderFactory.java  |  0
 .../v2/reader/streaming/ContinuousStream.java  |  0
 .../v2/reader/streaming/MicroBatchStream.java  |  0
 .../sql/sources/v2/reader/streaming/Offset.java|  0
 .../v2/reader/streaming/PartitionOffset.java   |  0
 .../v2/reader/streaming/SparkDataStream.java   |  0
 .../spark/sql/sources/v2/writer/BatchWrite.java|  0
 .../spark/sql/sources/v2/writer/DataWriter.java|  0
 .../sql/sources/v2/writer/DataWriterFactory.java   |  0
 .../v2/writer/SupportsDynamicOverwrite.java|  0
 .../sql/sources/v2/writer/SupportsOverwrite.java   |  0
 .../sql/sources/v2/writer/SupportsTruncate.java|  0
 .../spark/sql/sources/v2/writer/WriteBuilder.java  |  0
 .../sql/sources/v2/writer/WriterCommitMessage.java |  0
 .../streaming/StreamingDataWriterFactory.java  |  0
 .../v2/writer/streaming/StreamingWrite.java|  0
 .../spark/sql/vectorized/ArrowColumnVector.java|  2 +-
 .../apache/spark/sql/vectorized/ColumnVector.java  |  0
 .../apache/spark/sql/vectorized/ColumnarArray.java |  0
 .../apache/spark/sql/vectorized/ColumnarBatch.java |  0
 .../apache/spark/sql/vectorized/ColumnarMap.java   |  0
 .../apache/spark/sql/vectorized/ColumnarRow.java   |  0
 .../org/apache/spark/sql/sources/filters.scala |  0
 .../org/apache/spark/sql/util}/ArrowUtils.scala|  2 +-
 .../apache/spark/sql/util}/ArrowUtilsSuite.scala   |  4 +--
 sql/core/pom.xml   |  4 ---
 .../sql/execution/arrow/ArrowConverters.scala  |  1 +
 .../spark/sql/execution/arrow/ArrowWriter.scala|  1 +
 .../execution/python/AggregateInPandasExec.scala   |  2 +-
 .../sql/execution/python/ArrowEvalPythonExec.scala |  2 +-
 .../sql/execution/python/ArrowPythonRunner.scala   |  3 +-
 .../python/FlatMapGroupsInPandasExec.scala |  4 +--
 .../sql/execution/python/WindowInPandasExec.scala  |  2 +-
 .../spark/sql/execution/r/ArrowRRunner.scala   |  3 +-
 .../sql/execution/arrow/ArrowConvertersSuite.scala |  1 +
 .../sources/RateStreamProviderSuite.scala  |  2 +-
 .../streaming/sources/TextSocketStreamSuite.scala  |  2 +-
 .../vectorized/ArrowColumnVectorSuite.scala|  2 +-
 .../execution/vectorized/ColumnarBatchSuite.scala  |  4 +--
 60 files changed, 65 insertions(+), 28 deletions(-)
 rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java (100%)
 rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/SupportsRead.java (100%)
 rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/SupportsWrite.java (100%)
 rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java (89%)
 rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/reader/Batch.java (100%)
 rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/reader/InputPartition.java (100%)
 rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/reader/PartitionReader.java (100%)
 rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/reader/PartitionReaderFactory.java (100%)
 rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/reader/Scan.java (100%)
 rename sql/{core => 
catalyst}/src/main/java/org/apache/spark/sql/sources/v2

[spark] branch master updated (b312033 -> 18834e8)

2019-06-05 Thread lixiao

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b312033  [SPARK-20286][CORE] Improve logic for timing out executors in dynamic allocation.
 add 18834e8  [SPARK-27899][SQL] Refactor getTableOption() to extract a common method

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/hive/client/HiveClientImpl.scala | 206 +++--
 1 file changed, 104 insertions(+), 102 deletions(-)





[spark] branch master updated: [SPARK-20286][CORE] Improve logic for timing out executors in dynamic allocation.

2019-06-05 Thread irashid

irashid pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b312033  [SPARK-20286][CORE] Improve logic for timing out executors in dynamic allocation.
b312033 is described below

commit b312033bd33cd6cbf1f166ccaa7a5df4e3421078
Author: Marcelo Vanzin 
AuthorDate: Wed Jun 5 08:09:44 2019 -0500

[SPARK-20286][CORE] Improve logic for timing out executors in dynamic allocation.

This change refactors the portions of the ExecutorAllocationManager class that
track executor state into a new class, to achieve a few goals:

- make the code easier to understand
- better separate concerns (task backlog vs. executor state)
- less synchronization between event and allocation threads
- less coupling between the allocation code and executor state tracking

The executor tracking code was moved to a new class (ExecutorMonitor) that
encapsulates all the logic of tracking what happens to executors and when
they can be timed out. The logic to actually remove the executors remains
in the EAM, since it still requires information that is not tracked by the
new executor monitor code.

Of particular interest in the executor monitor is a change in how cached
blocks are tracked: instead of polling the block manager, the monitor now
uses events to track which executors have cached blocks. It can also detect
unpersist events and adjust the time when the executor should be removed
accordingly. (That's the bug mentioned in the PR title.)
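
The event-driven tracking described above can be sketched roughly as follows. This is an illustrative sketch only: `Monitor`, `BlockCached`, `RddUnpersisted`, and the two timeout parameters are hypothetical names chosen for the example and do not reflect the actual ExecutorMonitor API.

```scala
import scala.collection.mutable

object ExecutorMonitorSketch {
  // Hypothetical event types; real listener events carry more information.
  sealed trait Event
  case class BlockCached(executorId: String, rddId: Int) extends Event
  case class RddUnpersisted(rddId: Int) extends Event

  class Monitor(idleTimeoutMs: Long, cachedTimeoutMs: Long) {
    // Executor ID -> IDs of RDDs with blocks cached on that executor.
    private val cachedRdds = mutable.Map.empty[String, mutable.Set[Int]]

    // State is updated from events instead of polling a block manager.
    def onEvent(event: Event): Unit = event match {
      case BlockCached(exec, rdd) =>
        cachedRdds.getOrElseUpdate(exec, mutable.Set.empty[Int]) += rdd
      case RddUnpersisted(rdd) =>
        cachedRdds.values.foreach(_ -= rdd)
    }

    def hasCachedBlocks(exec: String): Boolean =
      cachedRdds.get(exec).exists(_.nonEmpty)

    // An unpersist event moves the executor back to the shorter idle timeout.
    def timeoutMs(exec: String): Long =
      if (hasCachedBlocks(exec)) cachedTimeoutMs else idleTimeoutMs
  }
}
```

In this sketch, an executor's timeout changes as soon as the relevant event arrives, which is the behavior polling could miss for unpersist.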

Because of the refactoring, a few tests in the old EAM test suite were removed,
since they're now covered by the newly added test suite. The EAM suite was
also changed a little bit to not instantiate a SparkContext every time. This
allowed some cleanup, and the tests also run faster.

Tested with new and updated unit tests, and with multiple TPC-DS workloads
running with dynamic allocation on; also some manual tests for the caching
behavior.

Closes #24704 from vanzin/SPARK-20286.

Authored-by: Marcelo Vanzin 
Signed-off-by: Imran Rashid 
---
 .../apache/spark/ExecutorAllocationClient.scala|   9 +-
 .../apache/spark/ExecutorAllocationManager.scala   | 239 +--
 .../main/scala/org/apache/spark/SparkContext.scala |   3 +-
 .../org/apache/spark/internal/config/package.scala |   8 +-
 .../cluster/CoarseGrainedSchedulerBackend.scala|   4 +
 .../spark/scheduler/dynalloc/ExecutorMonitor.scala | 283 
 .../apache/spark/storage/BlockManagerMaster.scala  |   9 -
 .../spark/storage/BlockManagerMasterEndpoint.scala |  31 -
 .../spark/storage/BlockManagerMessages.scala   |   3 -
 ...g.apache.spark.scheduler.ExternalClusterManager |   1 -
 .../spark/ExecutorAllocationManagerSuite.scala | 708 ++---
 .../deploy/ExternalShuffleServiceDbSuite.scala |   9 +-
 .../scheduler/dynalloc/ExecutorMonitorSuite.scala  | 289 +
 .../spark/storage/BlockManagerInfoSuite.scala  |  14 -
 .../scheduler/ExecutorAllocationManagerSuite.scala |   2 +-
 15 files changed, 822 insertions(+), 790 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala b/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala
index 63d87b4..cb965cb 100644
--- a/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala
+++ b/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala
@@ -23,11 +23,18 @@ package org.apache.spark
  */
 private[spark] trait ExecutorAllocationClient {
 
-
   /** Get the list of currently active executors */
   private[spark] def getExecutorIds(): Seq[String]
 
   /**
+   * Whether an executor is active. An executor is active when it can be used to execute tasks
+   * for jobs submitted by the application.
+   *
+   * @return whether the executor with the given ID is currently active.
+   */
+  def isExecutorActive(id: String): Boolean
+
+  /**
* Update the cluster manager on our scheduling needs. Three bits of information are included
* to help it make decisions.
* @param numExecutors The total number of executors we'd like to have. The cluster manager
diff --git a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
index 1782027..63df7cc 100644
--- a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
+++ b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
@@ -30,7 +30,7 @@ import org.apache.spark.internal.config._
 import org.apache.spark.internal.config.Tests.TEST_SCHEDULE_INTERVAL
 import org.apache.spark.metrics.source.Source
 import org.apache.spark.scheduler._
-import org.apache.spark.storage.BlockManagerMaster
+i