spark git commit: [SPARK-19265][SQL] make table relation cache general and does not depend on hive

2017-01-19 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master 0c9231858 -> 2e6256002


[SPARK-19265][SQL] make table relation cache general and does not depend on hive

## What changes were proposed in this pull request?

We have a table relation plan cache in `HiveMetastoreCatalog`, which caches a 
lot of things: file status, resolved data source, inferred schema, etc.

However, it doesn't make sense to tie this cache to Hive support; we should 
move it to the SQL core module so that users can use this cache without Hive 
support.

It can also reduce the size of `HiveMetastoreCatalog`, so that it's easier to 
remove it eventually.

Main changes:
1. Move the table relation cache to `SessionCatalog` (a minimal sketch of such a cache follows below).
2. `SessionCatalog.lookupRelation` will return a `SimpleCatalogRelation`, and the 
analyzer will convert it to `LogicalRelation` or `MetastoreRelation` later, 
so `HiveSessionCatalog` doesn't need to override `lookupRelation` anymore.
3. `FindDataSourceTable` will read/write the table relation cache.
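
A minimal, self-contained sketch of the kind of bounded plan cache described here, assuming simplified types (`String` keys and a stub plan class) instead of the actual `QualifiedTableName`/`LogicalPlan`:

```scala
import com.google.common.cache.{Cache, CacheBuilder}

object PlanCacheSketch {
  // Stand-in for a resolved table relation plan.
  final case class ResolvedPlan(description: String)

  // Bounded cache keyed by a qualified table name such as "db.table".
  val tableRelationCache: Cache[String, ResolvedPlan] =
    CacheBuilder.newBuilder().maximumSize(1000).build[String, ResolvedPlan]()

  // Return the cached plan if present; otherwise resolve it and remember it,
  // roughly what a rule like FindDataSourceTable would do with the cache.
  def getOrResolve(qualifiedName: String)(resolve: => ResolvedPlan): ResolvedPlan =
    Option(tableRelationCache.getIfPresent(qualifiedName)).getOrElse {
      val plan = resolve
      tableRelationCache.put(qualifiedName, plan)
      plan
    }
}
```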

## How was this patch tested?

existing tests.

Author: Wenchen Fan 

Closes #16621 from cloud-fan/plan-cache.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2e625600
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2e625600
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2e625600

Branch: refs/heads/master
Commit: 2e62560024999c215cf2373fc9a8070bb2ad5c58
Parents: 0c92318
Author: Wenchen Fan 
Authored: Thu Jan 19 00:07:48 2017 -0800
Committer: gatorsmile 
Committed: Thu Jan 19 00:07:48 2017 -0800

--
 .../sql/catalyst/catalog/SessionCatalog.scala   | 33 +--
 .../apache/spark/sql/catalyst/identifiers.scala |  4 +-
 .../catalyst/catalog/SessionCatalogSuite.scala  |  6 +-
 .../org/apache/spark/sql/DataFrameWriter.scala  |  4 +-
 .../command/AnalyzeColumnCommand.scala  |  3 +-
 .../execution/command/AnalyzeTableCommand.scala |  3 +-
 .../spark/sql/execution/command/tables.scala|  2 +-
 .../datasources/DataSourceStrategy.scala| 60 ++--
 .../apache/spark/sql/internal/CatalogImpl.scala |  2 +-
 .../org/apache/spark/sql/DataFrameSuite.scala   | 11 ---
 .../spark/sql/execution/command/DDLSuite.scala  |  3 +-
 .../spark/sql/hive/HiveMetastoreCatalog.scala   | 98 ++--
 .../spark/sql/hive/HiveSessionCatalog.scala | 37 +---
 .../spark/sql/hive/HiveSessionState.scala   |  2 +
 .../apache/spark/sql/hive/HiveStrategies.scala  | 17 +++-
 .../CreateHiveTableAsSelectCommand.scala|  8 +-
 .../apache/spark/sql/hive/test/TestHive.scala   |  2 +-
 .../sql/hive/MetastoreDataSourcesSuite.scala| 22 +
 .../apache/spark/sql/hive/StatisticsSuite.scala |  8 +-
 .../sql/hive/execution/SQLQuerySuite.scala  | 17 ++--
 20 files changed, 144 insertions(+), 198 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2e625600/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
index 8008fcd..e9543f7 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -21,13 +21,13 @@ import javax.annotation.concurrent.GuardedBy
 
 import scala.collection.mutable
 
+import com.google.common.cache.{Cache, CacheBuilder}
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.Path
 
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.AnalysisException
-import org.apache.spark.sql.catalyst.{CatalystConf, SimpleCatalystConf}
-import org.apache.spark.sql.catalyst.{FunctionIdentifier, TableIdentifier}
+import org.apache.spark.sql.catalyst._
 import org.apache.spark.sql.catalyst.analysis._
 import org.apache.spark.sql.catalyst.analysis.FunctionRegistry.FunctionBuilder
 import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionInfo}
@@ -118,6 +118,14 @@ class SessionCatalog(
   }
 
   /**
+   * A cache of qualified table name to table relation plan.
+   */
+  val tableRelationCache: Cache[QualifiedTableName, LogicalPlan] = {
+    // TODO: create a config instead of hardcode 1000 here.
+    CacheBuilder.newBuilder().maximumSize(1000).build[QualifiedTableName, LogicalPlan]()
+  }
+
+  /**
* This method is used to make the given path qualified before we
* store this path in the underlying external catalog. So, when a path
* does not contain a scheme, this path will not be changed after the default
@@ -573,7 +581,7 @@ class SessionCatalog(
   val relationAlias = alias.getOrEls

spark-website git commit: [SPARK-19249][DOC] Update download page to describe archive location

2017-01-19 Thread srowen
Repository: spark-website
Updated Branches:
  refs/heads/asf-site 03485ecc8 -> 0fce54176


[SPARK-19249][DOC] Update download page to describe archive location


Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/0fce5417
Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/0fce5417
Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/0fce5417

Branch: refs/heads/asf-site
Commit: 0fce54176020f0a1adcc704cdd2d47e4415ebbb4
Parents: 03485ec
Author: Luciano Resende 
Authored: Mon Jan 16 11:08:59 2017 -0800
Committer: Luciano Resende 
Committed: Mon Jan 16 20:34:40 2017 -0800

--
 downloads.md| 4 
 site/downloads.html | 6 +-
 2 files changed, 9 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark-website/blob/0fce5417/downloads.md
--
diff --git a/downloads.md b/downloads.md
index cfbc6e4..3654d93 100644
--- a/downloads.md
+++ b/downloads.md
@@ -68,6 +68,10 @@ Once you've downloaded Spark, you can find instructions for 
installing and build
 
 
 
+### Archived Releases
+
+As new Spark releases come out for each development stream, previous ones will be archived, but they are still available at [Spark release archives](https://archive.apache.org/dist/spark/).
+
 ### Nightly Packages and Artifacts
 For developers, Spark maintains nightly builds and SNAPSHOT artifacts. More 
information is available on the [the Developer Tools 
page](/developer-tools.html#nightly-builds).
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0fce5417/site/downloads.html
--
diff --git a/site/downloads.html b/site/downloads.html
index 24394ab..343751d 100644
--- a/site/downloads.html
+++ b/site/downloads.html
@@ -199,7 +199,7 @@ $(document).ready(function() {
 });
 
 
-Download Apache Spark™
+Download Apache Spark™
 
 
   
@@ -263,6 +263,10 @@ git clone git://github.com/apache/spark.git -b branch-2.1
 
 
 
+Archived Releases
+
+As new Spark releases come out for each development stream, previous ones will be archived, but they are still available at <a href="https://archive.apache.org/dist/spark/">Spark release archives</a>.
+
 Nightly Packages and Artifacts
 For developers, Spark maintains nightly builds and SNAPSHOT artifacts. More 
information is available on the the Developer Tools page.
 





spark git commit: [SPARK-14272][ML] Add Loglikelihood in GaussianMixtureSummary

2017-01-19 Thread yliang
Repository: spark
Updated Branches:
  refs/heads/master 2e6256002 -> 8ccca9170


[SPARK-14272][ML] Add Loglikelihood in GaussianMixtureSummary

## What changes were proposed in this pull request?

Add logLikelihood in GMM.summary.
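
A hedged usage sketch of the new summary field; `dataset` is assumed to be a DataFrame with a `features` vector column:

```scala
import org.apache.spark.ml.clustering.GaussianMixture

// Fit a GMM and read the total log-likelihood now exposed on the summary.
val model = new GaussianMixture().setK(2).setSeed(1L).fit(dataset)
println(s"total log-likelihood: ${model.summary.logLikelihood}")
```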

## How was this patch tested?

added tests

Author: Zheng RuiFeng 
Author: Ruifeng Zheng 

Closes #12064 from zhengruifeng/gmm_metric.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8ccca917
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8ccca917
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8ccca917

Branch: refs/heads/master
Commit: 8ccca9170f983f74a7482f67206dae070c77b419
Parents: 2e62560
Author: Zheng RuiFeng 
Authored: Thu Jan 19 03:46:37 2017 -0800
Committer: Yanbo Liang 
Committed: Thu Jan 19 03:46:37 2017 -0800

--
 .../org/apache/spark/ml/clustering/GaussianMixture.scala  |  7 +--
 .../main/scala/org/apache/spark/mllib/util/MLUtils.scala  |  2 +-
 .../apache/spark/ml/clustering/GaussianMixtureSuite.scala |  7 +++
 project/MimaExcludes.scala|  5 -
 python/pyspark/ml/clustering.py   | 10 ++
 5 files changed, 27 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8ccca917/mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala 
b/mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala
index a7bb413..db5fff5 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala
@@ -416,7 +416,7 @@ class GaussianMixture @Since("2.0.0") (
 
 val model = copyValues(new GaussianMixtureModel(uid, weights, 
gaussianDists)).setParent(this)
 val summary = new GaussianMixtureSummary(model.transform(dataset),
-  $(predictionCol), $(probabilityCol), $(featuresCol), $(k))
+  $(predictionCol), $(probabilityCol), $(featuresCol), $(k), logLikelihood)
 model.setSummary(Some(summary))
 instr.logSuccess(model)
 model
@@ -674,6 +674,7 @@ private class ExpectationAggregator(
  *in `predictions`.
  * @param featuresCol  Name for column of features in `predictions`.
  * @param k  Number of clusters.
+ * @param logLikelihood  Total log-likelihood for this model on the given data.
  */
 @Since("2.0.0")
 @Experimental
@@ -682,7 +683,9 @@ class GaussianMixtureSummary private[clustering] (
 predictionCol: String,
 @Since("2.0.0") val probabilityCol: String,
 featuresCol: String,
-k: Int) extends ClusteringSummary(predictions, predictionCol, featuresCol, 
k) {
+k: Int,
+@Since("2.2.0") val logLikelihood: Double)
+  extends ClusteringSummary(predictions, predictionCol, featuresCol, k) {
 
   /**
* Probability of each cluster.

http://git-wip-us.apache.org/repos/asf/spark/blob/8ccca917/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
index de66c7c..95f904d 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
@@ -34,7 +34,7 @@ import org.apache.spark.storage.StorageLevel
 import org.apache.spark.util.random.BernoulliCellSampler
 
 /**
- * Helper methods to load, save and pre-process data used in ML Lib.
+ * Helper methods to load, save and pre-process data used in MLLib.
  */
 @Since("0.8.0")
 object MLUtils extends Logging {

http://git-wip-us.apache.org/repos/asf/spark/blob/8ccca917/mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala
--
diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala
 
b/mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala
index a362aee..e54eb27 100644
--- 
a/mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala
+++ 
b/mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala
@@ -207,6 +207,10 @@ class GaussianMixtureSuite extends SparkFunSuite with 
MLlibTestSparkContext
 [,1] [,2]
   [1,] 0.2961543 0.160783
   [2,] 0.1607830 1.008878
+
+  model$loglik
+
+  [1] -46.89499
  */
 val weights = Array(0.533, 0.467)
 val means = Array(Vectors.dense(10.363673, 9.897081), 
Vectors.dense(0.11731091, -0.06192351))
@@ -219,6 

spark git commit: [SPARK-19059][SQL] Unable to retrieve data from parquet table whose name startswith underscore

2017-01-19 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master 8ccca9170 -> 064fadd2a


[SPARK-19059][SQL] Unable to retrieve data from parquet table whose name 
startswith underscore

## What changes were proposed in this pull request?
The initial `shouldFilterOut()` invocation filtered on the root path name (the 
table name in the initial call) and removed it if it contained an underscore. 
I moved the check one level below: the files/directories under the given root 
path are listed first, and the filter is then applied to those entries (see the 
sketch below).
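
An illustrative sketch of the new listing order, with a simplified stand-in for the real filter predicate (the actual `shouldFilterOut` rules are more nuanced):

```scala
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

// Simplified stand-in: treat names starting with "_" or "." as hidden.
def looksHidden(name: String): Boolean =
  name.startsWith("_") || name.startsWith(".")

// The root path (e.g. a table directory named "_tbl") is listed unconditionally;
// only the entries found under it are run through the filter.
def listVisibleChildren(fs: FileSystem, root: Path): Seq[FileStatus] =
  fs.listStatus(root).toSeq.filterNot(status => looksHidden(status.getPath.getName))
```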

## How was this patch tested?
Added a new test case for this scenario.

Author: jayadevanmurali 
Author: jayadevan 

Closes #16635 from jayadevanmurali/branch-0.1-SPARK-19059.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/064fadd2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/064fadd2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/064fadd2

Branch: refs/heads/master
Commit: 064fadd2a25d1c118e062e505a0ed56be31bdf34
Parents: 8ccca91
Author: jayadevanmurali 
Authored: Thu Jan 19 20:07:52 2017 +0800
Committer: Wenchen Fan 
Committed: Thu Jan 19 20:07:52 2017 +0800

--
 .../PartitioningAwareFileIndex.scala| 91 ++--
 .../org/apache/spark/sql/SQLQuerySuite.scala|  8 ++
 2 files changed, 53 insertions(+), 46 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/064fadd2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala
index 82c1599..fe9c657 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala
@@ -385,55 +385,54 @@ object PartitioningAwareFileIndex extends Logging {
 logTrace(s"Listing $path")
 val fs = path.getFileSystem(hadoopConf)
 val name = path.getName.toLowerCase
-if (shouldFilterOut(name)) {
-  Seq.empty[FileStatus]
-} else {
-  // [SPARK-17599] Prevent InMemoryFileIndex from failing if path doesn't 
exist
-  // Note that statuses only include FileStatus for the files and dirs 
directly under path,
-  // and does not include anything else recursively.
-  val statuses = try fs.listStatus(path) catch {
-case _: FileNotFoundException =>
-  logWarning(s"The directory $path was not found. Was it deleted very 
recently?")
-  Array.empty[FileStatus]
-  }
 
-  val allLeafStatuses = {
-val (dirs, topLevelFiles) = statuses.partition(_.isDirectory)
-val nestedFiles: Seq[FileStatus] = sessionOpt match {
-  case Some(session) =>
-bulkListLeafFiles(dirs.map(_.getPath), hadoopConf, filter, 
session).flatMap(_._2)
-  case _ =>
-dirs.flatMap(dir => listLeafFiles(dir.getPath, hadoopConf, filter, 
sessionOpt))
-}
-val allFiles = topLevelFiles ++ nestedFiles
-if (filter != null) allFiles.filter(f => filter.accept(f.getPath)) 
else allFiles
-  }
+// [SPARK-17599] Prevent InMemoryFileIndex from failing if path doesn't 
exist
+// Note that statuses only include FileStatus for the files and dirs 
directly under path,
+// and does not include anything else recursively.
+val statuses = try fs.listStatus(path) catch {
+  case _: FileNotFoundException =>
+logWarning(s"The directory $path was not found. Was it deleted very 
recently?")
+Array.empty[FileStatus]
+}
 
-  allLeafStatuses.filterNot(status => 
shouldFilterOut(status.getPath.getName)).map {
-case f: LocatedFileStatus =>
-  f
-
-// NOTE:
-//
-// - Although S3/S3A/S3N file system can be quite slow for remote file 
metadata
-//   operations, calling `getFileBlockLocations` does no harm here 
since these file system
-//   implementations don't actually issue RPC for this method.
-//
-// - Here we are calling `getFileBlockLocations` in a sequential 
manner, but it should not
-//   be a big deal since we always use to `listLeafFilesInParallel` 
when the number of
-//   paths exceeds threshold.
-case f =>
-  // The other constructor

[4/4] spark git commit: [SPARK-16654][CORE] Add UI coverage for Application Level Blacklisting

2017-01-19 Thread irashid
[SPARK-16654][CORE] Add UI coverage for Application Level Blacklisting

Builds on top of work in SPARK-8425 to update Application Level Blacklisting in 
the scheduler.

## What changes were proposed in this pull request?

Adds a UI to these patches by:
- defining new listener events for blacklisting and unblacklisting nodes and 
executors (a minimal listener sketch follows below);
- sending said events at the relevant points in BlacklistTracker;
- adding JSON (de)serialization code for these events;
- augmenting the Executors UI page to show which, and how many, executors are 
blacklisted;
- adding a unit test to make sure events are being fired;
- adding HistoryServerSuite coverage to verify that the SHS reads these events 
correctly;
- updating the Executors UI to show Blacklisted/Active/Dead as a tri-state in 
Executors Status.

Updates .rat-excludes to pass tests.
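
A minimal sketch of consuming the new events from application code (registered via `SparkContext.addSparkListener` or `spark.extraListeners`); the field names follow the events introduced by this patch:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorBlacklisted, SparkListenerNodeBlacklisted}

class BlacklistLoggingListener extends SparkListener {
  // Fired when an executor accumulates too many task failures.
  override def onExecutorBlacklisted(event: SparkListenerExecutorBlacklisted): Unit =
    println(s"executor ${event.executorId} blacklisted after ${event.taskFailures} task failures")

  // Fired when a node accumulates too many blacklisted executors.
  override def onNodeBlacklisted(event: SparkListenerNodeBlacklisted): Unit =
    println(s"node ${event.hostId} blacklisted (${event.executorFailures} failed executors)")
}
```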

username squito

## How was this patch tested?

./dev/run-tests
testOnly org.apache.spark.util.JsonProtocolSuite
testOnly org.apache.spark.scheduler.BlacklistTrackerSuite
testOnly org.apache.spark.deploy.history.HistoryServerSuite
https://github.com/jsoltren/jose-utils/blob/master/blacklist/test-blacklist.sh
![blacklist-20161219](https://cloud.githubusercontent.com/assets/1208477/21335321/9eda320a-c623-11e6-8b8c-9c912a73c276.jpg)

Author: José Hiram Soltren 

Closes #16346 from jsoltren/SPARK-16654-submit.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/640f9423
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/640f9423
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/640f9423

Branch: refs/heads/master
Commit: 640f942337e1ce87075195998bd051e19c4b50b9
Parents: 064fadd
Author: José Hiram Soltren 
Authored: Thu Jan 19 09:08:18 2017 -0600
Committer: Imran Rashid 
Committed: Thu Jan 19 09:08:18 2017 -0600

--
 .../org/apache/spark/SparkFirehoseListener.java |  20 
 .../spark/ui/static/executorspage-template.html |   5 +
 .../org/apache/spark/ui/static/executorspage.js |  39 --
 .../spark/scheduler/BlacklistTracker.scala  |  16 ++-
 .../spark/scheduler/EventLoggingListener.scala  |  16 +++
 .../apache/spark/scheduler/SparkListener.scala  |  54 +
 .../spark/scheduler/SparkListenerBus.scala  |   8 ++
 .../spark/scheduler/TaskSchedulerImpl.scala |  10 +-
 .../org/apache/spark/status/api/v1/api.scala|   1 +
 .../scala/org/apache/spark/ui/ToolTips.scala|   3 +
 .../apache/spark/ui/exec/ExecutorsPage.scala|   2 +
 .../org/apache/spark/ui/exec/ExecutorsTab.scala |  57 -
 .../apache/spark/ui/jobs/ExecutorTable.scala|   6 +
 .../scala/org/apache/spark/ui/jobs/UIData.scala |   2 +
 .../application_list_json_expectation.json  |  44 +--
 .../completed_app_list_json_expectation.json|  44 +--
 .../executor_list_json_expectation.json |   1 +
 .../executor_node_blacklisting_expectation.json | 118 +++
 ...blacklisting_unblacklisting_expectation.json | 118 +++
 .../limit_app_list_json_expectation.json|  64 --
 .../maxDate2_app_list_json_expectation.json |   2 +-
 .../maxDate_app_list_json_expectation.json  |   4 +-
 .../minDate_app_list_json_expectation.json  |  40 ++-
 .../one_app_json_expectation.json   |   2 +-
 .../one_app_multi_attempt_json_expectation.json |   4 +-
 .../one_stage_attempt_json_expectation.json |  80 ++---
 .../one_stage_json_expectation.json |  80 ++---
 ...summary_w__custom_quantiles_expectation.json |   2 +-
 ...stage_with_accumulable_json_expectation.json | 100 
 .../spark-events/app-20161115172038-|  75 
 .../spark-events/app-20161116163331-|  68 +++
 .../deploy/history/HistoryServerSuite.scala |   4 +-
 .../spark/scheduler/BlacklistTrackerSuite.scala |  42 ++-
 .../spark/scheduler/TaskSetManagerSuite.scala   |   3 +-
 .../apache/spark/util/JsonProtocolSuite.scala   |  49 
 dev/.rat-excludes   |   2 +
 36 files changed, 950 insertions(+), 235 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/640f9423/core/src/main/java/org/apache/spark/SparkFirehoseListener.java
--
diff --git a/core/src/main/java/org/apache/spark/SparkFirehoseListener.java 
b/core/src/main/java/org/apache/spark/SparkFirehoseListener.java
index 97eed61..9fe97b4 100644
--- a/core/src/main/java/org/apache/spark/SparkFirehoseListener.java
+++ b/core/src/main/java/org/apache/spark/SparkFirehoseListener.java
@@ -114,6 +114,26 @@ public class SparkFirehoseListener implements 
SparkListenerInterface {
 }
 
 @Override
+public final void onExecutorBlacklisted(SparkListenerExecutorBlacklisted 
executorBlacklisted) {
+on

[2/4] spark git commit: [SPARK-16654][CORE] Add UI coverage for Application Level Blacklisting

2017-01-19 Thread irashid
http://git-wip-us.apache.org/repos/asf/spark/blob/640f9423/core/src/test/resources/spark-events/app-20161116163331-
--
diff --git a/core/src/test/resources/spark-events/app-20161116163331- 
b/core/src/test/resources/spark-events/app-20161116163331-
new file mode 100755
index 000..7566c9f
--- /dev/null
+++ b/core/src/test/resources/spark-events/app-20161116163331-
@@ -0,0 +1,68 @@
+{"Event":"SparkListenerLogStart","Spark Version":"2.1.0-SNAPSHOT"}
+{"Event":"SparkListenerBlockManagerAdded","Block Manager ID":{"Executor 
ID":"driver","Host":"172.22.0.167","Port":51475},"Maximum 
Memory":384093388,"Timestamp":1479335611477}
+{"Event":"SparkListenerEnvironmentUpdate","JVM Information":{"Java 
Home":"/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre","Java
 Version":"1.8.0_92 (Oracle Corporation)","Scala Version":"version 
2.11.8"},"Spark 
Properties":{"spark.blacklist.task.maxTaskAttemptsPerExecutor":"3","spark.blacklist.enabled":"TRUE","spark.driver.host":"172.22.0.167","spark.blacklist.task.maxTaskAttemptsPerNode":"3","spark.eventLog.enabled":"TRUE","spark.driver.port":"51459","spark.repl.class.uri":"spark://172.22.0.167:51459/classes","spark.jars":"","spark.repl.class.outputDir":"/private/var/folders/l4/d46wlzj16593f3d812vk49twgp/T/spark-1cbc97d0-7fe6-4c9f-8c2c-f6fe51ee3cf2/repl-39929169-ac4c-4c6d-b116-f648e4dd62ed","spark.app.name":"Spark
 
shell","spark.blacklist.stage.maxFailedExecutorsPerNode":"3","spark.scheduler.mode":"FIFO","spark.eventLog.overwrite":"TRUE","spark.blacklist.stage.maxFailedTasksPerExecutor":"3","spark.executor.id":"driver","spark.blacklist.application.maxFailedEx
 
ecutorsPerNode":"2","spark.submit.deployMode":"client","spark.master":"local-cluster[4,4,1024]","spark.home":"/Users/Jose/IdeaProjects/spark","spark.eventLog.dir":"/Users/jose/logs","spark.sql.catalogImplementation":"in-memory","spark.eventLog.compress":"FALSE","spark.blacklist.application.maxFailedTasksPerExecutor":"1","spark.blacklist.timeout":"100","spark.app.id":"app-20161116163331-","spark.task.maxFailures":"4"},"System
 
Properties":{"java.io.tmpdir":"/var/folders/l4/d46wlzj16593f3d812vk49twgp/T/","line.separator":"\n","path.separator":":","sun.management.compiler":"HotSpot
 64-Bit Tiered 
Compilers","SPARK_SUBMIT":"true","sun.cpu.endian":"little","java.specification.version":"1.8","java.vm.specification.name":"Java
 Virtual Machine Specification","java.vendor":"Oracle 
Corporation","java.vm.specification.version":"1.8","user.home":"/Users/Jose","file.encoding.pkg":"sun.io","sun.nio.ch.bugLevel":"","ftp.nonProxyHosts":"local|*.local|169.254/16|*.169.254/16","sun.arch.dat
 
a.model":"64","sun.boot.library.path":"/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib","user.dir":"/Users/Jose/IdeaProjects/spark","java.library.path":"/Users/Jose/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.","sun.cpu.isalist":"","os.arch":"x86_64","java.vm.version":"25.92-b14","java.endorsed.dirs":"/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/endorsed","java.runtime.version":"1.8.0_92-b14","java.vm.info":"mixed
 
mode","java.ext.dirs":"/Users/Jose/Library/Java/Extensions:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/ext:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java","java.runtime.name":"Java(TM)
 SE Runtime 
Environment","file.separator":"/","io.netty.maxDirectMemory":"0","java.class.version":"52.0","scala.usejavacp":"true","java.specification.name":"Java
 Platform API
  
Specification","sun.boot.class.path":"/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/sunrsasign.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/classes","file.encoding":"UTF-8","user.timezone":"America/Chicago","java.specification.vendor":"Oracle
 
Corporation","sun.java.launcher":"SUN_STANDARD","os.version":"10.11.6","sun.os.patch.level":"unknown","gopherProxySet":"false","java.vm.specification.vendor":"Oracle
 Corporation","user.country":"US","sun.jnu.e
 
ncoding":"UTF-8","http.nonProxyHosts":"local|*.local|169.254/16|*.169.254/16","user.language":"en","socksNonProxyHosts":"local|*.local|169.254/16|*.169.254/16","java.vendor.url":"http://java.oracle.com/","java.awt.printerjob":"sun.lwawt.macosx.CPrinterJob","jav

[1/4] spark git commit: [SPARK-16654][CORE] Add UI coverage for Application Level Blacklisting

2017-01-19 Thread irashid
Repository: spark
Updated Branches:
  refs/heads/master 064fadd2a -> 640f94233


http://git-wip-us.apache.org/repos/asf/spark/blob/640f9423/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala
--
diff --git 
a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala 
b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala
index d3b79dd..cd3959a 100644
--- 
a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala
@@ -143,7 +143,9 @@ class HistoryServerSuite extends SparkFunSuite with 
BeforeAndAfter with Matchers
 "stage task list from multi-attempt app json(2)" ->
   "applications/local-1426533911241/2/stages/0/0/taskList",
 
-"rdd list storage json" -> "applications/local-1422981780767/storage/rdd"
+"rdd list storage json" -> "applications/local-1422981780767/storage/rdd",
+"executor node blacklisting" -> 
"applications/app-20161116163331-/executors",
+"executor node blacklisting unblacklisting" -> 
"applications/app-20161115172038-/executors"
 // Todo: enable this test when logging the even of onBlockUpdated. See: 
SPARK-13845
 // "one rdd storage json" -> 
"applications/local-1422981780767/storage/rdd/0"
   )

http://git-wip-us.apache.org/repos/asf/spark/blob/640f9423/core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala
--
diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala
index 6b314d2..ead6955 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala
@@ -17,7 +17,7 @@
 
 package org.apache.spark.scheduler
 
-import org.mockito.Mockito.when
+import org.mockito.Mockito.{verify, when}
 import org.scalatest.BeforeAndAfterEach
 import org.scalatest.mock.MockitoSugar
 
@@ -31,6 +31,7 @@ class BlacklistTrackerSuite extends SparkFunSuite with 
BeforeAndAfterEach with M
   private val clock = new ManualClock(0)
 
   private var blacklist: BlacklistTracker = _
+  private var listenerBusMock: LiveListenerBus = _
   private var scheduler: TaskSchedulerImpl = _
   private var conf: SparkConf = _
 
@@ -40,7 +41,9 @@ class BlacklistTrackerSuite extends SparkFunSuite with 
BeforeAndAfterEach with M
 scheduler = mockTaskSchedWithConf(conf)
 
 clock.setTime(0)
-blacklist = new BlacklistTracker(conf, clock)
+
+listenerBusMock = mock[LiveListenerBus]
+blacklist = new BlacklistTracker(listenerBusMock, conf, clock)
   }
 
   override def afterEach(): Unit = {
@@ -112,6 +115,8 @@ class BlacklistTrackerSuite extends SparkFunSuite with 
BeforeAndAfterEach with M
 assertEquivalentToSet(blacklist.isExecutorBlacklisted(_), Set())
   } else {
 assertEquivalentToSet(blacklist.isExecutorBlacklisted(_), Set("1"))
+verify(listenerBusMock).post(
+  SparkListenerExecutorBlacklisted(0, "1", failuresUntilBlacklisted))
   }
 }
   }
@@ -147,6 +152,7 @@ class BlacklistTrackerSuite extends SparkFunSuite with 
BeforeAndAfterEach with M
 // and it should be blacklisted for the entire application.
 blacklist.updateBlacklistForSuccessfulTaskSet(0, 0, 
taskSetBlacklist.execToFailures)
 assertEquivalentToSet(blacklist.isExecutorBlacklisted(_), Set("1"))
+verify(listenerBusMock).post(SparkListenerExecutorBlacklisted(0, "1", 
numFailures))
   } else {
 // The task set failed, so we don't count these failures against the 
executor for other
 // stages.
@@ -166,6 +172,7 @@ class BlacklistTrackerSuite extends SparkFunSuite with 
BeforeAndAfterEach with M
 assert(blacklist.nodeBlacklist() === Set())
 assertEquivalentToSet(blacklist.isNodeBlacklisted(_), Set())
 assertEquivalentToSet(blacklist.isExecutorBlacklisted(_), Set("1"))
+verify(listenerBusMock).post(SparkListenerExecutorBlacklisted(0, "1", 4))
 
 val taskSetBlacklist1 = createTaskSetBlacklist(stageId = 1)
 // Fail 4 tasks in one task set on executor 2, so that executor gets 
blacklisted for the whole
@@ -177,15 +184,21 @@ class BlacklistTrackerSuite extends SparkFunSuite with 
BeforeAndAfterEach with M
 blacklist.updateBlacklistForSuccessfulTaskSet(0, 0, 
taskSetBlacklist1.execToFailures)
 assert(blacklist.nodeBlacklist() === Set("hostA"))
 assertEquivalentToSet(blacklist.isNodeBlacklisted(_), Set("hostA"))
+verify(listenerBusMock).post(SparkListenerNodeBlacklisted(0, "hostA", 2))
 assertEquivalentToSet(blacklist.isExecutorBlacklisted(_), Set("1", "2"))
+verify(listenerBusMock).post(SparkListenerExecutorBlacklisted(0, "2", 4))
 
 // Advance the

[3/4] spark git commit: [SPARK-16654][CORE] Add UI coverage for Application Level Blacklisting

2017-01-19 Thread irashid
http://git-wip-us.apache.org/repos/asf/spark/blob/640f9423/core/src/test/resources/spark-events/app-20161115172038-
--
diff --git a/core/src/test/resources/spark-events/app-20161115172038- 
b/core/src/test/resources/spark-events/app-20161115172038-
new file mode 100755
index 000..3af0451
--- /dev/null
+++ b/core/src/test/resources/spark-events/app-20161115172038-
@@ -0,0 +1,75 @@
+{"Event":"SparkListenerLogStart","Spark Version":"2.1.0-SNAPSHOT"}
+{"Event":"SparkListenerBlockManagerAdded","Block Manager ID":{"Executor 
ID":"driver","Host":"172.22.0.111","Port":64527},"Maximum 
Memory":384093388,"Timestamp":1479252038836}
+{"Event":"SparkListenerEnvironmentUpdate","JVM Information":{"Java 
Home":"/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre","Java
 Version":"1.8.0_92 (Oracle Corporation)","Scala Version":"version 
2.11.8"},"Spark 
Properties":{"spark.blacklist.task.maxTaskAttemptsPerExecutor":"3","spark.blacklist.enabled":"TRUE","spark.driver.host":"172.22.0.111","spark.blacklist.task.maxTaskAttemptsPerNode":"3","spark.eventLog.enabled":"TRUE","spark.driver.port":"64511","spark.repl.class.uri":"spark://172.22.0.111:64511/classes","spark.jars":"","spark.repl.class.outputDir":"/private/var/folders/l4/d46wlzj16593f3d812vk49twgp/T/spark-f09ef9e2-7f15-433f-a5d1-30138d8764ca/repl-28d60911-dbc3-465f-b7b3-ee55c071595e","spark.app.name":"Spark
 
shell","spark.blacklist.stage.maxFailedExecutorsPerNode":"3","spark.scheduler.mode":"FIFO","spark.eventLog.overwrite":"TRUE","spark.blacklist.stage.maxFailedTasksPerExecutor":"3","spark.executor.id":"driver","spark.blacklist.application.maxFailedEx
 
ecutorsPerNode":"2","spark.submit.deployMode":"client","spark.master":"local-cluster[4,4,1024]","spark.home":"/Users/Jose/IdeaProjects/spark","spark.eventLog.dir":"/Users/jose/logs","spark.sql.catalogImplementation":"in-memory","spark.eventLog.compress":"FALSE","spark.blacklist.application.maxFailedTasksPerExecutor":"1","spark.blacklist.timeout":"1","spark.app.id":"app-20161115172038-","spark.task.maxFailures":"4"},"System
 
Properties":{"java.io.tmpdir":"/var/folders/l4/d46wlzj16593f3d812vk49twgp/T/","line.separator":"\n","path.separator":":","sun.management.compiler":"HotSpot
 64-Bit Tiered 
Compilers","SPARK_SUBMIT":"true","sun.cpu.endian":"little","java.specification.version":"1.8","java.vm.specification.name":"Java
 Virtual Machine Specification","java.vendor":"Oracle 
Corporation","java.vm.specification.version":"1.8","user.home":"/Users/Jose","file.encoding.pkg":"sun.io","sun.nio.ch.bugLevel":"","ftp.nonProxyHosts":"local|*.local|169.254/16|*.169.254/16","sun.arch.data.
 
model":"64","sun.boot.library.path":"/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib","user.dir":"/Users/Jose/IdeaProjects/spark","java.library.path":"/Users/Jose/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.","sun.cpu.isalist":"","os.arch":"x86_64","java.vm.version":"25.92-b14","java.endorsed.dirs":"/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/endorsed","java.runtime.version":"1.8.0_92-b14","java.vm.info":"mixed
 
mode","java.ext.dirs":"/Users/Jose/Library/Java/Extensions:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/ext:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java","java.runtime.name":"Java(TM)
 SE Runtime 
Environment","file.separator":"/","io.netty.maxDirectMemory":"0","java.class.version":"52.0","scala.usejavacp":"true","java.specification.name":"Java
 Platform API S
 
pecification","sun.boot.class.path":"/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/sunrsasign.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/jre/classes","file.encoding":"UTF-8","user.timezone":"America/Chicago","java.specification.vendor":"Oracle
 
Corporation","sun.java.launcher":"SUN_STANDARD","os.version":"10.11.6","sun.os.patch.level":"unknown","gopherProxySet":"false","java.vm.specification.vendor":"Oracle
 Corporation","user.country":"US","sun.jnu.enc
 
oding":"UTF-8","http.nonProxyHosts":"local|*.local|169.254/16|*.169.254/16","user.language":"en","socksNonProxyHosts":"local|*.local|169.254/16|*.169.254/16","java.vendor.url":"http://java.oracle.com/","java.awt.printerjob":"sun.lwawt.macosx.CPrinterJob","java.

spark git commit: [SPARK-19295][SQL] IsolatedClientLoader's downloadVersion should log the location of downloaded metastore client jars

2017-01-19 Thread yhuai
Repository: spark
Updated Branches:
  refs/heads/master 640f94233 -> 63d839028


[SPARK-19295][SQL] IsolatedClientLoader's downloadVersion should log the 
location of downloaded metastore client jars

## What changes were proposed in this pull request?
This will help users know the location of the downloaded jars when 
`spark.sql.hive.metastore.jars` is set to `maven`.
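
A sketch of the configuration this log line targets, with Maven-resolved metastore client jars (the metastore version value below is only an example):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("maven-metastore-jars")
  .config("spark.sql.hive.metastore.version", "1.2.1")  // example version
  .config("spark.sql.hive.metastore.jars", "maven")     // triggers the download path logged here
  .enableHiveSupport()
  .getOrCreate()
```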

## How was this patch tested?
jenkins

Author: Yin Huai 

Closes #16649 from yhuai/SPARK-19295.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/63d83902
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/63d83902
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/63d83902

Branch: refs/heads/master
Commit: 63d839028a6e03644febc360519fa8e01c5534cf
Parents: 640f942
Author: Yin Huai 
Authored: Thu Jan 19 14:23:36 2017 -0800
Committer: Yin Huai 
Committed: Thu Jan 19 14:23:36 2017 -0800

--
 .../org/apache/spark/sql/hive/client/IsolatedClientLoader.scala | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/63d83902/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
index 26b2de8..63fdd6b 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
@@ -122,6 +122,7 @@ private[hive] object IsolatedClientLoader extends Logging {
 // TODO: Remove copy logic.
 val tempDir = Utils.createTempDir(namePrefix = s"hive-${version}")
 allFiles.foreach(f => FileUtils.copyFileToDirectory(f, tempDir))
+logInfo(s"Downloaded metastore jars to ${tempDir.getCanonicalPath}")
 tempDir.listFiles().map(_.toURI.toURL)
   }
 





spark git commit: [SPARK-17912] [SQL] Refactor code generation to get data for ColumnVector/ColumnarBatch

2017-01-19 Thread davies
Repository: spark
Updated Branches:
  refs/heads/master 63d839028 -> 148a84b37


[SPARK-17912] [SQL] Refactor code generation to get data for 
ColumnVector/ColumnarBatch

## What changes were proposed in this pull request?

This PR refactors the code generation that reads data from `ColumnVector` 
and `ColumnarBatch` into a trait, `ColumnarBatchScan`, for ease of reuse. 
This part will be reused by several components (e.g. the Parquet 
reader, Dataset.cache, and others), since `ColumnarBatch` will become a first-class citizen.

This PR is a part of https://github.com/apache/spark/pull/15219. As a first step, 
it makes the code generation for `ColumnVector` and `ColumnarBatch` 
reusable as a trait, which is also useful for other components from a 
reusability point of view.
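
An illustrative sketch of the per-row access pattern the generated code follows (null check first, then a typed getter); `batch` is assumed to come from a columnar source such as the vectorized Parquet reader:

```scala
import org.apache.spark.sql.execution.vectorized.{ColumnarBatch, ColumnVector}

def sumFirstIntColumn(batch: ColumnarBatch): Long = {
  val col: ColumnVector = batch.column(0)
  var sum = 0L
  var row = 0
  while (row < batch.numRows()) {
    if (!col.isNullAt(row)) sum += col.getInt(row)  // skip nulls, read the typed value
    row += 1
  }
  sum
}
```
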
## How was this patch tested?

Tested with existing test suites.

Author: Kazuaki Ishizaki 

Closes #15467 from kiszk/columnarrefactor.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/148a84b3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/148a84b3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/148a84b3

Branch: refs/heads/master
Commit: 148a84b37082697c7f61c6a621010abe4b12f2eb
Parents: 63d8390
Author: Kazuaki Ishizaki 
Authored: Thu Jan 19 15:16:05 2017 -0800
Committer: Davies Liu 
Committed: Thu Jan 19 15:16:05 2017 -0800

--
 .../spark/sql/execution/ColumnarBatchScan.scala | 133 +++
 .../sql/execution/DataSourceScanExec.scala  |  86 +---
 2 files changed, 135 insertions(+), 84 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/148a84b3/sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala
new file mode 100644
index 000..04fba17
--- /dev/null
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.catalyst.expressions.UnsafeRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
ExprCode}
+import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec
+import org.apache.spark.sql.execution.metric.SQLMetrics
+import org.apache.spark.sql.execution.vectorized.{ColumnarBatch, ColumnVector}
+import org.apache.spark.sql.types.DataType
+
+
+/**
+ * Helper trait for abstracting scan functionality using
+ * [[org.apache.spark.sql.execution.vectorized.ColumnarBatch]]es.
+ */
+private[sql] trait ColumnarBatchScan extends CodegenSupport {
+
+  val inMemoryTableScan: InMemoryTableScanExec = null
+
+  override lazy val metrics = Map(
+"numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output 
rows"),
+"scanTime" -> SQLMetrics.createTimingMetric(sparkContext, "scan time"))
+
+  /**
+   * Generate [[ColumnVector]] expressions for our parent to consume as rows.
+   * This is called once per [[ColumnarBatch]].
+   */
+  private def genCodeColumnVector(
+  ctx: CodegenContext,
+  columnVar: String,
+  ordinal: String,
+  dataType: DataType,
+  nullable: Boolean): ExprCode = {
+val javaType = ctx.javaType(dataType)
+val value = ctx.getValue(columnVar, dataType, ordinal)
+val isNullVar = if (nullable) { ctx.freshName("isNull") } else { "false" }
+val valueVar = ctx.freshName("value")
+val str = s"columnVector[$columnVar, $ordinal, ${dataType.simpleString}]"
+val code = s"${ctx.registerComment(str)}\n" + (if (nullable) {
+  s"""
+boolean $isNullVar = $columnVar.isNullAt($ordinal);
+$javaType $valueVar = $isNullVar ? ${ctx.defaultValue(dataType)} : 
($value);
+  """
+} else {
+  s"$javaType $valueVar = $value;"
+}).trim
+ExprCode(code, isNullVar, valueV

spark git commit: [SPARK-18899][SPARK-18912][SPARK-18913][SQL] refactor the error checking when append data to an existing table

2017-01-19 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/branch-2.1 4cff0b504 -> 7bc3e9ba7


[SPARK-18899][SPARK-18912][SPARK-18913][SQL] refactor the error checking when 
append data to an existing table

## What changes were proposed in this pull request?

When we append data to an existing table with `DataFrameWriter.saveAsTable`, we 
will do various checks to make sure the appended data is consistent with the 
existing data.

However, we get the information about the existing table by matching the table 
relation instead of looking at the table metadata. This is error-prone: e.g. 
we only check the number of columns for `HadoopFsRelation` and forget to check 
bucketing, etc.

This PR refactors the error checking to look at the metadata of the existing 
table, and fixes several bugs:
* SPARK-18899: We forget to check whether the specified bucketing matches the 
existing table, which may lead to a problematic table that has different 
bucketing in different data files (see the sketch below).
* SPARK-18912: We forget to check the number of columns for non-file-based data 
source tables.
* SPARK-18913: We don't support appending data to a table with special column 
names.
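
A hedged illustration of the SPARK-18899 mismatch the new checks catch; `df` is an assumed DataFrame with columns `i` and `j`:

```scala
// First write creates a table bucketed into 8 buckets by "i".
df.write.bucketBy(8, "i").sortBy("j").saveAsTable("bucketed_tbl")

// Appending with a different bucketing spec should now fail with an
// AnalysisException instead of silently producing mixed bucketing.
df.write.mode("append").bucketBy(4, "i").saveAsTable("bucketed_tbl")
```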

## How was this patch tested?
new regression test.

Author: Wenchen Fan 

Closes #16313 from cloud-fan/bug1.

(cherry picked from commit f923c849e5b8f7e7aeafee59db598a9bf4970f50)
Signed-off-by: Wenchen Fan 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7bc3e9ba
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7bc3e9ba
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7bc3e9ba

Branch: refs/heads/branch-2.1
Commit: 7bc3e9ba73869c0c6cb8e754e41dbdd4740cfd07
Parents: 4cff0b5
Author: Wenchen Fan 
Authored: Mon Dec 19 20:03:33 2016 -0800
Committer: Wenchen Fan 
Committed: Fri Jan 20 09:52:20 2017 +0800

--
 .../catalyst/catalog/ExternalCatalogUtils.scala |  37 +++
 .../spark/sql/catalyst/catalog/interface.scala  |  10 ++
 .../command/createDataSourceTables.scala| 110 ---
 .../spark/sql/execution/datasources/rules.scala |  53 +++--
 .../spark/sql/execution/command/DDLSuite.scala  |   4 +-
 .../sql/test/DataFrameReaderWriterSuite.scala   |  38 ++-
 .../sql/hive/MetastoreDataSourcesSuite.scala|  17 ++-
 .../sql/sources/HadoopFsRelationTest.scala  |   2 +-
 8 files changed, 180 insertions(+), 91 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/7bc3e9ba/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
index 817c1ab..4331841 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
@@ -20,6 +20,8 @@ package org.apache.spark.sql.catalyst.catalog
 import org.apache.hadoop.fs.Path
 import org.apache.hadoop.util.Shell
 
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.analysis.Resolver
 import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
 
 object ExternalCatalogUtils {
@@ -133,4 +135,39 @@ object CatalogUtils {
   case o => o
 }
   }
+
+  def normalizePartCols(
+  tableName: String,
+  tableCols: Seq[String],
+  partCols: Seq[String],
+  resolver: Resolver): Seq[String] = {
+partCols.map(normalizeColumnName(tableName, tableCols, _, "partition", 
resolver))
+  }
+
+  def normalizeBucketSpec(
+  tableName: String,
+  tableCols: Seq[String],
+  bucketSpec: BucketSpec,
+  resolver: Resolver): BucketSpec = {
+val BucketSpec(numBuckets, bucketColumnNames, sortColumnNames) = bucketSpec
+val normalizedBucketCols = bucketColumnNames.map { colName =>
+  normalizeColumnName(tableName, tableCols, colName, "bucket", resolver)
+}
+val normalizedSortCols = sortColumnNames.map { colName =>
+  normalizeColumnName(tableName, tableCols, colName, "sort", resolver)
+}
+BucketSpec(numBuckets, normalizedBucketCols, normalizedSortCols)
+  }
+
+  private def normalizeColumnName(
+  tableName: String,
+  tableCols: Seq[String],
+  colName: String,
+  colType: String,
+  resolver: Resolver): String = {
+tableCols.find(resolver(_, colName)).getOrElse {
+  throw new AnalysisException(s"$colType column $colName is not defined in 
table $tableName, " +
+s"defined table columns are: ${tableCols.mkString(", ")}")
+}
+  }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/7bc3e9ba/sql/catalyst/src/main/scala/o

spark git commit: [SPARK-19292][SQL] filter with partition columns should be case-insensitive on Hive tables

2017-01-19 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master 148a84b37 -> 0bf605c2c


[SPARK-19292][SQL] filter with partition columns should be case-insensitive on 
Hive tables

## What changes were proposed in this pull request?

When we query a table with a filter on partitioned columns, we will push the 
partition filter to the metastore to get matched partitions directly.

In `HiveExternalCatalog.listPartitionsByFilter`, we assume the column names in 
the partition filter are already normalized and that we don't need to consider 
case sensitivity. However, `HiveTableScanExec` doesn't follow this assumption. 
This PR fixes it (see the example below).
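
A sketch of the scenario being fixed; the table and column names are only examples and `spark` is an assumed Hive-enabled session:

```scala
spark.sql("CREATE TABLE part_tbl (id INT) PARTITIONED BY (p INT) STORED AS PARQUET")
spark.sql("INSERT INTO part_tbl PARTITION (p = 1) SELECT 10")

// The filter names the partition column with a different case; the predicate is
// now normalized to the schema's attribute name before metastore pruning.
spark.sql("SELECT * FROM part_tbl WHERE P = 1").show()
```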

## How was this patch tested?

new regression test

Author: Wenchen Fan 

Closes #16647 from cloud-fan/bug.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0bf605c2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0bf605c2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0bf605c2

Branch: refs/heads/master
Commit: 0bf605c2c67ca361cd4aa3a3b4492bef4aef76b9
Parents: 148a84b
Author: Wenchen Fan 
Authored: Thu Jan 19 20:09:48 2017 -0800
Committer: gatorsmile 
Committed: Thu Jan 19 20:09:48 2017 -0800

--
 .../sql/execution/datasources/FileSourceStrategy.scala |  2 +-
 .../spark/sql/hive/execution/HiveTableScanExec.scala   | 12 +++-
 .../spark/sql/hive/execution/SQLQuerySuite.scala   | 13 +
 3 files changed, 25 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0bf605c2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
index 6d0671d..26e1380 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
@@ -62,7 +62,7 @@ object FileSourceStrategy extends Strategy with Logging {
   val filterSet = ExpressionSet(filters)
 
   // The attribute name of predicate could be different than the one in 
schema in case of
-  // case insensitive, we should change them to match the one in schema, 
so we donot need to
+  // case insensitive, we should change them to match the one in schema, 
so we do not need to
   // worry about case sensitivity anymore.
   val normalizedFilters = filters.map { e =>
 e transform {

http://git-wip-us.apache.org/repos/asf/spark/blob/0bf605c2/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
index 7ee5fc5..def6ef3 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
@@ -146,9 +146,19 @@ case class HiveTableScanExec(
 hadoopReader.makeRDDForTable(relation.hiveQlTable)
   }
 } else {
+  // The attribute name of predicate could be different than the one in 
schema in case of
+  // case insensitive, we should change them to match the one in schema, 
so we do not need to
+  // worry about case sensitivity anymore.
+  val normalizedFilters = partitionPruningPred.map { e =>
+e transform {
+  case a: AttributeReference =>
+a.withName(relation.output.find(_.semanticEquals(a)).get.name)
+}
+  }
+
   Utils.withDummyCallSite(sqlContext.sparkContext) {
 hadoopReader.makeRDDForPartitionedTable(
-  prunePartitions(relation.getHiveQlPartitions(partitionPruningPred)))
+  prunePartitions(relation.getHiveQlPartitions(normalizedFilters)))
   }
 }
 val numOutputRows = longMetric("numOutputRows")

http://git-wip-us.apache.org/repos/asf/spark/blob/0bf605c2/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
--
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
index 104b525..1a28c4c 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
@@ -2014,4 +2014,17 @@ class SQLQuerySuit

spark git commit: [SPARK-19271][SQL] Change non-cbo estimation of aggregate

2017-01-19 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master 0bf605c2c -> 039ed9fe8


[SPARK-19271][SQL] Change non-cbo estimation of aggregate

## What changes were proposed in this pull request?

Change the non-CBO estimation behavior of aggregate:
- If `groupingExpressions` is empty, we know the row count (= 1) and the 
corresponding size (see the sketch below);
- otherwise, estimation falls back to `UnaryNode`'s `computeStats` method, which 
should not propagate `rowCount` and `attributeStats` in `Statistics` because they 
are not estimated in that method.
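
An illustrative sketch of the global-aggregate case described above (not the exact Spark formula; the per-row overhead is an assumption):

```scala
// A global aggregate (no grouping expressions) emits exactly one row, so both
// the row count and an output size can be stated rather than guessed.
def estimateGlobalAggregate(outputFieldDefaultSizes: Seq[Long]): (BigInt, BigInt) = {
  val rowCount = BigInt(1)
  val sizeInBytes = rowCount * (8 + outputFieldDefaultSizes.sum)  // 8-byte row overhead assumed
  (rowCount, sizeInBytes)
}
```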

## How was this patch tested?

Added test case

Author: wangzhenhua 

Closes #16631 from wzhfy/aggNoCbo.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/039ed9fe
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/039ed9fe
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/039ed9fe

Branch: refs/heads/master
Commit: 039ed9fe8a2fdcd99e0561af64cda8fe3406bc12
Parents: 0bf605c
Author: wangzhenhua 
Authored: Thu Jan 19 22:18:47 2017 -0800
Committer: gatorsmile 
Committed: Thu Jan 19 22:18:47 2017 -0800

--
 .../catalyst/plans/logical/LogicalPlan.scala|  3 ++-
 .../plans/logical/basicLogicalOperators.scala   |  7 --
 .../statsEstimation/AggregateEstimation.scala   |  2 +-
 .../statsEstimation/EstimationUtils.scala   |  4 ++--
 .../statsEstimation/ProjectEstimation.scala |  2 +-
 .../AggregateEstimationSuite.scala  | 24 +++-
 .../StatsEstimationTestBase.scala   |  7 +++---
 7 files changed, 38 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/039ed9fe/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
index 0587a59..93550e1 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
@@ -344,7 +344,8 @@ abstract class UnaryNode extends LogicalPlan {
   sizeInBytes = 1
 }
 
-child.stats(conf).copy(sizeInBytes = sizeInBytes)
+// Don't propagate rowCount and attributeStats, since they are not 
estimated here.
+Statistics(sizeInBytes = sizeInBytes, isBroadcastable = 
child.stats(conf).isBroadcastable)
   }
 }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/039ed9fe/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
index 3bd3143..432097d 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
@@ -23,7 +23,7 @@ import org.apache.spark.sql.catalyst.catalog.{CatalogTable, 
CatalogTypes}
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
 import org.apache.spark.sql.catalyst.plans._
-import 
org.apache.spark.sql.catalyst.plans.logical.statsEstimation.{AggregateEstimation,
 ProjectEstimation}
+import 
org.apache.spark.sql.catalyst.plans.logical.statsEstimation.{AggregateEstimation,
 EstimationUtils, ProjectEstimation}
 import org.apache.spark.sql.types._
 import org.apache.spark.util.Utils
 
@@ -541,7 +541,10 @@ case class Aggregate(
   override def computeStats(conf: CatalystConf): Statistics = {
 def simpleEstimation: Statistics = {
   if (groupingExpressions.isEmpty) {
-super.computeStats(conf).copy(sizeInBytes = 1)
+Statistics(
+  sizeInBytes = EstimationUtils.getOutputSize(output, outputRowCount = 
1),
+  rowCount = Some(1),
+  isBroadcastable = child.stats(conf).isBroadcastable)
   } else {
 super.computeStats(conf)
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/039ed9fe/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala
i