spark git commit: [SPARK-21854] Added LogisticRegressionTrainingSummary for MultinomialLogisticRegression in Python API

2017-09-13 Thread yliang
Repository: spark
Updated Branches:
  refs/heads/master dcbb22943 -> 8d8641f12


[SPARK-21854] Added LogisticRegressionTrainingSummary for 
MultinomialLogisticRegression in Python API

## What changes were proposed in this pull request?

Added LogisticRegressionTrainingSummary for MultinomialLogisticRegression in 
Python API
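
A minimal PySpark sketch of what this change enables (a sketch only; it assumes Spark 2.3+ with this patch, an active `spark` session, and summary attributes mirroring the Scala API such as `accuracy`):

```python
# Sketch: multinomial logistic regression training summary from Python.
# Assumes a running SparkSession named `spark`; the tiny dataset is illustrative.
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

df = spark.createDataFrame(
    [(0.0, Vectors.dense(1.0, 0.0)),
     (1.0, Vectors.dense(0.0, 1.0)),
     (2.0, Vectors.dense(1.0, 1.0))],
    ["label", "features"])

model = LogisticRegression(family="multinomial").fit(df)
summary = model.summary          # LogisticRegressionTrainingSummary for multiclass models
print(summary.predictionCol)     # column in "predictions" holding the predicted label
print(summary.accuracy)          # overall training accuracy (mirrors the Scala summary)
```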

## How was this patch tested?

Added unit test

Please review http://spark.apache.org/contributing.html before opening a pull 
request.

Author: Ming Jiang 
Author: Ming Jiang 
Author: jmwdpk 

Closes #19185 from jmwdpk/SPARK-21854.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8d8641f1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8d8641f1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8d8641f1

Branch: refs/heads/master
Commit: 8d8641f12250b0a9d370ff9354407c27af7cfcf4
Parents: dcbb229
Author: Ming Jiang 
Authored: Thu Sep 14 13:53:28 2017 +0800
Committer: Yanbo Liang 
Committed: Thu Sep 14 13:53:28 2017 +0800

--
 .../LogisticRegressionSuite.scala   |  12 ++
 python/pyspark/ml/classification.py | 120 ++-
 python/pyspark/ml/tests.py  |  55 -
 3 files changed, 183 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8d8641f1/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
--
diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
 
b/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
index d43c7cd..14f5508 100644
--- 
a/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
+++ 
b/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
@@ -2416,6 +2416,18 @@ class LogisticRegressionSuite
  blorSummary.recallByThreshold.collect() === sameBlorSummary.recallByThreshold.collect())
 assert(
  blorSummary.precisionByThreshold.collect() === sameBlorSummary.precisionByThreshold.collect())
+assert(blorSummary.labels === sameBlorSummary.labels)
+assert(blorSummary.truePositiveRateByLabel === sameBlorSummary.truePositiveRateByLabel)
+assert(blorSummary.falsePositiveRateByLabel === sameBlorSummary.falsePositiveRateByLabel)
+assert(blorSummary.precisionByLabel === sameBlorSummary.precisionByLabel)
+assert(blorSummary.recallByLabel === sameBlorSummary.recallByLabel)
+assert(blorSummary.fMeasureByLabel === sameBlorSummary.fMeasureByLabel)
+assert(blorSummary.accuracy === sameBlorSummary.accuracy)
+assert(blorSummary.weightedTruePositiveRate === sameBlorSummary.weightedTruePositiveRate)
+assert(blorSummary.weightedFalsePositiveRate === sameBlorSummary.weightedFalsePositiveRate)
+assert(blorSummary.weightedRecall === sameBlorSummary.weightedRecall)
+assert(blorSummary.weightedPrecision === sameBlorSummary.weightedPrecision)
+assert(blorSummary.weightedFMeasure === sameBlorSummary.weightedFMeasure)
 
 lr.setFamily("multinomial")
 val mlorModel = lr.fit(smallMultinomialDataset)

http://git-wip-us.apache.org/repos/asf/spark/blob/8d8641f1/python/pyspark/ml/classification.py
--
diff --git a/python/pyspark/ml/classification.py 
b/python/pyspark/ml/classification.py
index fbb9e7f..0caafa6 100644
--- a/python/pyspark/ml/classification.py
+++ b/python/pyspark/ml/classification.py
@@ -529,9 +529,11 @@ class LogisticRegressionModel(JavaModel, JavaClassificationModel, JavaMLWritable
         trained on the training set. An exception is thrown if `trainingSummary is None`.
         """
         if self.hasSummary:
-            java_blrt_summary = self._call_java("summary")
-            # Note: Once multiclass is added, update this to return correct summary
-            return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
+            java_lrt_summary = self._call_java("summary")
+            if self.numClasses <= 2:
+                return BinaryLogisticRegressionTrainingSummary(java_lrt_summary)
+            else:
+                return LogisticRegressionTrainingSummary(java_lrt_summary)
         else:
             raise RuntimeError("No training summary available for this %s" %
                                self.__class__.__name__)
@@ -587,6 +589,14 @@ class LogisticRegressionSummary(JavaWrapper):
         return self._call_java("probabilityCol")

     @property
+    @since("2.3.0")
+    def predictionCol(self):
+        """
+        Field in "predictions" which gives the prediction of

spark git commit: [MINOR][SQL] Only populate type metadata for required types such as CHAR/VARCHAR.

2017-09-13 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master 8be7e6bb3 -> dcbb22943


[MINOR][SQL] Only populate type metadata for required types such as 
CHAR/VARCHAR.

## What changes were proposed in this pull request?
When reading column descriptions from the Hive catalog, we currently populate the metadata for all types in order to record the raw Hive type string. This additional metadata is only needed for CHAR/VARCHAR types, or for complex types containing CHAR/VARCHAR.

It's a minor cleanup; I haven't created a JIRA for it.
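
A quick way to observe the effect from PySpark (a sketch only; it assumes a Hive-enabled SparkSession, that the raw Hive type string is surfaced through `StructField.metadata`, and the table name is made up):

```python
# Sketch: inspect which columns carry the raw Hive type string in their metadata.
# Assumes `spark` was created with enableHiveSupport(); "probe_types" is a throwaway table.
spark.sql("CREATE TABLE probe_types (c_char CHAR(10), c_str STRING, c_int INT) USING hive")

for field in spark.table("probe_types").schema.fields:
    # After this change, only columns whose raw Hive type differs from the Spark
    # catalog type (e.g. CHAR/VARCHAR) should still carry the extra metadata entry.
    print(field.name, field.dataType, field.metadata)
```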

## How was this patch tested?
Test added in HiveMetastoreCatalogSuite

Author: Dilip Biswal 

Closes #19215 from dilipbiswal/column_metadata.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dcbb2294
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dcbb2294
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dcbb2294

Branch: refs/heads/master
Commit: dcbb2294338c64d1c4a668948ec8ecb11efdeeca
Parents: 8be7e6b
Author: Dilip Biswal 
Authored: Wed Sep 13 22:45:44 2017 -0700
Committer: gatorsmile 
Committed: Wed Sep 13 22:45:44 2017 -0700

--
 .../spark/sql/hive/client/HiveClientImpl.scala  |  7 +-
 .../sql/hive/HiveMetastoreCatalogSuite.scala| 70 +++-
 .../sql/hive/HiveSchemaInferenceSuite.scala |  4 +-
 .../apache/spark/sql/hive/StatisticsSuite.scala | 18 ++---
 4 files changed, 82 insertions(+), 17 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/dcbb2294/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
index 69dac7b..426db6a 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
@@ -849,7 +849,12 @@ private[hive] object HiveClientImpl {
          throw new SparkException("Cannot recognize hive type string: " + hc.getType, e)
     }
 
-    val metadata = new MetadataBuilder().putString(HIVE_TYPE_STRING, hc.getType).build()
+    val metadata = if (hc.getType != columnType.catalogString) {
+      new MetadataBuilder().putString(HIVE_TYPE_STRING, hc.getType).build()
+    } else {
+      Metadata.empty
+    }
+
     val field = StructField(
       name = hc.getName,
       dataType = columnType,

http://git-wip-us.apache.org/repos/asf/spark/blob/dcbb2294/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
--
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
index 8140f88..18137e7 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
@@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias
 import org.apache.spark.sql.hive.test.TestHiveSingleton
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.{ExamplePointUDT, SQLTestUtils}
-import org.apache.spark.sql.types.{DecimalType, IntegerType, StringType, StructField, StructType}
+import org.apache.spark.sql.types._
 
 class HiveMetastoreCatalogSuite extends TestHiveSingleton with SQLTestUtils {
   import spark.implicits._
@@ -67,6 +67,73 @@ class HiveMetastoreCatalogSuite extends TestHiveSingleton with SQLTestUtils {
   assert(aliases.size == 1)
 }
   }
+
+  test("Validate catalog metadata for supported data types")  {
+withTable("t") {
+  sql(
+"""
+  |CREATE TABLE t (
+  |c1 boolean,
+  |c2 tinyint,
+  |c3 smallint,
+  |c4 short,
+  |c5 bigint,
+  |c6 long,
+  |c7 float,
+  |c8 double,
+  |c9 date,
+  |c10 timestamp,
+  |c11 string,
+  |c12 char(10),
+  |c13 varchar(10),
+  |c14 binary,
+  |c15 decimal,
+  |c16 decimal(10),
+  |c17 decimal(10,2),
+  |c18 array,
+  |c19 array,
+  |c20 array,
+  |c21 map,
+  |c22 map,
+  |c23 struct,
+  |c24 struct
+  |)
+""".stripMargin)
+
+  val schema = hiveClient.getTable("default", "t").schema
+  val expectedSchema = new StructType()
+.add("c1", 

spark git commit: [SPARK-21973][SQL] Add a new option to filter queries in TPC-DS

2017-09-13 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master 17edfec59 -> 8be7e6bb3


[SPARK-21973][SQL] Add a new option to filter queries in TPC-DS

## What changes were proposed in this pull request?
This PR adds a new option to filter the TPC-DS queries to run in `TPCDSQueryBenchmark`.
By default, `TPCDSQueryBenchmark` runs all the TPC-DS queries.
This change lets developers run a subset of the TPC-DS queries via this option,
e.g., to run q2, q4, and q6 only:
```
spark-submit --class  --conf spark.sql.tpcds.queryFilter="q2,q4,q6" 
--jars 
```
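
The underlying filtering is straightforward; here is an illustrative sketch of the same idea in Python (not Spark code; the names and the hard-coded filter string are made up for illustration):

```python
# Sketch of the --query-filter logic: keep only the named queries, fail on an empty result.
tpcds_queries = ["q%d" % i for i in range(1, 100)]                   # q1 .. q99
query_filter = {s.strip().lower() for s in "q2,q4,q6".split(",")}    # parsed from --query-filter

queries_to_run = [q for q in tpcds_queries if q in query_filter] if query_filter else tpcds_queries
if not queries_to_run:
    raise RuntimeError("Empty queries to run. Bad query name filter: %s" % query_filter)
print(queries_to_run)  # ['q2', 'q4', 'q6']
```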

## How was this patch tested?
Manually checked.

Author: Takeshi Yamamuro 

Closes #19188 from maropu/RunPartialQueriesInTPCDS.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8be7e6bb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8be7e6bb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8be7e6bb

Branch: refs/heads/master
Commit: 8be7e6bb3cc8afd07c24e7dbf0f8fbe0f491d740
Parents: 17edfec
Author: Takeshi Yamamuro 
Authored: Wed Sep 13 21:54:10 2017 -0700
Committer: gatorsmile 
Committed: Wed Sep 13 21:54:10 2017 -0700

--
 .../benchmark/TPCDSQueryBenchmark.scala | 23 +---
 .../TPCDSQueryBenchmarkArguments.scala  | 17 +--
 2 files changed, 35 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8be7e6bb/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
index 63d118c..99c6df7 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
@@ -18,6 +18,7 @@
 package org.apache.spark.sql.execution.benchmark
 
 import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
@@ -29,9 +30,9 @@ import org.apache.spark.util.Benchmark
 /**
  * Benchmark to measure TPCDS query performance.
  * To run this:
- *  spark-submit --class   
+ *  spark-submit --class   --data-location 

  */
-object TPCDSQueryBenchmark {
+object TPCDSQueryBenchmark extends Logging {
   val conf =
 new SparkConf()
   .setMaster("local[1]")
@@ -90,7 +91,9 @@ object TPCDSQueryBenchmark {
   benchmark.addCase(name) { i =>
 spark.sql(queryString).collect()
   }
+  logInfo(s"\n\n= TPCDS QUERY BENCHMARK OUTPUT FOR $name =\n")
   benchmark.run()
+  logInfo(s"\n\n= FINISHED $name =\n")
 }
   }
 
@@ -110,6 +113,20 @@ object TPCDSQueryBenchmark {
   "q81", "q82", "q83", "q84", "q85", "q86", "q87", "q88", "q89", "q90",
   "q91", "q92", "q93", "q94", "q95", "q96", "q97", "q98", "q99")
 
-tpcdsAll(benchmarkArgs.dataLocation, queries = tpcdsQueries)
+// If `--query-filter` defined, filters the queries that this option selects
+val queriesToRun = if (benchmarkArgs.queryFilter.nonEmpty) {
+  val queries = tpcdsQueries.filter { case queryName =>
+benchmarkArgs.queryFilter.contains(queryName)
+  }
+  if (queries.isEmpty) {
+throw new RuntimeException(
+  s"Empty queries to run. Bad query name filter: ${benchmarkArgs.queryFilter}")
+  }
+  queries
+} else {
+  tpcdsQueries
+}
+
+tpcdsAll(benchmarkArgs.dataLocation, queries = queriesToRun)
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/8be7e6bb/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmarkArguments.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmarkArguments.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmarkArguments.scala
index 8edc77b..184 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmarkArguments.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmarkArguments.scala
@@ -17,21 +17,33 @@
 
 package org.apache.spark.sql.execution.benchmark
 
+import java.util.Locale
+
+
 class TPCDSQueryBenchmarkArguments(val args: Array[String]) {
   var dataLocation: String = null
+  var queryFilter: Set[String] = Set.empty
 
   parseArgs(args.toList)
   

[spark] Git Push Summary

2017-09-13 Thread pwendell
Repository: spark
Updated Tags:  refs/tags/v2.1.2-rc1 [created] 6f470323a




[2/2] spark git commit: Preparing development version 2.1.3-SNAPSHOT

2017-09-13 Thread pwendell
Preparing development version 2.1.3-SNAPSHOT


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e49c997f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e49c997f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e49c997f

Branch: refs/heads/branch-2.1
Commit: e49c997fe564a7c90c5f203f96d8c3f91cb3b024
Parents: 6f47032
Author: Patrick Wendell 
Authored: Wed Sep 13 19:34:45 2017 -0700
Committer: Patrick Wendell 
Committed: Wed Sep 13 19:34:45 2017 -0700

--
 R/pkg/DESCRIPTION | 2 +-
 assembly/pom.xml  | 2 +-
 common/network-common/pom.xml | 2 +-
 common/network-shuffle/pom.xml| 2 +-
 common/network-yarn/pom.xml   | 2 +-
 common/sketch/pom.xml | 2 +-
 common/tags/pom.xml   | 2 +-
 common/unsafe/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 docs/_config.yml  | 4 ++--
 examples/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml   | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/java8-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml  | 2 +-
 external/kafka-0-10-sql/pom.xml   | 2 +-
 external/kafka-0-10/pom.xml   | 2 +-
 external/kafka-0-8-assembly/pom.xml   | 2 +-
 external/kafka-0-8/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml | 2 +-
 external/kinesis-asl/pom.xml  | 2 +-
 external/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml| 2 +-
 launcher/pom.xml  | 2 +-
 mesos/pom.xml | 2 +-
 mllib-local/pom.xml   | 2 +-
 mllib/pom.xml | 2 +-
 pom.xml   | 2 +-
 python/pyspark/version.py | 2 +-
 repl/pom.xml  | 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 yarn/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e49c997f/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 899d410..6c380b6 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.1.2
+Version: 2.1.3
 Title: R Frontend for Apache Spark
 Description: Provides an R Frontend for Apache Spark.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),

http://git-wip-us.apache.org/repos/asf/spark/blob/e49c997f/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 133f8e6..e9f915a 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.1.2
+2.1.3-SNAPSHOT
 ../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/e49c997f/common/network-common/pom.xml
--
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index d2631e4..7e203e7 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.1.2
+2.1.3-SNAPSHOT
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/e49c997f/common/network-shuffle/pom.xml
--
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index c12d480..92dd275 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.1.2
+2.1.3-SNAPSHOT
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/e49c997f/common/network-yarn/pom.xml
--
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index d22db36..abca418 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.1.2
+2.1.3-SNAPSHOT

[1/2] spark git commit: Preparing Spark release v2.1.2-rc1

2017-09-13 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-2.1 e7696ebef -> e49c997fe


Preparing Spark release v2.1.2-rc1


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6f470323
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6f470323
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6f470323

Branch: refs/heads/branch-2.1
Commit: 6f470323a0363656999dd36cb33f528afe627c12
Parents: e7696eb
Author: Patrick Wendell 
Authored: Wed Sep 13 19:34:41 2017 -0700
Committer: Patrick Wendell 
Committed: Wed Sep 13 19:34:41 2017 -0700

--
 assembly/pom.xml  | 2 +-
 common/network-common/pom.xml | 2 +-
 common/network-shuffle/pom.xml| 2 +-
 common/network-yarn/pom.xml   | 2 +-
 common/sketch/pom.xml | 2 +-
 common/tags/pom.xml   | 2 +-
 common/unsafe/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 docs/_config.yml  | 2 +-
 examples/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml   | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/java8-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml  | 2 +-
 external/kafka-0-10-sql/pom.xml   | 2 +-
 external/kafka-0-10/pom.xml   | 2 +-
 external/kafka-0-8-assembly/pom.xml   | 2 +-
 external/kafka-0-8/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml | 2 +-
 external/kinesis-asl/pom.xml  | 2 +-
 external/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml| 2 +-
 launcher/pom.xml  | 2 +-
 mesos/pom.xml | 2 +-
 mllib-local/pom.xml   | 2 +-
 mllib/pom.xml | 2 +-
 pom.xml   | 2 +-
 python/pyspark/version.py | 2 +-
 repl/pom.xml  | 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 yarn/pom.xml  | 2 +-
 38 files changed, 38 insertions(+), 38 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6f470323/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 6e092ef..133f8e6 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.1.2-SNAPSHOT
+2.1.2
 ../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/6f470323/common/network-common/pom.xml
--
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 77a4b64..d2631e4 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.1.2-SNAPSHOT
+2.1.2
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/6f470323/common/network-shuffle/pom.xml
--
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 1a2d85a..c12d480 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.1.2-SNAPSHOT
+2.1.2
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/6f470323/common/network-yarn/pom.xml
--
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index fb6c241..d22db36 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.1.2-SNAPSHOT
+2.1.2
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/6f470323/common/sketch/pom.xml
--
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index ff2d5c5..1dab3f6 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.1.2-SNAPSHOT
+2.1.2
 ../../pom.xml
   
 


spark git commit: [SPARK-20427][SQL] Read JDBC table use custom schema

2017-09-13 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master 8c7e19a37 -> 17edfec59


[SPARK-20427][SQL] Read JDBC table use custom schema

## What changes were proposed in this pull request?

The auto-generated schema for Oracle tables is sometimes not what we expect:

- `number(1)` is auto-mapped to BooleanType, which is sometimes not what we want, per [SPARK-20921](https://issues.apache.org/jira/browse/SPARK-20921).
- `number` is auto-mapped to Decimal(38,10), which cannot represent larger numbers, per [SPARK-20427](https://issues.apache.org/jira/browse/SPARK-20427).

This PR fixes the issue by letting users supply a custom schema, as follows:
```scala
val props = new Properties()
props.put("customSchema", "ID decimal(38, 0), N1 int, N2 boolean")
val dfRead = spark.read.jdbc(jdbcUrl, "tableWithCustomSchema", props)
dfRead.show()
```
or
```sql
CREATE TEMPORARY VIEW tableWithCustomSchema
USING org.apache.spark.sql.jdbc
OPTIONS (url '$jdbcUrl', dbTable 'tableWithCustomSchema',
  customSchema 'ID decimal(38, 0), N1 int, N2 boolean')
```

## How was this patch tested?

unit tests

Author: Yuming Wang 

Closes #18266 from wangyum/SPARK-20427.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/17edfec5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/17edfec5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/17edfec5

Branch: refs/heads/master
Commit: 17edfec59de8d8680f7450b4d07c147c086c105a
Parents: 8c7e19a
Author: Yuming Wang 
Authored: Wed Sep 13 16:34:17 2017 -0700
Committer: gatorsmile 
Committed: Wed Sep 13 16:34:17 2017 -0700

--
 docs/sql-programming-guide.md   |  9 +-
 examples/src/main/python/sql/datasource.py  | 10 +++
 .../examples/sql/SQLDataSourceExample.scala |  4 +
 .../spark/sql/jdbc/OracleIntegrationSuite.scala | 47 +--
 .../datasources/jdbc/JDBCOptions.scala  |  4 +
 .../execution/datasources/jdbc/JDBCRDD.scala|  2 +-
 .../datasources/jdbc/JDBCRelation.scala |  9 +-
 .../execution/datasources/jdbc/JdbcUtils.scala  | 30 ++-
 .../datasources/jdbc/JdbcUtilsSuite.scala   | 87 
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala   | 30 +++
 10 files changed, 222 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/17edfec5/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 0a8acbb..95d7040 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1328,7 +1328,14 @@ the following case-insensitive options:
 
  The database column data types to use instead of the defaults, when 
creating the table. Data type information should be specified in the same 
format as CREATE TABLE columns syntax (e.g: "name CHAR(64), comments 
VARCHAR(1024)"). The specified types should be valid spark sql data 
types. This option applies only to writing.
 
-
+  
+
+  
+customSchema
+
+ The custom schema to use for reading data from JDBC connectors. For 
example, "id DECIMAL(38, 0), name STRING"). The column names should be 
identical to the corresponding column names of JDBC table. Users can specify 
the corresponding data types of Spark SQL instead of using the defaults. This 
option applies only to reading.
+
+  
 
 
 

http://git-wip-us.apache.org/repos/asf/spark/blob/17edfec5/examples/src/main/python/sql/datasource.py
--
diff --git a/examples/src/main/python/sql/datasource.py 
b/examples/src/main/python/sql/datasource.py
index 8777cca..f86012e 100644
--- a/examples/src/main/python/sql/datasource.py
+++ b/examples/src/main/python/sql/datasource.py
@@ -177,6 +177,16 @@ def jdbc_dataset_example(spark):
 .jdbc("jdbc:postgresql:dbserver", "schema.tablename",
   properties={"user": "username", "password": "password"})
 
+# Specifying dataframe column data types on read
+jdbcDF3 = spark.read \
+.format("jdbc") \
+.option("url", "jdbc:postgresql:dbserver") \
+.option("dbtable", "schema.tablename") \
+.option("user", "username") \
+.option("password", "password") \
+.option("customSchema", "id DECIMAL(38, 0), name STRING") \
+.load()
+
 # Saving data to a JDBC source
 jdbcDF.write \
 .format("jdbc") \

http://git-wip-us.apache.org/repos/asf/spark/blob/17edfec5/examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala
--
diff --git 
a/examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala
 

spark git commit: [SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiveFile.scala

2017-09-13 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master 21c4450fb -> 8c7e19a37


[SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiveFile.scala

## What changes were proposed in this pull request?

The code is already merged to master:
https://github.com/apache/spark/pull/18975

This is a follow-up PR to merge HiveTmpFile.scala into SaveAsHiveFile.scala.

## How was this patch tested?

Builds successfully.

Author: Jane Wang 

Closes #19221 from janewangfb/merge_savehivefile_hivetmpfile.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8c7e19a3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8c7e19a3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8c7e19a3

Branch: refs/heads/master
Commit: 8c7e19a37dc5af924be8b7af0c3607d5c7a4e96c
Parents: 21c4450
Author: Jane Wang 
Authored: Wed Sep 13 15:12:36 2017 -0700
Committer: gatorsmile 
Committed: Wed Sep 13 15:12:36 2017 -0700

--
 .../spark/sql/hive/execution/HiveTmpPath.scala  | 203 ---
 .../execution/InsertIntoHiveDirCommand.scala|   2 +-
 .../hive/execution/InsertIntoHiveTable.scala|   2 +-
 .../sql/hive/execution/SaveAsHiveFile.scala | 175 
 4 files changed, 177 insertions(+), 205 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8c7e19a3/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTmpPath.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTmpPath.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTmpPath.scala
deleted file mode 100644
index 15ca1df..000
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTmpPath.scala
+++ /dev/null
@@ -1,203 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.hive.execution
-
-import java.io.{File, IOException}
-import java.net.URI
-import java.text.SimpleDateFormat
-import java.util.{Date, Locale, Random}
-
-import scala.util.control.NonFatal
-
-import org.apache.hadoop.conf.Configuration
-import org.apache.hadoop.fs.{FileSystem, Path}
-import org.apache.hadoop.hive.common.FileUtils
-import org.apache.hadoop.hive.ql.exec.TaskRunner
-
-import org.apache.spark.internal.Logging
-import org.apache.spark.sql.SparkSession
-import org.apache.spark.sql.hive.HiveExternalCatalog
-import org.apache.spark.sql.hive.client.HiveVersion
-
-// Base trait for getting a temporary location for writing data
-private[hive] trait HiveTmpPath extends Logging {
-
-  var createdTempDir: Option[Path] = None
-
-  def getExternalTmpPath(
-  sparkSession: SparkSession,
-  hadoopConf: Configuration,
-  path: Path): Path = {
-import org.apache.spark.sql.hive.client.hive._
-
-// Before Hive 1.1, when inserting into a table, Hive will create the 
staging directory under
-// a common scratch directory. After the writing is finished, Hive will 
simply empty the table
-// directory and move the staging directory to it.
-// After Hive 1.1, Hive will create the staging directory under the table 
directory, and when
-// moving staging directory to table directory, Hive will still empty the 
table directory, but
-// will exclude the staging directory there.
-// We have to follow the Hive behavior here, to avoid troubles. For 
example, if we create
-// staging directory under the table director for Hive prior to 1.1, the 
staging directory will
-// be removed by Hive when Hive is trying to empty the table directory.
-val hiveVersionsUsingOldExternalTempPath: Set[HiveVersion] = Set(v12, v13, 
v14, v1_0)
-val hiveVersionsUsingNewExternalTempPath: Set[HiveVersion] = Set(v1_1, 
v1_2, v2_0, v2_1)
-
-// Ensure all the supported versions are considered here.
-assert(hiveVersionsUsingNewExternalTempPath ++ 
hiveVersionsUsingOldExternalTempPath ==
-  allSupportedHiveVersions)
-
-val externalCatalog = 

spark git commit: [SPARK-21980][SQL] References in grouping functions should be indexed with semanticEquals

2017-09-13 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 b606dc177 -> 3a692e355


[SPARK-21980][SQL] References in grouping functions should be indexed with 
semanticEquals

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-21980

This PR fixes an issue in the ResolveGroupingAnalytics rule, which indexed the
column references in grouping functions without considering the case-sensitivity
configuration.

The problem can be reproduced by:

```scala
val df = spark.createDataFrame(Seq((1, 1), (2, 1), (2, 2))).toDF("a", "b")
df.cube("a").agg(grouping("A")).show()
```
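
Since the same analyzer path is exercised from Python, the issue can also be seen from PySpark; a minimal repro sketch (it assumes an active `spark` session and the default `spark.sql.caseSensitive=false`):

```python
# Repro sketch: reference the grouping column with a different case than in cube().
from pyspark.sql.functions import grouping

df = spark.createDataFrame([(1, 1), (2, 1), (2, 2)], ["a", "b"])
# Before the fix this failed to resolve grouping("A") against the cube column "a";
# with the fix it resolves via semanticEquals and shows the grouping bit.
df.cube("a").agg(grouping("A")).show()
```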

## How was this patch tested?
unit tests

Author: donnyzone 

Closes #19202 from DonnyZone/ResolveGroupingAnalytics.

(cherry picked from commit 21c4450fb24635fab6481a3756fefa9c6f6d6235)
Signed-off-by: gatorsmile 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3a692e35
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3a692e35
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3a692e35

Branch: refs/heads/branch-2.2
Commit: 3a692e355a786260c4a9c2ef210fe14e409af37a
Parents: b606dc1
Author: donnyzone 
Authored: Wed Sep 13 10:06:53 2017 -0700
Committer: gatorsmile 
Committed: Wed Sep 13 10:10:59 2017 -0700

--
 .../spark/sql/catalyst/analysis/Analyzer.scala  |  2 +-
 .../apache/spark/sql/DataFrameAggregateSuite.scala  | 16 
 2 files changed, 17 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3a692e35/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 50c82f5..c970c20 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -315,7 +315,7 @@ class Analyzer(
 s"grouping columns (${groupByExprs.mkString(",")})")
   }
 case e @ Grouping(col: Expression) =>
-  val idx = groupByExprs.indexOf(col)
+  val idx = groupByExprs.indexWhere(_.semanticEquals(col))
   if (idx >= 0) {
  Alias(Cast(BitwiseAnd(ShiftRight(gid, Literal(groupByExprs.length - 1 - idx)),
   Literal(1)), ByteType), toPrettySQL(e))()

http://git-wip-us.apache.org/repos/asf/spark/blob/3a692e35/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
index 5f65512..f50c0cf 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
@@ -186,6 +186,22 @@ class DataFrameAggregateSuite extends QueryTest with 
SharedSQLContext {
 )
   }
 
+  test("SPARK-21980: References in grouping functions should be indexed with 
semanticEquals") {
+checkAnswer(
+  courseSales.cube("course", "year")
+.agg(grouping("CouRse"), grouping("year")),
+  Row("Java", 2012, 0, 0) ::
+Row("Java", 2013, 0, 0) ::
+Row("Java", null, 0, 1) ::
+Row("dotNET", 2012, 0, 0) ::
+Row("dotNET", 2013, 0, 0) ::
+Row("dotNET", null, 0, 1) ::
+Row(null, 2012, 1, 0) ::
+Row(null, 2013, 1, 0) ::
+Row(null, null, 1, 1) :: Nil
+)
+  }
+
   test("rollup overlapping columns") {
 checkAnswer(
  testData2.rollup($"a" + $"b" as "foo", $"b" as "bar").agg(sum($"a" - $"b") as "foo"),





spark git commit: [SPARK-21980][SQL] References in grouping functions should be indexed with semanticEquals

2017-09-13 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master b6ef1f57b -> 21c4450fb


[SPARK-21980][SQL] References in grouping functions should be indexed with 
semanticEquals

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-21980

This PR fixes an issue in the ResolveGroupingAnalytics rule, which indexed the
column references in grouping functions without considering the case-sensitivity
configuration.

The problem can be reproduced by:

```scala
val df = spark.createDataFrame(Seq((1, 1), (2, 1), (2, 2))).toDF("a", "b")
df.cube("a").agg(grouping("A")).show()
```

## How was this patch tested?
unit tests

Author: donnyzone 

Closes #19202 from DonnyZone/ResolveGroupingAnalytics.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/21c4450f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/21c4450f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/21c4450f

Branch: refs/heads/master
Commit: 21c4450fb24635fab6481a3756fefa9c6f6d6235
Parents: b6ef1f5
Author: donnyzone 
Authored: Wed Sep 13 10:06:53 2017 -0700
Committer: gatorsmile 
Committed: Wed Sep 13 10:06:53 2017 -0700

--
 .../spark/sql/catalyst/analysis/Analyzer.scala  |  2 +-
 .../apache/spark/sql/DataFrameAggregateSuite.scala  | 16 
 2 files changed, 17 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/21c4450f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 1e934d0..0880bd6 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -314,7 +314,7 @@ class Analyzer(
 s"grouping columns (${groupByExprs.mkString(",")})")
   }
 case e @ Grouping(col: Expression) =>
-  val idx = groupByExprs.indexOf(col)
+  val idx = groupByExprs.indexWhere(_.semanticEquals(col))
   if (idx >= 0) {
  Alias(Cast(BitwiseAnd(ShiftRight(gid, Literal(groupByExprs.length - 1 - idx)),
   Literal(1)), ByteType), toPrettySQL(e))()

http://git-wip-us.apache.org/repos/asf/spark/blob/21c4450f/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
index affe971..8549eac 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
@@ -190,6 +190,22 @@ class DataFrameAggregateSuite extends QueryTest with 
SharedSQLContext {
 )
   }
 
+  test("SPARK-21980: References in grouping functions should be indexed with 
semanticEquals") {
+checkAnswer(
+  courseSales.cube("course", "year")
+.agg(grouping("CouRse"), grouping("year")),
+  Row("Java", 2012, 0, 0) ::
+Row("Java", 2013, 0, 0) ::
+Row("Java", null, 0, 1) ::
+Row("dotNET", 2012, 0, 0) ::
+Row("dotNET", 2013, 0, 0) ::
+Row("dotNET", null, 0, 1) ::
+Row(null, 2012, 1, 0) ::
+Row(null, 2013, 1, 0) ::
+Row(null, null, 1, 1) :: Nil
+)
+  }
+
   test("rollup overlapping columns") {
 checkAnswer(
  testData2.rollup($"a" + $"b" as "foo", $"b" as "bar").agg(sum($"a" - $"b") as "foo"),





spark git commit: [SPARK-21970][CORE] Fix Redundant Throws Declarations in Java Codebase

2017-09-13 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 0fa5b7cac -> b6ef1f57b


[SPARK-21970][CORE] Fix Redundant Throws Declarations in Java Codebase

## What changes were proposed in this pull request?

1. Removing all redundant throws declarations from the Java codebase.
2. Removing dead code made visible by this from `ShuffleExternalSorter#closeAndGetSpills`.

## How was this patch tested?

Build still passes.

Author: Armin 

Closes #19182 from original-brownbear/SPARK-21970.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b6ef1f57
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b6ef1f57
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b6ef1f57

Branch: refs/heads/master
Commit: b6ef1f57bc06a0b213b0367229a09b5094267d80
Parents: 0fa5b7c
Author: Armin 
Authored: Wed Sep 13 14:04:26 2017 +0100
Committer: Sean Owen 
Committed: Wed Sep 13 14:04:26 2017 +0100

--
 .../spark/util/kvstore/LevelDBTypeInfo.java |  8 +++
 .../apache/spark/network/crypto/AuthEngine.java |  4 ++--
 .../spark/network/sasl/SparkSaslClient.java |  3 +--
 .../spark/network/sasl/SparkSaslServer.java |  3 +--
 .../spark/network/server/TransportServer.java   |  2 +-
 .../network/util/TransportFrameDecoder.java |  2 +-
 .../spark/util/sketch/CountMinSketchImpl.java   |  2 +-
 .../org/apache/spark/memory/MemoryConsumer.java |  2 --
 .../shuffle/sort/ShuffleExternalSorter.java | 23 
 .../spark/shuffle/sort/UnsafeShuffleWriter.java |  2 +-
 .../unsafe/sort/UnsafeExternalSorter.java   |  2 +-
 .../unsafe/sort/UnsafeSorterSpillWriter.java|  4 ++--
 .../streaming/JavaStructuredSessionization.java |  7 +++---
 .../apache/spark/launcher/SparkLauncher.java|  2 +-
 .../parquet/VectorizedColumnReader.java | 17 +++
 15 files changed, 36 insertions(+), 47 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b6ef1f57/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBTypeInfo.java
--
diff --git 
a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBTypeInfo.java
 
b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBTypeInfo.java
index 93aa0bb..232ee41 100644
--- 
a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBTypeInfo.java
+++ 
b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBTypeInfo.java
@@ -249,7 +249,7 @@ class LevelDBTypeInfo {
  * calculated only once, avoiding redundant work when multiple child 
indices of the
  * same parent index exist.
  */
-byte[] childPrefix(Object value) throws Exception {
+byte[] childPrefix(Object value) {
   Preconditions.checkState(parent == null, "Not a parent index.");
   return buildKey(name, toParentKey(value));
 }
@@ -295,7 +295,7 @@ class LevelDBTypeInfo {
 }
 
 /** The key for the end marker for entries with the given value. */
-byte[] end(byte[] prefix, Object value) throws Exception {
+byte[] end(byte[] prefix, Object value) {
   checkParent(prefix);
   return (parent != null) ? buildKey(false, prefix, name, toKey(value), 
END_MARKER)
 : buildKey(name, toKey(value), END_MARKER);
@@ -313,7 +313,7 @@ class LevelDBTypeInfo {
   return entityKey;
 }
 
-private void updateCount(WriteBatch batch, byte[] key, long delta) throws 
Exception {
+private void updateCount(WriteBatch batch, byte[] key, long delta) {
   long updated = getCount(key) + delta;
   if (updated > 0) {
 batch.put(key, db.serializer.serialize(updated));
@@ -431,7 +431,7 @@ class LevelDBTypeInfo {
   addOrRemove(batch, entity, null, null, naturalKey, prefix);
 }
 
-long getCount(byte[] key) throws Exception {
+long getCount(byte[] key) {
   byte[] data = db.db().get(key);
   return data != null ? db.serializer.deserializeLong(data) : 0;
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/b6ef1f57/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthEngine.java
--
diff --git 
a/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthEngine.java
 
b/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthEngine.java
index b769ebe..056505e 100644
--- 
a/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthEngine.java
+++ 
b/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthEngine.java
@@ -81,7 +81,7 @@ class AuthEngine implements Closeable {
*
* @return A challenge to be sent the remote side.
*/
-  ClientChallenge challenge() throws 

spark git commit: [SPARK-21690][ML] one-pass imputer

2017-09-13 Thread yliang
Repository: spark
Updated Branches:
  refs/heads/master ca00cc70d -> 0fa5b7cac


[SPARK-21690][ML] one-pass imputer

## What changes were proposed in this pull request?
Parallelize the computation across all input columns so the surrogates are computed in a single pass over the data (see the sketch after the performance table below).

Performance tests:

|numColumns| Mean(Old) | Median(Old) | Mean(RDD) | Median(RDD) | Mean(DF) | Median(DF) |
|--|--|--|--|--|--|--|
|1|0.0771394713|0.0658712813|0.080779802|0.04816598149996|0.1052550987001|0.0499620203|
|10|0.723434063099|0.5954440414|0.0867935197|0.1326342865998|0.0925572488999|0.1573943635|
|100|7.3756451568|6.2196631259|0.1911931552|0.862537681701|0.5557462431|1.721683798202|
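
For intuition, here is a small PySpark sketch of the single-pass idea: instead of scanning the dataset once per column, the missing-value sentinel and NaN are nulled out and all per-column means are requested in one `select` (a sketch only; the column names and sentinel are made up, and the real implementation is the Scala `Imputer` shown below):

```python
# One pass over the data: compute a surrogate (mean) for every input column at once.
from pyspark.sql import functions as F

input_cols = ["x1", "x2", "x3"]
missing_value = -1.0

df = spark.createDataFrame(
    [(1.0, -1.0, 3.0), (2.0, 4.0, float("nan")), (-1.0, 6.0, 9.0)], input_cols)

# Null out the sentinel and NaN so that avg() ignores them, then aggregate all columns together.
cleaned = [
    F.when(F.col(c) == missing_value, None)
     .when(F.isnan(F.col(c)), None)
     .otherwise(F.col(c)).cast("double").alias(c)
    for c in input_cols
]
row = df.select(cleaned).select([F.avg(c).alias(c) for c in input_cols]).head()
surrogates = {c: row[c] for c in input_cols}
print(surrogates)  # e.g. {'x1': 1.5, 'x2': 5.0, 'x3': 6.0}
```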

## How was this patch tested?
existing tests

Author: Zheng RuiFeng 

Closes #18902 from zhengruifeng/parallelize_imputer.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0fa5b7ca
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0fa5b7ca
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0fa5b7ca

Branch: refs/heads/master
Commit: 0fa5b7cacca4e867dd9f787cc2801616967932a4
Parents: ca00cc7
Author: Zheng RuiFeng 
Authored: Wed Sep 13 20:12:21 2017 +0800
Committer: Yanbo Liang 
Committed: Wed Sep 13 20:12:21 2017 +0800

--
 .../org/apache/spark/ml/feature/Imputer.scala   | 56 ++--
 1 file changed, 41 insertions(+), 15 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0fa5b7ca/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala 
b/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
index 9e023b9..1f36ece 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
@@ -133,23 +133,49 @@ class Imputer @Since("2.2.0") (@Since("2.2.0") override val uid: String)
   override def fit(dataset: Dataset[_]): ImputerModel = {
 transformSchema(dataset.schema, logging = true)
 val spark = dataset.sparkSession
-import spark.implicits._
-val surrogates = $(inputCols).map { inputCol =>
-  val ic = col(inputCol)
-  val filtered = dataset.select(ic.cast(DoubleType))
-.filter(ic.isNotNull && ic =!= $(missingValue) && !ic.isNaN)
-  if(filtered.take(1).length == 0) {
-throw new SparkException(s"surrogate cannot be computed. " +
-  s"All the values in $inputCol are Null, Nan or 
missingValue(${$(missingValue)})")
-  }
-  val surrogate = $(strategy) match {
-case Imputer.mean => filtered.select(avg(inputCol)).as[Double].first()
-case Imputer.median => filtered.stat.approxQuantile(inputCol, Array(0.5), 0.001).head
-  }
-  surrogate
+
+val cols = $(inputCols).map { inputCol =>
+  when(col(inputCol).equalTo($(missingValue)), null)
+.when(col(inputCol).isNaN, null)
+.otherwise(col(inputCol))
+.cast("double")
+.as(inputCol)
+}
+
+val results = $(strategy) match {
+  case Imputer.mean =>
+// Function avg will ignore null automatically.
+// For a column only containing null, avg will return null.
+val row = dataset.select(cols.map(avg): _*).head()
+Array.range(0, $(inputCols).length).map { i =>
+  if (row.isNullAt(i)) {
+Double.NaN
+  } else {
+row.getDouble(i)
+  }
+}
+
+  case Imputer.median =>
+// Function approxQuantile will ignore null automatically.
+// For a column only containing null, approxQuantile will return an empty array.
+dataset.select(cols: _*).stat.approxQuantile($(inputCols), Array(0.5), 0.001)
+  .map { array =>
+if (array.isEmpty) {
+  Double.NaN
+} else {
+  array.head
+}
+  }
+}
+
+val emptyCols = $(inputCols).zip(results).filter(_._2.isNaN).map(_._1)
+if (emptyCols.nonEmpty) {
+  throw new SparkException(s"surrogate cannot be computed. " +
+s"All the values in ${emptyCols.mkString(",")} are Null, Nan or " +
+s"missingValue(${$(missingValue)})")
 }
 
-val rows = spark.sparkContext.parallelize(Seq(Row.fromSeq(surrogates)))
+val rows = spark.sparkContext.parallelize(Seq(Row.fromSeq(results)))
+val schema = StructType($(inputCols).map(col => StructField(col, DoubleType, nullable = false)))
 val surrogateDF = spark.createDataFrame(rows, schema)
 copyValues(new ImputerModel(uid, surrogateDF).setParent(this))



spark git commit: [SPARK-21963][CORE][TEST] Create temp file should be delete after use

2017-09-13 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 4fbf748bf -> ca00cc70d


[SPARK-21963][CORE][TEST] Create temp file should be delete after use

## What changes were proposed in this pull request?

After you create a temporary file, you need to delete it; otherwise it will
leave behind a file with a name like ‘SPARK194465907929586320484966temp’.
## How was this patch tested?

N / A

Author: caoxuewen 

Closes #19174 from heary-cao/DeleteTempFile.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ca00cc70
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ca00cc70
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ca00cc70

Branch: refs/heads/master
Commit: ca00cc70d6f01c0253a5bc2c22089cc54b476462
Parents: 4fbf748
Author: caoxuewen 
Authored: Wed Sep 13 13:01:30 2017 +0100
Committer: Sean Owen 
Committed: Wed Sep 13 13:01:30 2017 +0100

--
 .../scala/org/apache/spark/SparkContextSuite.scala |  1 +
 .../apache/spark/security/CryptoStreamUtilsSuite.scala |  1 +
 .../test/scala/org/apache/spark/util/UtilsSuite.scala  |  1 +
 .../apache/spark/launcher/ChildProcAppHandleSuite.java | 13 +++--
 4 files changed, 14 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ca00cc70/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala 
b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
index 890e93d..0ed5f26 100644
--- a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
@@ -600,6 +600,7 @@ class SparkContextSuite extends SparkFunSuite with 
LocalSparkContext with Eventu
 val fs = new DebugFilesystem()
 fs.initialize(new URI("file:///"), new Configuration())
 val file = File.createTempFile("SPARK19446", "temp")
+file.deleteOnExit()
 Files.write(Array.ofDim[Byte](1000), file)
 val path = new Path("file:///" + file.getCanonicalPath)
 val stream = fs.open(path)

http://git-wip-us.apache.org/repos/asf/spark/blob/ca00cc70/core/src/test/scala/org/apache/spark/security/CryptoStreamUtilsSuite.scala
--
diff --git 
a/core/src/test/scala/org/apache/spark/security/CryptoStreamUtilsSuite.scala 
b/core/src/test/scala/org/apache/spark/security/CryptoStreamUtilsSuite.scala
index 608052f..78f618f 100644
--- a/core/src/test/scala/org/apache/spark/security/CryptoStreamUtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/security/CryptoStreamUtilsSuite.scala
@@ -130,6 +130,7 @@ class CryptoStreamUtilsSuite extends SparkFunSuite {
 val conf = createConf()
 val key = createKey(conf)
 val file = Files.createTempFile("crypto", ".test").toFile()
+file.deleteOnExit()
 
 val outStream = createCryptoOutputStream(new FileOutputStream(file), conf, 
key)
 try {

http://git-wip-us.apache.org/repos/asf/spark/blob/ca00cc70/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala 
b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 4ce143f..05d58d8 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -939,6 +939,7 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
 // creating a very misbehaving process. It ignores SIGTERM and has 
been SIGSTOPed. On
 // older versions of java, this will *not* terminate.
 val file = File.createTempFile("temp-file-name", ".tmp")
+file.deleteOnExit()
 val cmd =
   s"""
  |#!/bin/bash

http://git-wip-us.apache.org/repos/asf/spark/blob/ca00cc70/launcher/src/test/java/org/apache/spark/launcher/ChildProcAppHandleSuite.java
--
diff --git 
a/launcher/src/test/java/org/apache/spark/launcher/ChildProcAppHandleSuite.java 
b/launcher/src/test/java/org/apache/spark/launcher/ChildProcAppHandleSuite.java
index 3b4d1b0..9f59b41 100644
--- 
a/launcher/src/test/java/org/apache/spark/launcher/ChildProcAppHandleSuite.java
+++ 
b/launcher/src/test/java/org/apache/spark/launcher/ChildProcAppHandleSuite.java
@@ -114,6 +114,7 @@ public class ChildProcAppHandleSuite extends BaseSuite {
 assumeFalse(isWindows());
 
 Path err = Files.createTempFile("stderr", "txt");
+err.toFile().deleteOnExit();
 
 SparkAppHandle handle = 

spark git commit: [SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behind a profile

2017-09-13 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master a1d98c6dc -> 4fbf748bf


[SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behind a profile

## What changes were proposed in this pull request?

Put Kafka 0.8 support behind a kafka-0-8 profile.

## How was this patch tested?

Existing tests; however, until the PR builder and Jenkins configs are updated, the
effect here is that Kafka 0.8 support is not built or tested at all.

Author: Sean Owen 

Closes #19134 from srowen/SPARK-21893.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4fbf748b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4fbf748b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4fbf748b

Branch: refs/heads/master
Commit: 4fbf748bf85b18f32a2cd32b1b1881d24360626e
Parents: a1d98c6
Author: Sean Owen 
Authored: Wed Sep 13 10:10:40 2017 +0100
Committer: Sean Owen 
Committed: Wed Sep 13 10:10:40 2017 +0100

--
 dev/create-release/release-build.sh |  32 +++---
 dev/mima|   2 +-
 dev/scalastyle  |   1 +
 dev/sparktestsupport/modules.py |   6 ++
 dev/test-dependencies.sh|   2 +-
 docs/building-spark.md  |   9 ++
 docs/streaming-kafka-0-8-integration.md |  23 ++--
 docs/streaming-kafka-integration.md |  11 +-
 docs/streaming-programming-guide.md |   6 +-
 examples/pom.xml|   2 +-
 .../streaming/JavaDirectKafkaWordCount.java |  21 ++--
 .../examples/streaming/JavaKafkaWordCount.java  |  87 ---
 .../streaming/DirectKafkaWordCount.scala|  12 +--
 .../examples/streaming/KafkaWordCount.scala | 105 ---
 .../apache/spark/streaming/kafka/Broker.scala   |   2 +
 .../spark/streaming/kafka/KafkaCluster.scala|   2 +
 .../spark/streaming/kafka/KafkaUtils.scala  |   1 +
 .../spark/streaming/kafka/OffsetRange.scala |   3 +
 pom.xml |  10 +-
 project/SparkBuild.scala|   8 +-
 python/pyspark/streaming/kafka.py   |  26 -
 python/pyspark/streaming/tests.py   |  14 ++-
 22 files changed, 127 insertions(+), 258 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4fbf748b/dev/create-release/release-build.sh
--
diff --git a/dev/create-release/release-build.sh 
b/dev/create-release/release-build.sh
index ee2407a..f4a7f25 100755
--- a/dev/create-release/release-build.sh
+++ b/dev/create-release/release-build.sh
@@ -80,8 +80,17 @@ NEXUS_PROFILE=d63f592e7eac0 # Profile for Spark staging 
uploads
 BASE_DIR=$(pwd)
 
 MVN="build/mvn --force"
-PUBLISH_PROFILES="-Pmesos -Pyarn -Phive -Phive-thriftserver"
-PUBLISH_PROFILES="$PUBLISH_PROFILES -Pspark-ganglia-lgpl -Pkinesis-asl"
+
+# Hive-specific profiles for some builds
+HIVE_PROFILES="-Phive -Phive-thriftserver"
+# Profiles for publishing snapshots and release to Maven Central
+PUBLISH_PROFILES="-Pmesos -Pyarn $HIVE_PROFILES -Pspark-ganglia-lgpl 
-Pkinesis-asl"
+# Profiles for building binary releases
+BASE_RELEASE_PROFILES="-Pmesos -Pyarn -Psparkr"
+# Scala 2.11 only profiles for some builds
+SCALA_2_11_PROFILES="-Pkafka-0-8"
+# Scala 2.12 only profiles for some builds
+SCALA_2_12_PROFILES="-Pscala-2.12"
 
 rm -rf spark
 git clone https://git-wip-us.apache.org/repos/asf/spark.git
@@ -235,10 +244,9 @@ if [[ "$1" == "package" ]]; then
 
   # We increment the Zinc port each time to avoid OOM's and other craziness if 
multiple builds
   # share the same Zinc server.
-  FLAGS="-Psparkr -Phive -Phive-thriftserver -Pyarn -Pmesos"
-  make_binary_release "hadoop2.6" "-Phadoop-2.6 $FLAGS" "3035" "withr" &
-  make_binary_release "hadoop2.7" "-Phadoop-2.7 $FLAGS" "3036" "withpip" &
-  make_binary_release "without-hadoop" "-Psparkr -Phadoop-provided -Pyarn 
-Pmesos" "3038" &
+  make_binary_release "hadoop2.6" "-Phadoop-2.6 $HIVE_PROFILES 
$SCALA_2_11_PROFILES $BASE_RELEASE_PROFILES" "3035" "withr" &
+  make_binary_release "hadoop2.7" "-Phadoop-2.7 $HIVE_PROFILES 
$SCALA_2_11_PROFILES $BASE_RELEASE_PROFILES" "3036" "withpip" &
+  make_binary_release "without-hadoop" "-Phadoop-provided $SCALA_2_11_PROFILES 
$BASE_RELEASE_PROFILES" "3038" &
   wait
   rm -rf spark-$SPARK_VERSION-bin-*/
 
@@ -304,10 +312,10 @@ if [[ "$1" == "publish-snapshot" ]]; then
   # Generate random point for Zinc
   export ZINC_PORT=$(python -S -c "import random; print 
random.randrange(3030,4030)")
 
-  $MVN -DzincPort=$ZINC_PORT --settings $tmp_settings -DskipTests 
$PUBLISH_PROFILES deploy
+  $MVN -DzincPort=$ZINC_PORT --settings $tmp_settings -DskipTests 

spark git commit: [SPARK-21982] Set locale to US

2017-09-13 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master dd88fa3d3 -> a1d98c6dc


[SPARK-21982] Set locale to US

## What changes were proposed in this pull request?

In UtilsSuite the expected output assumes the US locale, but the format function
used the default JVM locale, which could differ from US and make the test fail.
This change formats the durations with Locale.US explicitly.

## How was this patch tested?
Unit test (UtilsSuite)

Author: German Schiavon 

Closes #19205 from Gschiavon/fix/test-locale.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a1d98c6d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a1d98c6d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a1d98c6d

Branch: refs/heads/master
Commit: a1d98c6dcdf387121139d1566f5b1924e2a02a75
Parents: dd88fa3
Author: German Schiavon 
Authored: Wed Sep 13 09:52:45 2017 +0100
Committer: Sean Owen 
Committed: Wed Sep 13 09:52:45 2017 +0100

--
 core/src/main/scala/org/apache/spark/util/Utils.scala | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a1d98c6d/core/src/main/scala/org/apache/spark/util/Utils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala
index 1e8250f..bc08808 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -1193,16 +1193,17 @@ private[spark] object Utils extends Logging {
     val second = 1000
     val minute = 60 * second
     val hour = 60 * minute
+    val locale = Locale.US
 
     ms match {
       case t if t < second =>
-        "%d ms".format(t)
+        "%d ms".formatLocal(locale, t)
       case t if t < minute =>
-        "%.1f s".format(t.toFloat / second)
+        "%.1f s".formatLocal(locale, t.toFloat / second)
       case t if t < hour =>
-        "%.1f m".format(t.toFloat / minute)
+        "%.1f m".formatLocal(locale, t.toFloat / minute)
       case t =>
-        "%.2f h".format(t.toFloat / hour)
+        "%.2f h".formatLocal(locale, t.toFloat / hour)
     }
   }
 





spark git commit: [BUILD] Close stale PRs

2017-09-13 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master f6c5d8f69 -> dd88fa3d3


[BUILD] Close stale PRs

Closes #18522
Closes #17722
Closes #18879
Closes #18891
Closes #18806
Closes #18948
Closes #18949
Closes #19070
Closes #19039
Closes #19142
Closes #18515
Closes #19154
Closes #19162
Closes #19187
Closes #19091

Author: Sean Owen 

Closes #19203 from srowen/CloseStalePRs3.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dd88fa3d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dd88fa3d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dd88fa3d

Branch: refs/heads/master
Commit: dd88fa3d3b335b40ddd5e63ddf19818890aba4a3
Parents: f6c5d8f
Author: Sean Owen 
Authored: Wed Sep 13 09:51:49 2017 +0100
Committer: Sean Owen 
Committed: Wed Sep 13 09:51:49 2017 +0100

--

--






spark-website git commit: Update Yourkit usage

2017-09-13 Thread srowen
Repository: spark-website
Updated Branches:
  refs/heads/asf-site a1f847efc -> 442b04535


Update Yourkit usage


Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/442b0453
Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/442b0453
Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/442b0453

Branch: refs/heads/asf-site
Commit: 442b04535b78fed3f778e12e05678774fd16c832
Parents: a1f847e
Author: Takeshi Yamamuro 
Authored: Mon Sep 11 10:52:05 2017 +0900
Committer: Sean Owen 
Committed: Wed Sep 13 09:50:03 2017 +0100

--
 developer-tools.md| 14 +++---
 site/developer-tools.html | 14 +++---
 2 files changed, 14 insertions(+), 14 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark-website/blob/442b0453/developer-tools.md
--
diff --git a/developer-tools.md b/developer-tools.md
index dab3e8a..c975e80 100644
--- a/developer-tools.md
+++ b/developer-tools.md
@@ -389,19 +389,19 @@ Here are instructions on profiling Spark applications using YourKit Java Profile
 <a href="https://www.yourkit.com/download/index.jsp">YourKit downloads page</a>. 
 This file is pretty big (~100 MB) and YourKit downloads site is somewhat slow, so you may 
 consider mirroring this file or including it on a custom AMI.
-- Untar this file somewhere (in `/root` in our case): `tar xvjf yjp-12.0.5-linux.tar.bz2`
-- Copy the expanded YourKit files to each node using copy-dir: `~/spark-ec2/copy-dir /root/yjp-12.0.5`
+- Unzip this file somewhere (in `/root` in our case): `unzip YourKit-JavaProfiler-2017.02-b66.zip`
+- Copy the expanded YourKit files to each node using copy-dir: `~/spark-ec2/copy-dir /root/YourKit-JavaProfiler-2017.02`
 - Configure the Spark JVMs to use the YourKit profiling agent by editing `~/spark/conf/spark-env.sh` 
 and adding the lines
 ```
-SPARK_DAEMON_JAVA_OPTS+=" -agentpath:/root/yjp-12.0.5/bin/linux-x86-64/libyjpagent.so=sampling"
+SPARK_DAEMON_JAVA_OPTS+=" -agentpath:/root/YourKit-JavaProfiler-2017.02/bin/linux-x86-64/libyjpagent.so=sampling"
 export SPARK_DAEMON_JAVA_OPTS
-SPARK_JAVA_OPTS+=" -agentpath:/root/yjp-12.0.5/bin/linux-x86-64/libyjpagent.so=sampling"
-export SPARK_JAVA_OPTS
+SPARK_EXECUTOR_OPTS+=" -agentpath:/root/YourKit-JavaProfiler-2017.02/bin/linux-x86-64/libyjpagent.so=sampling"
+export SPARK_EXECUTOR_OPTS
 ```
 - Copy the updated configuration to each node: `~/spark-ec2/copy-dir ~/spark/conf/spark-env.sh`
 - Restart your Spark cluster: `~/spark/bin/stop-all.sh` and `~/spark/bin/start-all.sh`
-- By default, the YourKit profiler agents use ports 10001-10010. To connect the YourKit desktop 
+- By default, the YourKit profiler agents use ports `10001-10010`. To connect the YourKit desktop
 application to the remote profiler agents, you'll have to open these ports in the cluster's EC2 
 security groups. To do this, sign into the AWS Management Console. Go to the EC2 section and 
 select `Security Groups` from the `Network & Security` section on the left side of the page. 
@@ -417,7 +417,7 @@ cluster with the same name, your security group settings will be re-used.
 - YourKit should now be connected to the remote profiling agent. It may take a few moments for profiling information to appear.
 
 Please see the full YourKit documentation for the full list of profiler agent
-<a href="http://www.yourkit.com/docs/80/help/startup_options.jsp">startup options</a>.
+<a href="https://www.yourkit.com/docs/java/help/startup_options.jsp">startup options</a>.
  
 In Spark unit tests
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/442b0453/site/developer-tools.html
--
diff --git a/site/developer-tools.html b/site/developer-tools.html
index 9a9ec8a..d8e3585 100644
--- a/site/developer-tools.html
+++ b/site/developer-tools.html
@@ -568,19 +568,19 @@ version: 1.5.0-SNAPSHOT
 <a href="https://www.yourkit.com/download/index.jsp">YourKit downloads page</a>. 
 This file is pretty big (~100 MB) and YourKit downloads site is somewhat slow, so you may 
 consider mirroring this file or including it on a custom AMI.
-  Untar this file somewhere (in /root in our case): tar xvjf yjp-12.0.5-linux.tar.bz2
-  Copy the expanded YourKit files to each node using copy-dir: ~/spark-ec2/copy-dir /root/yjp-12.0.5
+  Unzip this file somewhere (in /root in our case): unzip YourKit-JavaProfiler-2017.02-b66.zip
+  Copy the expanded YourKit files to each node using copy-dir: ~/spark-ec2/copy-dir /root/YourKit-JavaProfiler-2017.02
   Configure the Spark JVMs to use the YourKit profiling agent by editing ~/spark/conf/spark-env.sh 
 and adding the lines
 and adding the lines
-SPARK_DAEMON_JAVA_OPTS+=" 

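For convenience, a consolidated sketch of the updated procedure described in this doc change, assuming a spark-ec2 cluster with the profiler unzipped under /root as in the instructions above (the heredoc append is only an illustration, not a command from the docs):

```
# Unzip the profiler package and push it to every node.
unzip YourKit-JavaProfiler-2017.02-b66.zip -d /root
~/spark-ec2/copy-dir /root/YourKit-JavaProfiler-2017.02

# Append the agent settings to spark-env.sh (same lines as in the diff above).
cat >> ~/spark/conf/spark-env.sh <<'EOF'
SPARK_DAEMON_JAVA_OPTS+=" -agentpath:/root/YourKit-JavaProfiler-2017.02/bin/linux-x86-64/libyjpagent.so=sampling"
export SPARK_DAEMON_JAVA_OPTS
SPARK_EXECUTOR_OPTS+=" -agentpath:/root/YourKit-JavaProfiler-2017.02/bin/linux-x86-64/libyjpagent.so=sampling"
export SPARK_EXECUTOR_OPTS
EOF

# Distribute the config and restart the cluster.
~/spark-ec2/copy-dir ~/spark/conf/spark-env.sh
~/spark/bin/stop-all.sh && ~/spark/bin/start-all.sh
```
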
spark git commit: [SPARK-21027][MINOR][FOLLOW-UP] add missing since tag

2017-09-13 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 371e4e205 -> f6c5d8f69


[SPARK-21027][MINOR][FOLLOW-UP] add missing since tag

## What changes were proposed in this pull request?

add missing since tag for `setParallelism` in #19110

## How was this patch tested?

N/A

Author: WeichenXu 

Closes #19214 from WeichenXu123/minor01.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f6c5d8f6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f6c5d8f6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f6c5d8f6

Branch: refs/heads/master
Commit: f6c5d8f6925e1868b8948a442b44c19535150e2a
Parents: 371e4e2
Author: WeichenXu 
Authored: Wed Sep 13 09:48:04 2017 +0100
Committer: Sean Owen 
Committed: Wed Sep 13 09:48:04 2017 +0100

--
 .../main/scala/org/apache/spark/ml/classification/OneVsRest.scala   | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f6c5d8f6/mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala b/mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala
index 942e981..92a7742 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala
@@ -303,6 +303,7 @@ final class OneVsRest @Since("1.4.0") (
    *
    * @group expertSetParam
    */
+  @Since("2.3.0")
   def setParallelism(value: Int): this.type = {
     set(parallelism, value)
   }
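
For context, a hedged spark-shell-style sketch (not from this patch) of where the newly annotated setter is used; `training` is a hypothetical DataFrame of labeled feature vectors:

```
import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}

// Train the per-class binary models with up to 4 fits running in parallel.
val ovr = new OneVsRest()
  .setClassifier(new LogisticRegression().setMaxIter(10))
  .setParallelism(4)   // expert param; setter annotated @Since("2.3.0") above
// val ovrModel = ovr.fit(training)   // `training`: hypothetical DataFrame with label/features columns
```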

