spark git commit: [SPARK-16632][SQL] Respect Hive schema when merging parquet schema.

2016-07-19 Thread lian
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 6f209c8fa -> c2b5b3ca5


[SPARK-16632][SQL] Respect Hive schema when merging parquet schema.

When Hive (or at least certain versions of Hive) creates parquet files
containing tinyint or smallint columns, it stores them as int32, but
doesn't annotate the parquet field as containing the corresponding
int8 / int16 data. When Spark reads those files using the vectorized
reader, it follows the parquet schema for these fields, but when
actually reading the data it tries to use the type fetched from
the metastore, and then fails because data has been loaded into the
wrong fields in OnHeapColumnVector.

So instead of blindly trusting the parquet schema, check whether the
Catalyst-provided schema disagrees with it, and adjust the types so
that the necessary metadata is present when loading the data into
the ColumnVector instance.
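
For reference, a minimal standalone sketch of the adjustment described above,
built with the parquet-mr `Types` builder that the patch uses (the field name
here is hypothetical; this is an illustration, not part of the patch):

```scala
import org.apache.parquet.schema.{PrimitiveType, Type, Types}
import org.apache.parquet.schema.OriginalType.INT_8
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.INT32
import org.apache.parquet.schema.Type.Repetition

// What Hive wrote: an int32 column carrying tinyint data, with no
// original-type annotation.
val hiveField: PrimitiveType =
  Types.primitive(INT32, Repetition.OPTIONAL).named("tiny_col")

// What the Catalyst schema implies (ByteType): the same field rebuilt with the
// INT_8 annotation, so the vectorized reader allocates the matching column.
val patchedField: Type =
  Types.primitive(INT32, hiveField.getRepetition).as(INT_8).named(hiveField.getName)
```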

Tested with unit tests and with tests that create byte / short columns
in Hive and try to read them from Spark.

Author: Marcelo Vanzin 

Closes #14272 from vanzin/SPARK-16632.

(cherry picked from commit 75146be6ba5e9f559f5f15430310bb476ee0812c)
Signed-off-by: Cheng Lian 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c2b5b3ca
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c2b5b3ca
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c2b5b3ca

Branch: refs/heads/branch-2.0
Commit: c2b5b3ca538aaaef946653e60bd68e38c58dc41f
Parents: 6f209c8
Author: Marcelo Vanzin 
Authored: Wed Jul 20 13:00:22 2016 +0800
Committer: Cheng Lian 
Committed: Wed Jul 20 13:49:45 2016 +0800

--
 .../parquet/ParquetReadSupport.scala| 18 +
 .../parquet/ParquetSchemaSuite.scala| 39 
 2 files changed, 57 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c2b5b3ca/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
index 12f4974..1628e4c 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
@@ -26,6 +26,8 @@ import org.apache.parquet.hadoop.api.{InitContext, 
ReadSupport}
 import org.apache.parquet.hadoop.api.ReadSupport.ReadContext
 import org.apache.parquet.io.api.RecordMaterializer
 import org.apache.parquet.schema._
+import org.apache.parquet.schema.OriginalType._
+import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName._
 import org.apache.parquet.schema.Type.Repetition
 
 import org.apache.spark.internal.Logging
@@ -116,6 +118,12 @@ private[parquet] object ParquetReadSupport {
   }
 
   private def clipParquetType(parquetType: Type, catalystType: DataType): Type = {
+val primName = if (parquetType.isPrimitive()) {
+  parquetType.asPrimitiveType().getPrimitiveTypeName()
+} else {
+  null
+}
+
 catalystType match {
   case t: ArrayType if !isPrimitiveCatalystType(t.elementType) =>
 // Only clips array types with nested type as element type.
@@ -130,6 +138,16 @@ private[parquet] object ParquetReadSupport {
   case t: StructType =>
 clipParquetGroup(parquetType.asGroupType(), t)
 
+  case _: ByteType if primName == INT32 =>
+// SPARK-16632: Handle case where Hive stores bytes in a int32 field without specifying
+// the original type.
+Types.primitive(INT32, parquetType.getRepetition()).as(INT_8).named(parquetType.getName())
+
+  case _: ShortType if primName == INT32 =>
+// SPARK-16632: Handle case where Hive stores shorts in a int32 field without specifying
+// the original type.
+Types.primitive(INT32, parquetType.getRepetition()).as(INT_16).named(parquetType.getName())
+
   case _ =>
 // UDTs and primitive types are not clipped.  For UDTs, a clipped version might not be able
 // to be mapped to desired user-space types.  So UDTs shouldn't participate schema merging.

http://git-wip-us.apache.org/repos/asf/spark/blob/c2b5b3ca/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
 

spark git commit: [SPARK-16632][SQL] Respect Hive schema when merging parquet schema.

2016-07-19 Thread lian
Repository: spark
Updated Branches:
  refs/heads/master fc2326362 -> 75146be6b


[SPARK-16632][SQL] Respect Hive schema when merging parquet schema.

When Hive (or at least certain versions of Hive) creates parquet files
containing tinyint or smallint columns, it stores them as int32, but
doesn't annotate the parquet field as containing the corresponding
int8 / int16 data. When Spark reads those files using the vectorized
reader, it follows the parquet schema for these fields, but when
actually reading the data it tries to use the type fetched from
the metastore, and then fails because data has been loaded into the
wrong fields in OnHeapColumnVector.

So instead of blindly trusting the parquet schema, check whether the
Catalyst-provided schema disagrees with it, and adjust the types so
that the necessary metadata is present when loading the data into
the ColumnVector instance.

Tested with unit tests and with tests that create byte / short columns
in Hive and try to read them from Spark.

Author: Marcelo Vanzin 

Closes #14272 from vanzin/SPARK-16632.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/75146be6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/75146be6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/75146be6

Branch: refs/heads/master
Commit: 75146be6ba5e9f559f5f15430310bb476ee0812c
Parents: fc23263
Author: Marcelo Vanzin 
Authored: Wed Jul 20 13:00:22 2016 +0800
Committer: Cheng Lian 
Committed: Wed Jul 20 13:00:22 2016 +0800

--
 .../parquet/ParquetReadSupport.scala| 18 +
 .../parquet/ParquetSchemaSuite.scala| 39 
 2 files changed, 57 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/75146be6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
index e6ef634..46d786d 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
@@ -26,6 +26,8 @@ import org.apache.parquet.hadoop.api.{InitContext, 
ReadSupport}
 import org.apache.parquet.hadoop.api.ReadSupport.ReadContext
 import org.apache.parquet.io.api.RecordMaterializer
 import org.apache.parquet.schema._
+import org.apache.parquet.schema.OriginalType._
+import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName._
 import org.apache.parquet.schema.Type.Repetition
 
 import org.apache.spark.internal.Logging
@@ -120,6 +122,12 @@ private[parquet] object ParquetReadSupport {
   }
 
   private def clipParquetType(parquetType: Type, catalystType: DataType): Type = {
+val primName = if (parquetType.isPrimitive()) {
+  parquetType.asPrimitiveType().getPrimitiveTypeName()
+} else {
+  null
+}
+
 catalystType match {
   case t: ArrayType if !isPrimitiveCatalystType(t.elementType) =>
 // Only clips array types with nested type as element type.
@@ -134,6 +142,16 @@ private[parquet] object ParquetReadSupport {
   case t: StructType =>
 clipParquetGroup(parquetType.asGroupType(), t)
 
+  case _: ByteType if primName == INT32 =>
+// SPARK-16632: Handle case where Hive stores bytes in a int32 field without specifying
+// the original type.
+Types.primitive(INT32, parquetType.getRepetition()).as(INT_8).named(parquetType.getName())
+
+  case _: ShortType if primName == INT32 =>
+// SPARK-16632: Handle case where Hive stores shorts in a int32 field without specifying
+// the original type.
+Types.primitive(INT32, parquetType.getRepetition()).as(INT_16).named(parquetType.getName())
+
   case _ =>
 // UDTs and primitive types are not clipped.  For UDTs, a clipped version might not be able
 // to be mapped to desired user-space types.  So UDTs shouldn't participate schema merging.

http://git-wip-us.apache.org/repos/asf/spark/blob/75146be6/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
index 8a980a7..31ebec0 100644
--- 

spark git commit: [SPARK-10683][SPARK-16510][SPARKR] Move SparkR include jar test to SparkSubmitSuite

2016-07-19 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 f58fd4620 -> 6f209c8fa


[SPARK-10683][SPARK-16510][SPARKR] Move SparkR include jar test to 
SparkSubmitSuite

## What changes were proposed in this pull request?

This change moves the include jar test from R to SparkSubmitSuite and uses a 
dynamically compiled jar. This helps us remove the binary jar from the R 
package and solves both the CRAN warnings and the lack of source being 
available for this jar.

## How was this patch tested?
SparkR unit tests, SparkSubmitSuite, check-cran.sh

Author: Shivaram Venkataraman 

Closes #14243 from shivaram/sparkr-jar-move.

(cherry picked from commit fc23263623d5dcd1167fa93c094fe41ace77c326)
Signed-off-by: Shivaram Venkataraman 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6f209c8f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6f209c8f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6f209c8f

Branch: refs/heads/branch-2.0
Commit: 6f209c8faad0c928368852c881e2aaabe100b152
Parents: f58fd46
Author: Shivaram Venkataraman 
Authored: Tue Jul 19 19:28:08 2016 -0700
Committer: Shivaram Venkataraman 
Committed: Tue Jul 19 19:28:18 2016 -0700

--
 .../inst/test_support/sparktestjar_2.10-1.0.jar | Bin 2886 -> 0 bytes
 R/pkg/inst/tests/testthat/jarTest.R |  10 ++---
 R/pkg/inst/tests/testthat/test_includeJAR.R |  36 --
 .../scala/org/apache/spark/api/r/RUtils.scala   |   9 +
 .../apache/spark/deploy/SparkSubmitSuite.scala  |  38 +++
 5 files changed, 52 insertions(+), 41 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6f209c8f/R/pkg/inst/test_support/sparktestjar_2.10-1.0.jar
--
diff --git a/R/pkg/inst/test_support/sparktestjar_2.10-1.0.jar 
b/R/pkg/inst/test_support/sparktestjar_2.10-1.0.jar
deleted file mode 100644
index 1d5c2af..000
Binary files a/R/pkg/inst/test_support/sparktestjar_2.10-1.0.jar and /dev/null 
differ

http://git-wip-us.apache.org/repos/asf/spark/blob/6f209c8f/R/pkg/inst/tests/testthat/jarTest.R
--
diff --git a/R/pkg/inst/tests/testthat/jarTest.R 
b/R/pkg/inst/tests/testthat/jarTest.R
index 51754a4..c9615c8 100644
--- a/R/pkg/inst/tests/testthat/jarTest.R
+++ b/R/pkg/inst/tests/testthat/jarTest.R
@@ -16,17 +16,17 @@
 #
 library(SparkR)
 
-sparkR.session()
+sc <- sparkR.session()
 
-helloTest <- SparkR:::callJStatic("sparkR.test.hello",
+helloTest <- SparkR:::callJStatic("sparkrtest.DummyClass",
   "helloWorld",
   "Dave")
+stopifnot(identical(helloTest, "Hello Dave"))
 
-basicFunction <- SparkR:::callJStatic("sparkR.test.basicFunction",
+basicFunction <- SparkR:::callJStatic("sparkrtest.DummyClass",
   "addStuff",
   2L,
   2L)
+stopifnot(basicFunction == 4L)
 
 sparkR.session.stop()
-output <- c(helloTest, basicFunction)
-writeLines(output)

http://git-wip-us.apache.org/repos/asf/spark/blob/6f209c8f/R/pkg/inst/tests/testthat/test_includeJAR.R
--
diff --git a/R/pkg/inst/tests/testthat/test_includeJAR.R 
b/R/pkg/inst/tests/testthat/test_includeJAR.R
deleted file mode 100644
index 512dd39..000
--- a/R/pkg/inst/tests/testthat/test_includeJAR.R
+++ /dev/null
@@ -1,36 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-context("include an external JAR in SparkContext")
-
-runScript <- function() {
-  sparkHome <- Sys.getenv("SPARK_HOME")
-  sparkTestJarPath <- "R/lib/SparkR/test_support/sparktestjar_2.10-1.0.jar"
-  jarPath <- paste("--jars", shQuote(file.path(sparkHome, sparkTestJarPath)))
-  scriptPath <- file.path(sparkHome, "R/lib/SparkR/tests/testthat/jarTest.R")
-  submitPath <- 

spark git commit: [SPARK-10683][SPARK-16510][SPARKR] Move SparkR include jar test to SparkSubmitSuite

2016-07-19 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/master 9674af6f6 -> fc2326362


[SPARK-10683][SPARK-16510][SPARKR] Move SparkR include jar test to 
SparkSubmitSuite

## What changes were proposed in this pull request?

This change moves the include jar test from R to SparkSubmitSuite and uses a 
dynamically compiled jar. This helps us remove the binary jar from the R 
package and solves both the CRAN warnings and the lack of source being 
available for this jar.

## How was this patch tested?
SparkR unit tests, SparkSubmitSuite, check-cran.sh

Author: Shivaram Venkataraman 

Closes #14243 from shivaram/sparkr-jar-move.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fc232636
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fc232636
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fc232636

Branch: refs/heads/master
Commit: fc23263623d5dcd1167fa93c094fe41ace77c326
Parents: 9674af6
Author: Shivaram Venkataraman 
Authored: Tue Jul 19 19:28:08 2016 -0700
Committer: Shivaram Venkataraman 
Committed: Tue Jul 19 19:28:08 2016 -0700

--
 .../inst/test_support/sparktestjar_2.10-1.0.jar | Bin 2886 -> 0 bytes
 R/pkg/inst/tests/testthat/jarTest.R |  10 ++---
 R/pkg/inst/tests/testthat/test_includeJAR.R |  36 --
 .../scala/org/apache/spark/api/r/RUtils.scala   |   9 +
 .../apache/spark/deploy/SparkSubmitSuite.scala  |  38 +++
 5 files changed, 52 insertions(+), 41 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/fc232636/R/pkg/inst/test_support/sparktestjar_2.10-1.0.jar
--
diff --git a/R/pkg/inst/test_support/sparktestjar_2.10-1.0.jar 
b/R/pkg/inst/test_support/sparktestjar_2.10-1.0.jar
deleted file mode 100644
index 1d5c2af..000
Binary files a/R/pkg/inst/test_support/sparktestjar_2.10-1.0.jar and /dev/null 
differ

http://git-wip-us.apache.org/repos/asf/spark/blob/fc232636/R/pkg/inst/tests/testthat/jarTest.R
--
diff --git a/R/pkg/inst/tests/testthat/jarTest.R 
b/R/pkg/inst/tests/testthat/jarTest.R
index 51754a4..c9615c8 100644
--- a/R/pkg/inst/tests/testthat/jarTest.R
+++ b/R/pkg/inst/tests/testthat/jarTest.R
@@ -16,17 +16,17 @@
 #
 library(SparkR)
 
-sparkR.session()
+sc <- sparkR.session()
 
-helloTest <- SparkR:::callJStatic("sparkR.test.hello",
+helloTest <- SparkR:::callJStatic("sparkrtest.DummyClass",
   "helloWorld",
   "Dave")
+stopifnot(identical(helloTest, "Hello Dave"))
 
-basicFunction <- SparkR:::callJStatic("sparkR.test.basicFunction",
+basicFunction <- SparkR:::callJStatic("sparkrtest.DummyClass",
   "addStuff",
   2L,
   2L)
+stopifnot(basicFunction == 4L)
 
 sparkR.session.stop()
-output <- c(helloTest, basicFunction)
-writeLines(output)

http://git-wip-us.apache.org/repos/asf/spark/blob/fc232636/R/pkg/inst/tests/testthat/test_includeJAR.R
--
diff --git a/R/pkg/inst/tests/testthat/test_includeJAR.R 
b/R/pkg/inst/tests/testthat/test_includeJAR.R
deleted file mode 100644
index 512dd39..000
--- a/R/pkg/inst/tests/testthat/test_includeJAR.R
+++ /dev/null
@@ -1,36 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-context("include an external JAR in SparkContext")
-
-runScript <- function() {
-  sparkHome <- Sys.getenv("SPARK_HOME")
-  sparkTestJarPath <- "R/lib/SparkR/test_support/sparktestjar_2.10-1.0.jar"
-  jarPath <- paste("--jars", shQuote(file.path(sparkHome, sparkTestJarPath)))
-  scriptPath <- file.path(sparkHome, "R/lib/SparkR/tests/testthat/jarTest.R")
-  submitPath <- file.path(sparkHome, paste("bin/", determineSparkSubmitBin(), sep = ""))
-  combinedArgs <- paste(jarPath, scriptPath, sep = " ")
-  res <- 

spark git commit: [SPARK-16568][SQL][DOCUMENTATION] update sql programming guide refreshTable API in python code

2016-07-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 307f8922b -> f58fd4620


[SPARK-16568][SQL][DOCUMENTATION] update sql programming guide refreshTable API 
in python code

## What changes were proposed in this pull request?

update `refreshTable` API in python code of the sql-programming-guide.

This API is added in SPARK-15820

## How was this patch tested?

N/A

Author: WeichenXu 

Closes #14220 from WeichenXu123/update_sql_doc_catalog.

(cherry picked from commit 9674af6f6f81066139ea675de724f951bd0d49c9)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f58fd462
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f58fd462
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f58fd462

Branch: refs/heads/branch-2.0
Commit: f58fd4620f703fba0c8be0724c0150b08e984a2b
Parents: 307f892
Author: WeichenXu 
Authored: Tue Jul 19 18:48:41 2016 -0700
Committer: Reynold Xin 
Committed: Tue Jul 19 18:48:49 2016 -0700

--
 docs/sql-programming-guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f58fd462/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index a88efb7..8d92a43 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -869,8 +869,8 @@ spark.catalog().refreshTable("my_table");
 
 
 {% highlight python %}
-# spark is an existing HiveContext
-spark.refreshTable("my_table")
+# spark is an existing SparkSession
+spark.catalog.refreshTable("my_table")
 {% endhighlight %}
 
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-16568][SQL][DOCUMENTATION] update sql programming guide refreshTable API in python code

2016-07-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 004e29cba -> 9674af6f6


[SPARK-16568][SQL][DOCUMENTATION] update sql programming guide refreshTable API 
in python code

## What changes were proposed in this pull request?

update `refreshTable` API in python code of the sql-programming-guide.

This API is added in SPARK-15820

## How was this patch tested?

N/A

Author: WeichenXu 

Closes #14220 from WeichenXu123/update_sql_doc_catalog.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9674af6f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9674af6f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9674af6f

Branch: refs/heads/master
Commit: 9674af6f6f81066139ea675de724f951bd0d49c9
Parents: 004e29c
Author: WeichenXu 
Authored: Tue Jul 19 18:48:41 2016 -0700
Committer: Reynold Xin 
Committed: Tue Jul 19 18:48:41 2016 -0700

--
 docs/sql-programming-guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/9674af6f/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 71f3ee4..3af935a 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -869,8 +869,8 @@ spark.catalog().refreshTable("my_table");
 
 
 {% highlight python %}
-# spark is an existing HiveContext
-spark.refreshTable("my_table")
+# spark is an existing SparkSession
+spark.catalog.refreshTable("my_table")
 {% endhighlight %}
 
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-14702] Make environment of SparkLauncher launched process more configurable

2016-07-19 Thread vanzin
Repository: spark
Updated Branches:
  refs/heads/master 2ae7b88a0 -> 004e29cba


[SPARK-14702] Make environment of SparkLauncher launched process more 
configurable

## What changes were proposed in this pull request?

Adds a few public methods to `SparkLauncher` to allow configuring some extra 
features of the `ProcessBuilder`, including the working directory, output and 
error stream redirection.
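
For illustration, a hedged sketch of how these hooks might be used from driver-side
code; the jar and class names are placeholders, and only methods exercised by the
tests in the diff below are shown:

```scala
import java.lang.ProcessBuilder.Redirect
import org.apache.spark.launcher.SparkLauncher

// Launch a child spark-submit process and inherit its stdout/stderr, instead of
// having to drain the streams manually.
val proc: Process = new SparkLauncher()
  .setSparkHome(sys.env("SPARK_HOME"))
  .setAppResource("my-app.jar")          // placeholder application jar
  .setMainClass("com.example.MyApp")     // placeholder main class
  .redirectOutput(Redirect.INHERIT)
  .redirectError(Redirect.INHERIT)
  .launch()
proc.waitFor()
```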

## How was this patch tested?

Unit testing + simple Spark driver programs

Author: Andrew Duffy 

Closes #14201 from andreweduffy/feature/launcher.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/004e29cb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/004e29cb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/004e29cb

Branch: refs/heads/master
Commit: 004e29cba518684d239d2d1661dce7c894a79f14
Parents: 2ae7b88
Author: Andrew Duffy 
Authored: Tue Jul 19 17:08:38 2016 -0700
Committer: Marcelo Vanzin 
Committed: Tue Jul 19 17:08:38 2016 -0700

--
 .../spark/launcher/SparkLauncherSuite.java  |  67 +++-
 .../spark/launcher/ChildProcAppHandle.java  |   5 +-
 .../apache/spark/launcher/SparkLauncher.java| 167 ---
 3 files changed, 208 insertions(+), 31 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/004e29cb/core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java
--
diff --git 
a/core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java 
b/core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java
index 8ca54b2..e393db0 100644
--- a/core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java
+++ b/core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java
@@ -21,6 +21,7 @@ import java.util.Arrays;
 import java.util.HashMap;
 import java.util.Map;
 
+import org.junit.Before;
 import org.junit.Test;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -40,10 +41,15 @@ public class SparkLauncherSuite {
   private static final Logger LOG = LoggerFactory.getLogger(SparkLauncherSuite.class);
   private static final NamedThreadFactory TF = new NamedThreadFactory("SparkLauncherSuite-%d");
 
+  private SparkLauncher launcher;
+
+  @Before
+  public void configureLauncher() {
+launcher = new SparkLauncher().setSparkHome(System.getProperty("spark.test.home"));
+  }
+
   @Test
   public void testSparkArgumentHandling() throws Exception {
-SparkLauncher launcher = new SparkLauncher()
-  .setSparkHome(System.getProperty("spark.test.home"));
 SparkSubmitOptionParser opts = new SparkSubmitOptionParser();
 
 launcher.addSparkArg(opts.HELP);
@@ -85,14 +91,67 @@ public class SparkLauncherSuite {
 assertEquals("bar", launcher.builder.conf.get("spark.foo"));
   }
 
+  @Test(expected=IllegalStateException.class)
+  public void testRedirectTwiceFails() throws Exception {
+launcher.setAppResource("fake-resource.jar")
+  .setMainClass("my.fake.class.Fake")
+  .redirectError()
+  .redirectError(ProcessBuilder.Redirect.PIPE)
+  .launch();
+  }
+
+  @Test(expected=IllegalStateException.class)
+  public void testRedirectToLogWithOthersFails() throws Exception {
+launcher.setAppResource("fake-resource.jar")
+  .setMainClass("my.fake.class.Fake")
+  .redirectToLog("fakeLog")
+  .redirectError(ProcessBuilder.Redirect.PIPE)
+  .launch();
+  }
+
+  @Test
+  public void testRedirectErrorToOutput() throws Exception {
+launcher.redirectError();
+assertTrue(launcher.redirectErrorStream);
+  }
+
+  @Test
+  public void testRedirectsSimple() throws Exception {
+launcher.redirectError(ProcessBuilder.Redirect.PIPE);
+assertNotNull(launcher.errorStream);
+assertEquals(launcher.errorStream.type(), ProcessBuilder.Redirect.Type.PIPE);
+
+launcher.redirectOutput(ProcessBuilder.Redirect.PIPE);
+assertNotNull(launcher.outputStream);
+assertEquals(launcher.outputStream.type(), ProcessBuilder.Redirect.Type.PIPE);
+  }
+
+  @Test
+  public void testRedirectLastWins() throws Exception {
+launcher.redirectError(ProcessBuilder.Redirect.PIPE)
+  .redirectError(ProcessBuilder.Redirect.INHERIT);
+assertEquals(launcher.errorStream.type(), ProcessBuilder.Redirect.Type.INHERIT);
+
+launcher.redirectOutput(ProcessBuilder.Redirect.PIPE)
+  .redirectOutput(ProcessBuilder.Redirect.INHERIT);
+assertEquals(launcher.outputStream.type(), ProcessBuilder.Redirect.Type.INHERIT);
+  }
+
+  @Test
+  public void testRedirectToLog() throws Exception {
+launcher.redirectToLog("fakeLogger");
+assertTrue(launcher.redirectToLog);
+assertTrue(launcher.builder.getEffectiveConfig()
+  

[2/2] spark git commit: Preparing development version 2.0.1-SNAPSHOT

2016-07-19 Thread pwendell
Preparing development version 2.0.1-SNAPSHOT


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/307f8922
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/307f8922
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/307f8922

Branch: refs/heads/branch-2.0
Commit: 307f8922be5c781d83c295edbbe9ad0f0d2095e3
Parents: 13650fc
Author: Patrick Wendell 
Authored: Tue Jul 19 14:02:33 2016 -0700
Committer: Patrick Wendell 
Committed: Tue Jul 19 14:02:33 2016 -0700

--
 assembly/pom.xml  | 2 +-
 common/network-common/pom.xml | 2 +-
 common/network-shuffle/pom.xml| 2 +-
 common/network-yarn/pom.xml   | 2 +-
 common/sketch/pom.xml | 2 +-
 common/tags/pom.xml   | 2 +-
 common/unsafe/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 examples/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml   | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/java8-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml  | 2 +-
 external/kafka-0-10/pom.xml   | 2 +-
 external/kafka-0-8-assembly/pom.xml   | 2 +-
 external/kafka-0-8/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml | 2 +-
 external/kinesis-asl/pom.xml  | 2 +-
 external/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml| 2 +-
 launcher/pom.xml  | 2 +-
 mllib-local/pom.xml   | 2 +-
 mllib/pom.xml | 2 +-
 pom.xml   | 2 +-
 repl/pom.xml  | 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 yarn/pom.xml  | 2 +-
 34 files changed, 34 insertions(+), 34 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/307f8922/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 5f546bb..507ddc7 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.0.0
+2.0.1-SNAPSHOT
 ../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/307f8922/common/network-common/pom.xml
--
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 2eaa810..bc3b0fe 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.0.0
+2.0.1-SNAPSHOT
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/307f8922/common/network-shuffle/pom.xml
--
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index f068d9d..2fb5835 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.0.0
+2.0.1-SNAPSHOT
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/307f8922/common/network-yarn/pom.xml
--
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index fd22188..07d9f1c 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.0.0
+2.0.1-SNAPSHOT
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/307f8922/common/sketch/pom.xml
--
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index a17aba5..5e02efd 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.0.0
+2.0.1-SNAPSHOT
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/307f8922/common/tags/pom.xml
--
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index 0bd8846..e7fc6a2 100644
--- a/common/tags/pom.xml
+++ b/common/tags/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark

[spark] Git Push Summary

2016-07-19 Thread pwendell
Repository: spark
Updated Tags:  refs/tags/v2.0.0-rc5 [created] 13650fc58

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[1/2] spark git commit: Preparing Spark release v2.0.0-rc5

2016-07-19 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 80ab8b666 -> 307f8922b


Preparing Spark release v2.0.0-rc5


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/13650fc5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/13650fc5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/13650fc5

Branch: refs/heads/branch-2.0
Commit: 13650fc58e1fcf2cf2a26ba11c819185ae1acc1f
Parents: 80ab8b6
Author: Patrick Wendell 
Authored: Tue Jul 19 14:02:27 2016 -0700
Committer: Patrick Wendell 
Committed: Tue Jul 19 14:02:27 2016 -0700

--
 assembly/pom.xml  | 2 +-
 common/network-common/pom.xml | 2 +-
 common/network-shuffle/pom.xml| 2 +-
 common/network-yarn/pom.xml   | 2 +-
 common/sketch/pom.xml | 2 +-
 common/tags/pom.xml   | 2 +-
 common/unsafe/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 examples/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml   | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/java8-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml  | 2 +-
 external/kafka-0-10/pom.xml   | 2 +-
 external/kafka-0-8-assembly/pom.xml   | 2 +-
 external/kafka-0-8/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml | 2 +-
 external/kinesis-asl/pom.xml  | 2 +-
 external/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml| 2 +-
 launcher/pom.xml  | 2 +-
 mllib-local/pom.xml   | 2 +-
 mllib/pom.xml | 2 +-
 pom.xml   | 2 +-
 repl/pom.xml  | 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 yarn/pom.xml  | 2 +-
 34 files changed, 34 insertions(+), 34 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/13650fc5/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 507ddc7..5f546bb 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.0.1-SNAPSHOT
+2.0.0
 ../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/13650fc5/common/network-common/pom.xml
--
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index bc3b0fe..2eaa810 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.0.1-SNAPSHOT
+2.0.0
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/13650fc5/common/network-shuffle/pom.xml
--
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 2fb5835..f068d9d 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.0.1-SNAPSHOT
+2.0.0
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/13650fc5/common/network-yarn/pom.xml
--
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 07d9f1c..fd22188 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.0.1-SNAPSHOT
+2.0.0
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/13650fc5/common/sketch/pom.xml
--
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 5e02efd..a17aba5 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.0.1-SNAPSHOT
+2.0.0
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/13650fc5/common/tags/pom.xml
--
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index e7fc6a2..0bd8846 100644
--- 

spark git commit: [SPARK-15705][SQL] Change the default value of spark.sql.hive.convertMetastoreOrc to false.

2016-07-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 f18f9ca5b -> 80ab8b666


[SPARK-15705][SQL] Change the default value of 
spark.sql.hive.convertMetastoreOrc to false.

## What changes were proposed in this pull request?
In 2.0, we added new logic to convert HiveTableScan on ORC tables to Spark's 
native code path. However, during this conversion, we drop the original 
metastore schema (https://issues.apache.org/jira/browse/SPARK-15705). Because 
of this regression, I am changing the default value of 
`spark.sql.hive.convertMetastoreOrc` to false.
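
As a hedged usage note (not part of the patch): a session that still wants the
native ORC conversion can opt back in explicitly, e.g. from spark-shell where
`spark` is the active SparkSession:

```scala
// Restore the previous behavior for this session only; the config key is the
// one shown in the diff below.
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
```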

Author: Yin Huai 

Closes #14267 from yhuai/SPARK-15705-changeDefaultValue.

(cherry picked from commit 2ae7b88a07140e012b6c60db3c4a2a8ca360c684)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/80ab8b66
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/80ab8b66
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/80ab8b66

Branch: refs/heads/branch-2.0
Commit: 80ab8b666f007de15fa9427f9734ed91238605b0
Parents: f18f9ca
Author: Yin Huai 
Authored: Tue Jul 19 12:58:08 2016 -0700
Committer: Reynold Xin 
Committed: Tue Jul 19 12:58:13 2016 -0700

--
 sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/80ab8b66/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
--
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
index 9ed357c..bdec611 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
@@ -97,10 +97,11 @@ private[spark] object HiveUtils extends Logging {
   .createWithDefault(false)
 
   val CONVERT_METASTORE_ORC = SQLConfigBuilder("spark.sql.hive.convertMetastoreOrc")
+.internal()
 .doc("When set to false, Spark SQL will use the Hive SerDe for ORC tables instead of " +
   "the built in support.")
 .booleanConf
-.createWithDefault(true)
+.createWithDefault(false)
 
   val HIVE_METASTORE_SHARED_PREFIXES = SQLConfigBuilder("spark.sql.hive.metastore.sharedPrefixes")
 .doc("A comma separated list of class prefixes that should be loaded using the classloader " +


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15705][SQL] Change the default value of spark.sql.hive.convertMetastoreOrc to false.

2016-07-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 162d04a30 -> 2ae7b88a0


[SPARK-15705][SQL] Change the default value of 
spark.sql.hive.convertMetastoreOrc to false.

## What changes were proposed in this pull request?
In 2.0, we added new logic to convert HiveTableScan on ORC tables to Spark's 
native code path. However, during this conversion, we drop the original 
metastore schema (https://issues.apache.org/jira/browse/SPARK-15705). Because 
of this regression, I am changing the default value of 
`spark.sql.hive.convertMetastoreOrc` to false.

Author: Yin Huai 

Closes #14267 from yhuai/SPARK-15705-changeDefaultValue.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2ae7b88a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2ae7b88a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2ae7b88a

Branch: refs/heads/master
Commit: 2ae7b88a07140e012b6c60db3c4a2a8ca360c684
Parents: 162d04a
Author: Yin Huai 
Authored: Tue Jul 19 12:58:08 2016 -0700
Committer: Reynold Xin 
Committed: Tue Jul 19 12:58:08 2016 -0700

--
 sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2ae7b88a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
--
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
index 9ed357c..bdec611 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
@@ -97,10 +97,11 @@ private[spark] object HiveUtils extends Logging {
   .createWithDefault(false)
 
   val CONVERT_METASTORE_ORC = SQLConfigBuilder("spark.sql.hive.convertMetastoreOrc")
+.internal()
 .doc("When set to false, Spark SQL will use the Hive SerDe for ORC tables instead of " +
   "the built in support.")
 .booleanConf
-.createWithDefault(true)
+.createWithDefault(false)
 
   val HIVE_METASTORE_SHARED_PREFIXES = SQLConfigBuilder("spark.sql.hive.metastore.sharedPrefixes")
 .doc("A comma separated list of class prefixes that should be loaded using the classloader " +


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-16602][SQL] `Nvl` function should support numeric-string cases

2016-07-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 6ca1d941b -> f18f9ca5b


[SPARK-16602][SQL] `Nvl` function should support numeric-string cases

## What changes were proposed in this pull request?

`Nvl` function should support numeric-string cases like Hive/Spark 1.6. 
Currently, `Nvl` finds the tightest common types among numeric types. This PR 
extends that to consider `String` type, too.

```scala
- TypeCoercion.findTightestCommonTypeOfTwo(left.dataType, right.dataType).map { dtype =>
+ TypeCoercion.findTightestCommonTypeToString(left.dataType, right.dataType).map { dtype =>
```

**Before**
```scala
scala> sql("select nvl('0', 1)").collect()
org.apache.spark.sql.AnalysisException: cannot resolve `nvl("0", 1)` due to 
data type mismatch:
input to function coalesce should all be the same type, but it's [string, int]; 
line 1 pos 7
```

**After**
```scala
scala> sql("select nvl('0', 1)").collect()
res0: Array[org.apache.spark.sql.Row] = Array([0])
```

## How was this patch tested?

Pass the Jenkins tests.

Author: Dongjoon Hyun 

Closes #14251 from dongjoon-hyun/SPARK-16602.

(cherry picked from commit 162d04a30e38bb83d35865679145f8ea80b84c26)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f18f9ca5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f18f9ca5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f18f9ca5

Branch: refs/heads/branch-2.0
Commit: f18f9ca5b22ca11712694b1106463ae6efc1d646
Parents: 6ca1d94
Author: Dongjoon Hyun 
Authored: Tue Jul 19 10:28:17 2016 -0700
Committer: Reynold Xin 
Committed: Tue Jul 19 10:28:24 2016 -0700

--
 .../spark/sql/catalyst/analysis/TypeCoercion.scala   |  2 +-
 .../sql/catalyst/expressions/nullExpressions.scala   |  2 +-
 .../catalyst/expressions/NullFunctionsSuite.scala| 15 +++
 3 files changed, 17 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f18f9ca5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
index baec6d1..9a040f8 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
@@ -100,7 +100,7 @@ object TypeCoercion {
   }
 
   /** Similar to [[findTightestCommonType]], but can promote all the way to StringType. */
-  private def findTightestCommonTypeToString(left: DataType, right: DataType): Option[DataType] = {
+  def findTightestCommonTypeToString(left: DataType, right: DataType): Option[DataType] = {
 findTightestCommonTypeOfTwo(left, right).orElse((left, right) match {
   case (StringType, t2: AtomicType) if t2 != BinaryType && t2 != BooleanType => Some(StringType)
   case (t1: AtomicType, StringType) if t1 != BinaryType && t1 != BooleanType => Some(StringType)

http://git-wip-us.apache.org/repos/asf/spark/blob/f18f9ca5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
index 523fb05..1c18265 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
@@ -134,7 +134,7 @@ case class Nvl(left: Expression, right: Expression) extends 
RuntimeReplaceable {
 
   override def replaceForTypeCoercion(): Expression = {
 if (left.dataType != right.dataType) {
-  TypeCoercion.findTightestCommonTypeOfTwo(left.dataType, right.dataType).map { dtype =>
+  TypeCoercion.findTightestCommonTypeToString(left.dataType, right.dataType).map { dtype =>
 copy(left = Cast(left, dtype), right = Cast(right, dtype))
   }.getOrElse(this)
 } else {

http://git-wip-us.apache.org/repos/asf/spark/blob/f18f9ca5/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullFunctionsSuite.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullFunctionsSuite.scala
 

spark git commit: [SPARK-16602][SQL] `Nvl` function should support numeric-string cases

2016-07-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 0bd76e872 -> 162d04a30


[SPARK-16602][SQL] `Nvl` function should support numeric-string cases

## What changes were proposed in this pull request?

`Nvl` function should support numeric-string cases like Hive/Spark 1.6. 
Currently, `Nvl` finds the tightest common types among numeric types. This PR 
extends that to consider `String` type, too.

```scala
- TypeCoercion.findTightestCommonTypeOfTwo(left.dataType, right.dataType).map { dtype =>
+ TypeCoercion.findTightestCommonTypeToString(left.dataType, right.dataType).map { dtype =>
```

**Before**
```scala
scala> sql("select nvl('0', 1)").collect()
org.apache.spark.sql.AnalysisException: cannot resolve `nvl("0", 1)` due to 
data type mismatch:
input to function coalesce should all be the same type, but it's [string, int]; 
line 1 pos 7
```

**After**
```scala
scala> sql("select nvl('0', 1)").collect()
res0: Array[org.apache.spark.sql.Row] = Array([0])
```

## How was this patch tested?

Pass the Jenkins tests.

Author: Dongjoon Hyun 

Closes #14251 from dongjoon-hyun/SPARK-16602.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/162d04a3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/162d04a3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/162d04a3

Branch: refs/heads/master
Commit: 162d04a30e38bb83d35865679145f8ea80b84c26
Parents: 0bd76e8
Author: Dongjoon Hyun 
Authored: Tue Jul 19 10:28:17 2016 -0700
Committer: Reynold Xin 
Committed: Tue Jul 19 10:28:17 2016 -0700

--
 .../spark/sql/catalyst/analysis/TypeCoercion.scala   |  2 +-
 .../sql/catalyst/expressions/nullExpressions.scala   |  2 +-
 .../catalyst/expressions/NullFunctionsSuite.scala| 15 +++
 3 files changed, 17 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/162d04a3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
index baec6d1..9a040f8 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
@@ -100,7 +100,7 @@ object TypeCoercion {
   }
 
   /** Similar to [[findTightestCommonType]], but can promote all the way to StringType. */
-  private def findTightestCommonTypeToString(left: DataType, right: DataType): Option[DataType] = {
+  def findTightestCommonTypeToString(left: DataType, right: DataType): Option[DataType] = {
 findTightestCommonTypeOfTwo(left, right).orElse((left, right) match {
   case (StringType, t2: AtomicType) if t2 != BinaryType && t2 != BooleanType => Some(StringType)
   case (t1: AtomicType, StringType) if t1 != BinaryType && t1 != BooleanType => Some(StringType)

http://git-wip-us.apache.org/repos/asf/spark/blob/162d04a3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
index 523fb05..1c18265 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
@@ -134,7 +134,7 @@ case class Nvl(left: Expression, right: Expression) extends 
RuntimeReplaceable {
 
   override def replaceForTypeCoercion(): Expression = {
 if (left.dataType != right.dataType) {
-  TypeCoercion.findTightestCommonTypeOfTwo(left.dataType, right.dataType).map { dtype =>
+  TypeCoercion.findTightestCommonTypeToString(left.dataType, right.dataType).map { dtype =>
 copy(left = Cast(left, dtype), right = Cast(right, dtype))
   }.getOrElse(this)
 } else {

http://git-wip-us.apache.org/repos/asf/spark/blob/162d04a3/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullFunctionsSuite.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullFunctionsSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullFunctionsSuite.scala
index ace6c15..712fe35 100644
--- 

spark git commit: [SPARK-16620][CORE] Add back the tokenization process in `RDD.pipe(command: String)`

2016-07-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 2c74b6d73 -> 6ca1d941b


[SPARK-16620][CORE] Add back the tokenization process in `RDD.pipe(command: 
String)`

## What changes were proposed in this pull request?

Currently `RDD.pipe(command: String)`:
- works only when the command is specified without any options, such as 
`RDD.pipe("wc")`
- does NOT work when the command is specified with some options, such as 
`RDD.pipe("wc -l")`

This is a regression from Spark 1.6.

This patch adds back the tokenization process in `RDD.pipe(command: String)` to 
fix this regression.
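
A hedged sketch of the restored behavior, assuming `wc` is available on the
executors' PATH (it mirrors the new test in the diff below):

```scala
val nums = sc.parallelize(Seq("1", "2", "3", "4"), 2)
// The single-string form is tokenized on whitespace again, so options work:
nums.pipe("wc -l").collect()           // one line count per partition (trim before comparing)
// The Seq-based overload was never affected by the regression:
nums.pipe(Seq("wc", "-l")).collect()
```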

## How was this patch tested?
Added a test which:
- would pass in `1.6`
- _[prior to this patch]_ would fail in `master`
- _[after this patch]_ would pass in `master`

Author: Liwei Lin 

Closes #14256 from lw-lin/rdd-pipe.

(cherry picked from commit 0bd76e872b60cb80295fc12654e370cf22390056)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6ca1d941
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6ca1d941
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6ca1d941

Branch: refs/heads/branch-2.0
Commit: 6ca1d941b0b417f10533ab3506a9f3cf60e6a7fe
Parents: 2c74b6d
Author: Liwei Lin 
Authored: Tue Jul 19 10:24:48 2016 -0700
Committer: Reynold Xin 
Committed: Tue Jul 19 10:25:24 2016 -0700

--
 core/src/main/scala/org/apache/spark/rdd/RDD.scala  |  8 ++--
 .../scala/org/apache/spark/rdd/PipedRDDSuite.scala  | 16 
 2 files changed, 22 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6ca1d941/core/src/main/scala/org/apache/spark/rdd/RDD.scala
--
diff --git a/core/src/main/scala/org/apache/spark/rdd/RDD.scala 
b/core/src/main/scala/org/apache/spark/rdd/RDD.scala
index b7a5b22..0804cde 100644
--- a/core/src/main/scala/org/apache/spark/rdd/RDD.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/RDD.scala
@@ -699,14 +699,18 @@ abstract class RDD[T: ClassTag](
* Return an RDD created by piping elements to a forked external process.
*/
   def pipe(command: String): RDD[String] = withScope {
-pipe(command)
+// Similar to Runtime.exec(), if we are given a single string, split it into words
+// using a standard StringTokenizer (i.e. by spaces)
+pipe(PipedRDD.tokenize(command))
   }
 
   /**
* Return an RDD created by piping elements to a forked external process.
*/
   def pipe(command: String, env: Map[String, String]): RDD[String] = withScope {
-pipe(command, env)
+// Similar to Runtime.exec(), if we are given a single string, split it into words
+// using a standard StringTokenizer (i.e. by spaces)
+pipe(PipedRDD.tokenize(command), env)
   }
 
   /**

http://git-wip-us.apache.org/repos/asf/spark/blob/6ca1d941/core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala 
b/core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala
index 27cfdc7..5d56fc1 100644
--- a/core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala
+++ b/core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala
@@ -51,6 +51,22 @@ class PipedRDDSuite extends SparkFunSuite with 
SharedSparkContext {
 }
   }
 
+  test("basic pipe with tokenization") {
+if (testCommandAvailable("wc")) {
+  val nums = sc.makeRDD(Array(1, 2, 3, 4), 2)
+
+  // verify that both RDD.pipe(command: String) and RDD.pipe(command: String, env) work good
+  for (piped <- Seq(nums.pipe("wc -l"), nums.pipe("wc -l", Map[String, String]()))) {
+val c = piped.collect()
+assert(c.size === 2)
+assert(c(0).trim === "2")
+assert(c(1).trim === "2")
+  }
+} else {
+  assert(true)
+}
+  }
+
   test("failure in iterating over pipe input") {
 if (testCommandAvailable("cat")) {
   val nums =


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-16494][ML] Upgrade breeze version to 0.12

2016-07-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 5d92326be -> 670891496


[SPARK-16494][ML] Upgrade breeze version to 0.12

## What changes were proposed in this pull request?
breeze 0.12 has been released for more than half a year, and it brings lots of
new features, performance improvements and bug fixes.
One of the biggest features is ```LBFGS-B```, an implementation of ```LBFGS``` with
box constraints that is much faster for some special cases.
We would like to implement Huber loss function for ```LinearRegression``` 
([SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181)) and it 
requires ```LBFGS-B``` as the optimization solver. So we should bump up the 
dependent breeze version to 0.12.
For more features, improvements and bug fixes in breeze 0.12, you can refer to the
following link:
https://groups.google.com/forum/#!topic/scala-breeze/nEeRi_DcY5c

## How was this patch tested?
No new tests, should pass the existing ones.

Author: Yanbo Liang 

Closes #14150 from yanboliang/spark-16494.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/67089149
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/67089149
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/67089149

Branch: refs/heads/master
Commit: 670891496a82538a5e2bf981a4044fb6f4cbb062
Parents: 5d92326
Author: Yanbo Liang 
Authored: Tue Jul 19 12:31:04 2016 +0100
Committer: Sean Owen 
Committed: Tue Jul 19 12:31:04 2016 +0100

--
 dev/deps/spark-deps-hadoop-2.2  | 5 +++--
 dev/deps/spark-deps-hadoop-2.3  | 5 +++--
 dev/deps/spark-deps-hadoop-2.4  | 5 +++--
 dev/deps/spark-deps-hadoop-2.6  | 5 +++--
 dev/deps/spark-deps-hadoop-2.7  | 5 +++--
 .../apache/spark/ml/classification/LogisticRegression.scala | 5 -
 .../apache/spark/ml/regression/AFTSurvivalRegression.scala  | 6 --
 .../org/apache/spark/ml/regression/LinearRegression.scala   | 5 -
 .../scala/org/apache/spark/mllib/clustering/LDAModel.scala  | 8 +++-
 .../org/apache/spark/mllib/clustering/LDAOptimizer.scala| 5 +++--
 .../scala/org/apache/spark/mllib/optimization/LBFGS.scala   | 5 -
 .../test/java/org/apache/spark/ml/feature/JavaPCASuite.java | 6 +-
 .../scala/org/apache/spark/mllib/clustering/LDASuite.scala  | 4 ++--
 .../scala/org/apache/spark/mllib/feature/PCASuite.scala | 9 ++---
 pom.xml | 2 +-
 python/pyspark/ml/classification.py | 2 +-
 16 files changed, 40 insertions(+), 42 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/67089149/dev/deps/spark-deps-hadoop-2.2
--
diff --git a/dev/deps/spark-deps-hadoop-2.2 b/dev/deps/spark-deps-hadoop-2.2
index feb3474..5d536b7 100644
--- a/dev/deps/spark-deps-hadoop-2.2
+++ b/dev/deps/spark-deps-hadoop-2.2
@@ -12,8 +12,8 @@ avro-1.7.7.jar
 avro-ipc-1.7.7.jar
 avro-mapred-1.7.7-hadoop2.jar
 bonecp-0.8.0.RELEASE.jar
-breeze-macros_2.11-0.11.2.jar
-breeze_2.11-0.11.2.jar
+breeze-macros_2.11-0.12.jar
+breeze_2.11-0.12.jar
 calcite-avatica-1.2.0-incubating.jar
 calcite-core-1.2.0-incubating.jar
 calcite-linq4j-1.2.0-incubating.jar
@@ -147,6 +147,7 @@ scala-parser-combinators_2.11-1.0.4.jar
 scala-reflect-2.11.8.jar
 scala-xml_2.11-1.0.2.jar
 scalap-2.11.8.jar
+shapeless_2.11-2.0.0.jar
 slf4j-api-1.7.16.jar
 slf4j-log4j12-1.7.16.jar
 snappy-0.2.jar

http://git-wip-us.apache.org/repos/asf/spark/blob/67089149/dev/deps/spark-deps-hadoop-2.3
--
diff --git a/dev/deps/spark-deps-hadoop-2.3 b/dev/deps/spark-deps-hadoop-2.3
index 3e96035..d16f42a 100644
--- a/dev/deps/spark-deps-hadoop-2.3
+++ b/dev/deps/spark-deps-hadoop-2.3
@@ -15,8 +15,8 @@ avro-mapred-1.7.7-hadoop2.jar
 base64-2.3.8.jar
 bcprov-jdk15on-1.51.jar
 bonecp-0.8.0.RELEASE.jar
-breeze-macros_2.11-0.11.2.jar
-breeze_2.11-0.11.2.jar
+breeze-macros_2.11-0.12.jar
+breeze_2.11-0.12.jar
 calcite-avatica-1.2.0-incubating.jar
 calcite-core-1.2.0-incubating.jar
 calcite-linq4j-1.2.0-incubating.jar
@@ -154,6 +154,7 @@ scala-parser-combinators_2.11-1.0.4.jar
 scala-reflect-2.11.8.jar
 scala-xml_2.11-1.0.2.jar
 scalap-2.11.8.jar
+shapeless_2.11-2.0.0.jar
 slf4j-api-1.7.16.jar
 slf4j-log4j12-1.7.16.jar
 snappy-0.2.jar

http://git-wip-us.apache.org/repos/asf/spark/blob/67089149/dev/deps/spark-deps-hadoop-2.4
--
diff --git a/dev/deps/spark-deps-hadoop-2.4 b/dev/deps/spark-deps-hadoop-2.4
index 3fc14a6..2e261cb 100644
--- a/dev/deps/spark-deps-hadoop-2.4
+++ 

spark git commit: [SPARK-16478] graphX (added graph caching in strongly connected components)

2016-07-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 6c4b9f4be -> 5d92326be


[SPARK-16478] graphX (added graph caching in strongly connected components)

## What changes were proposed in this pull request?

I added caching in every iteration for the sccGraph that is returned by strongly 
connected components. Without this cache, strongly connected components returned a 
graph that had to be recomputed from scratch once some intermediary caches no longer 
existed.
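
For reference, a self-contained sketch of the cache / materialize / unpersist pattern this change applies to `sccGraph` inside the loop (the input path and the iteration body are placeholders, not the patch itself; see the diff below):

```scala
import org.apache.spark.graphx.{Graph, GraphLoader}
import org.apache.spark.sql.SparkSession

object CachePerIterationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CachePerIterationSketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical edge-list file; any graph works for demonstrating the pattern.
    var graph: Graph[Int, Int] = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt").cache()
    var prevGraph = graph

    for (_ <- 1 to 3) {
      // Derive this iteration's graph and cache it.
      graph = graph.mapVertices { case (_, attr) => attr + 1 }.cache()
      // Materialize vertices and edges so the new lineage is actually computed...
      graph.vertices.count()
      graph.edges.count()
      // ...and only then drop the previous iteration's cached copy.
      prevGraph.unpersist(blocking = false)
      prevGraph = graph
    }

    println(s"final vertex count: ${graph.vertices.count()}")
    spark.stop()
  }
}
```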

## How was this patch tested?
I tested it by running code similar to the one [on 
databrics](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4889410027417133/3634650767364730/3117184429335832/latest.html).
 Basically, I generated a large graph and computed strongly connected components 
with the changed code, then simply ran count on the vertices and edges. After 
this update the count takes a few seconds instead of 20 minutes.

# statement
contribution is my original work and I license the work to the project under 
the project's open source license.

Author: Michał Wesołowski 

Closes #14137 from wesolowskim/SPARK-16478.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5d92326b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5d92326b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5d92326b

Branch: refs/heads/master
Commit: 5d92326be76cb15edc6e18e94a373e197f696803
Parents: 6c4b9f4
Author: Michał Wesołowski 
Authored: Tue Jul 19 12:18:42 2016 +0100
Committer: Sean Owen 
Committed: Tue Jul 19 12:18:42 2016 +0100

--
 .../lib/StronglyConnectedComponents.scala   | 86 
 1 file changed, 50 insertions(+), 36 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5d92326b/graphx/src/main/scala/org/apache/spark/graphx/lib/StronglyConnectedComponents.scala
--
diff --git 
a/graphx/src/main/scala/org/apache/spark/graphx/lib/StronglyConnectedComponents.scala
 
b/graphx/src/main/scala/org/apache/spark/graphx/lib/StronglyConnectedComponents.scala
old mode 100644
new mode 100755
index 1fa92b0..e4f80ff
--- 
a/graphx/src/main/scala/org/apache/spark/graphx/lib/StronglyConnectedComponents.scala
+++ 
b/graphx/src/main/scala/org/apache/spark/graphx/lib/StronglyConnectedComponents.scala
@@ -44,6 +44,9 @@ object StronglyConnectedComponents {
 // graph we are going to work with in our iterations
 var sccWorkGraph = graph.mapVertices { case (vid, _) => (vid, false) 
}.cache()
 
+// helper variables to unpersist cached graphs
+var prevSccGraph = sccGraph
+
 var numVertices = sccWorkGraph.numVertices
 var iter = 0
 while (sccWorkGraph.numVertices > 0 && iter < numIter) {
@@ -64,48 +67,59 @@ object StronglyConnectedComponents {
 // write values to sccGraph
 sccGraph = sccGraph.outerJoinVertices(finalVertices) {
   (vid, scc, opt) => opt.getOrElse(scc)
-}
+}.cache()
+// materialize vertices and edges
+sccGraph.vertices.count()
+sccGraph.edges.count()
+// sccGraph materialized so, unpersist can be done on previous
+prevSccGraph.unpersist(blocking = false)
+prevSccGraph = sccGraph
+
 // only keep vertices that are not final
 sccWorkGraph = sccWorkGraph.subgraph(vpred = (vid, data) => 
!data._2).cache()
   } while (sccWorkGraph.numVertices < numVertices)
 
-  sccWorkGraph = sccWorkGraph.mapVertices{ case (vid, (color, isFinal)) => 
(vid, isFinal) }
+  // if iter < numIter at this point sccGraph that is returned
+  // will not be recomputed and pregel executions are pointless
+  if (iter < numIter) {
+sccWorkGraph = sccWorkGraph.mapVertices { case (vid, (color, isFinal)) 
=> (vid, isFinal) }
 
-  // collect min of all my neighbor's scc values, update if it's smaller 
than mine
-  // then notify any neighbors with scc values larger than mine
-  sccWorkGraph = Pregel[(VertexId, Boolean), ED, VertexId](
-sccWorkGraph, Long.MaxValue, activeDirection = EdgeDirection.Out)(
-(vid, myScc, neighborScc) => (math.min(myScc._1, neighborScc), 
myScc._2),
-e => {
-  if (e.srcAttr._1 < e.dstAttr._1) {
-Iterator((e.dstId, e.srcAttr._1))
-  } else {
-Iterator()
-  }
-},
-(vid1, vid2) => math.min(vid1, vid2))
+// collect min of all my neighbor's scc values, update if it's smaller 
than mine
+// then notify any neighbors with scc values larger than mine
+sccWorkGraph = Pregel[(VertexId, Boolean), ED, VertexId](
+  sccWorkGraph, Long.MaxValue, activeDirection = 

spark git commit: [SPARK-16395][STREAMING] Fail if too many CheckpointWriteHandlers are queued up in the fixed thread pool

2016-07-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 8310c0741 -> 6c4b9f4be


[SPARK-16395][STREAMING] Fail if too many CheckpointWriteHandlers are queued up 
in the fixed thread pool

## What changes were proposed in this pull request?

Begin failing if checkpoint writes are unlikely to keep up with the storage's ability 
to write them, so that we fail fast instead of slowly filling memory.
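
The essence of the change is a single-threaded executor backed by a bounded queue, whose default `AbortPolicy` throws `RejectedExecutionException` once the queue fills up. A minimal standalone sketch of that pattern (the queue size and simulated workload here are illustrative assumptions, not the values used by `CheckpointWriter`):

```scala
import java.util.concurrent.{ArrayBlockingQueue, RejectedExecutionException, ThreadPoolExecutor, TimeUnit}

object BoundedExecutorSketch {
  def main(args: Array[String]): Unit = {
    val executor = new ThreadPoolExecutor(
      1, 1,                                // exactly one worker thread
      0L, TimeUnit.MILLISECONDS,           // no keep-alive needed for a fixed-size pool
      new ArrayBlockingQueue[Runnable](2)) // small bounded queue, so rejection is easy to trigger

    try {
      for (i <- 1 to 10) {
        executor.execute(new Runnable {
          override def run(): Unit = Thread.sleep(1000) // simulate a slow checkpoint write
        })
        println(s"queued task $i")
      }
    } catch {
      // Thrown by the default AbortPolicy once one task is running and two are queued.
      case e: RejectedExecutionException => println(s"rejected: $e")
    } finally {
      executor.shutdownNow()
    }
  }
}
```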

## How was this patch tested?

Jenkins tests

Author: Sean Owen 

Closes #14152 from srowen/SPARK-16395.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6c4b9f4b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6c4b9f4b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6c4b9f4b

Branch: refs/heads/master
Commit: 6c4b9f4be6b429197c6a53f937a82c2ac5866d65
Parents: 8310c07
Author: Sean Owen 
Authored: Tue Jul 19 12:10:24 2016 +0100
Committer: Sean Owen 
Committed: Tue Jul 19 12:10:24 2016 +0100

--
 .../scala/org/apache/spark/streaming/Checkpoint.scala  | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6c4b9f4b/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
--
diff --git 
a/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala 
b/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
index 0b11026..398fa65 100644
--- a/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
+++ b/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
@@ -18,8 +18,8 @@
 package org.apache.spark.streaming
 
 import java.io._
-import java.util.concurrent.Executors
-import java.util.concurrent.RejectedExecutionException
+import java.util.concurrent.{ArrayBlockingQueue, RejectedExecutionException,
+  ThreadPoolExecutor, TimeUnit}
 
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, Path}
@@ -184,7 +184,14 @@ class CheckpointWriter(
 hadoopConf: Configuration
   ) extends Logging {
   val MAX_ATTEMPTS = 3
-  val executor = Executors.newFixedThreadPool(1)
+
+  // Single-thread executor which rejects executions when a large amount have 
queued up.
+  // This fails fast since this typically means the checkpoint store will 
never keep up, and
+  // will otherwise lead to filling memory with waiting payloads of byte[] to 
write.
+  val executor = new ThreadPoolExecutor(
+1, 1,
+0L, TimeUnit.MILLISECONDS,
+new ArrayBlockingQueue[Runnable](1000))
   val compressionCodec = CompressionCodec.createCodec(conf)
   private var stopped = false
   @volatile private[this] var fs: FileSystem = null





spark git commit: [SPARK-16600][MLLIB] fix some latex formula syntax error

2016-07-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 929fa287e -> 2c74b6d73


[SPARK-16600][MLLIB] fix some latex formula syntax error

## What changes were proposed in this pull request?

`\partial\x` ==> `\partial x`
`har{x_i}` ==> `hat{x_i}`
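
For reference, the corrected first derivative in the scaled space now renders as (taken from the Scaladoc in the diff below):

```latex
\frac{\partial L}{\partial w_i} = \frac{\mathrm{diff}}{N}\,\frac{x_i - \bar{x_i}}{\hat{x_i}}
```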

## How was this patch tested?

N/A

Author: WeichenXu 

Closes #14246 from WeichenXu123/fix_formular_err.

(cherry picked from commit 8310c0741c0ca805ec74c1a78ba4a0f18e82d459)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2c74b6d7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2c74b6d7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2c74b6d7

Branch: refs/heads/branch-2.0
Commit: 2c74b6d73beab4510fa7933dde9c0a5c218cce92
Parents: 929fa28
Author: WeichenXu 
Authored: Tue Jul 19 12:07:40 2016 +0100
Committer: Sean Owen 
Committed: Tue Jul 19 12:07:49 2016 +0100

--
 .../org/apache/spark/ml/regression/LinearRegression.scala| 8 
 1 file changed, 4 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2c74b6d7/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala 
b/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
index 401f2c6..0a155e1 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
@@ -794,16 +794,16 @@ class LinearRegressionSummary private[regression] (
  *
  * Now, the first derivative of the objective function in scaled space is
  * {{{
- * \frac{\partial L}{\partial\w_i} = diff/N (x_i - \bar{x_i}) / \hat{x_i}
+ * \frac{\partial L}{\partial w_i} = diff/N (x_i - \bar{x_i}) / \hat{x_i}
  * }}}
  * However, ($x_i - \bar{x_i}$) will densify the computation, so it's not
  * an ideal formula when the training dataset is sparse format.
  *
- * This can be addressed by adding the dense \bar{x_i} / \har{x_i} terms
+ * This can be addressed by adding the dense \bar{x_i} / \hat{x_i} terms
  * in the end by keeping the sum of diff. The first derivative of total
  * objective function from all the samples is
  * {{{
- * \frac{\partial L}{\partial\w_i} =
+ * \frac{\partial L}{\partial w_i} =
  * 1/N \sum_j diff_j (x_{ij} - \bar{x_i}) / \hat{x_i}
  *   = 1/N ((\sum_j diff_j x_{ij} / \hat{x_i}) - diffSum \bar{x_i}) / 
\hat{x_i})
  *   = 1/N ((\sum_j diff_j x_{ij} / \hat{x_i}) + correction_i)
@@ -822,7 +822,7 @@ class LinearRegressionSummary private[regression] (
  * the training dataset, which can be easily computed in distributed fashion, 
and is
  * sparse format friendly.
  * {{{
- * \frac{\partial L}{\partial\w_i} = 1/N ((\sum_j diff_j x_{ij} / \hat{x_i})
+ * \frac{\partial L}{\partial w_i} = 1/N ((\sum_j diff_j x_{ij} / \hat{x_i})
  * }}},
  *
  * @param coefficients The coefficients corresponding to the features.





spark git commit: [SPARK-16600][MLLIB] fix some latex formula syntax error

2016-07-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 6caa22050 -> 8310c0741


[SPARK-16600][MLLIB] fix some latex formula syntax error

## What changes were proposed in this pull request?

`\partial\x` ==> `\partial x`
`har{x_i}` ==> `hat{x_i}`

## How was this patch tested?

N/A

Author: WeichenXu 

Closes #14246 from WeichenXu123/fix_formular_err.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8310c074
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8310c074
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8310c074

Branch: refs/heads/master
Commit: 8310c0741c0ca805ec74c1a78ba4a0f18e82d459
Parents: 6caa220
Author: WeichenXu 
Authored: Tue Jul 19 12:07:40 2016 +0100
Committer: Sean Owen 
Committed: Tue Jul 19 12:07:40 2016 +0100

--
 .../org/apache/spark/ml/regression/LinearRegression.scala| 8 
 1 file changed, 4 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8310c074/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala 
b/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
index 401f2c6..0a155e1 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
@@ -794,16 +794,16 @@ class LinearRegressionSummary private[regression] (
  *
  * Now, the first derivative of the objective function in scaled space is
  * {{{
- * \frac{\partial L}{\partial\w_i} = diff/N (x_i - \bar{x_i}) / \hat{x_i}
+ * \frac{\partial L}{\partial w_i} = diff/N (x_i - \bar{x_i}) / \hat{x_i}
  * }}}
  * However, ($x_i - \bar{x_i}$) will densify the computation, so it's not
  * an ideal formula when the training dataset is sparse format.
  *
- * This can be addressed by adding the dense \bar{x_i} / \har{x_i} terms
+ * This can be addressed by adding the dense \bar{x_i} / \hat{x_i} terms
  * in the end by keeping the sum of diff. The first derivative of total
  * objective function from all the samples is
  * {{{
- * \frac{\partial L}{\partial\w_i} =
+ * \frac{\partial L}{\partial w_i} =
  * 1/N \sum_j diff_j (x_{ij} - \bar{x_i}) / \hat{x_i}
  *   = 1/N ((\sum_j diff_j x_{ij} / \hat{x_i}) - diffSum \bar{x_i}) / 
\hat{x_i})
  *   = 1/N ((\sum_j diff_j x_{ij} / \hat{x_i}) + correction_i)
@@ -822,7 +822,7 @@ class LinearRegressionSummary private[regression] (
  * the training dataset, which can be easily computed in distributed fashion, 
and is
  * sparse format friendly.
  * {{{
- * \frac{\partial L}{\partial\w_i} = 1/N ((\sum_j diff_j x_{ij} / \hat{x_i})
+ * \frac{\partial L}{\partial w_i} = 1/N ((\sum_j diff_j x_{ij} / \hat{x_i})
  * }}},
  *
  * @param coefficients The coefficients corresponding to the features.





spark git commit: [MINOR][SQL][STREAMING][DOCS] Fix minor typos, punctuations and grammar

2016-07-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 eb1c20fa0 -> 929fa287e


[MINOR][SQL][STREAMING][DOCS] Fix minor typos, punctuations and grammar

## What changes were proposed in this pull request?

Minor fixes correcting some typos, punctuation, and grammar.
Adding more anchors for easy navigation.
Fixing minor issues with code snippets.

## How was this patch tested?

`jekyll serve`

Author: Ahmed Mahran 

Closes #14234 from ahmed-mahran/b-struct-streaming-docs.

(cherry picked from commit 6caa22050e221cf14e2db0544fd2766dd1102bda)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/929fa287
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/929fa287
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/929fa287

Branch: refs/heads/branch-2.0
Commit: 929fa287e700c0e112f43e0c7b9bc746b5546c64
Parents: eb1c20f
Author: Ahmed Mahran 
Authored: Tue Jul 19 12:01:54 2016 +0100
Committer: Sean Owen 
Committed: Tue Jul 19 12:06:26 2016 +0100

--
 docs/structured-streaming-programming-guide.md | 154 +---
 1 file changed, 71 insertions(+), 83 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/929fa287/docs/structured-streaming-programming-guide.md
--
diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index 3ef39e4..aac8817 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -22,14 +22,49 @@ Let’s say you want to maintain a running word count of 
text data received from
 
 
 
+{% highlight scala %}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.SparkSession
+
+val spark = SparkSession
+  .builder
+  .appName("StructuredNetworkWordCount")
+  .getOrCreate()
+  
+import spark.implicits._
+{% endhighlight %}
 
 
 
 
+{% highlight java %}
+import org.apache.spark.api.java.function.FlatMapFunction;
+import org.apache.spark.sql.*;
+import org.apache.spark.sql.streaming.StreamingQuery;
+
+import java.util.Arrays;
+import java.util.Iterator;
+
+SparkSession spark = SparkSession
+.builder()
+.appName("JavaStructuredNetworkWordCount")
+.getOrCreate();
+{% endhighlight %}
 
 
 
 
+{% highlight python %}
+from pyspark.sql import SparkSession
+from pyspark.sql.functions import explode
+from pyspark.sql.functions import split
+
+spark = SparkSession\
+.builder()\
+.appName("StructuredNetworkWordCount")\
+.getOrCreate()
+{% endhighlight %}
+
 
 
 
@@ -39,18 +74,6 @@ Next, let’s create a streaming DataFrame that represents 
text data received fr
 
 
 {% highlight scala %}
-import org.apache.spark.sql.functions._
-import org.apache.spark.sql.SparkSession
-
-val spark = SparkSession
-  .builder
-  .appName("StructuredNetworkWordCount")
-  .getOrCreate()
-{% endhighlight %}
-
-Next, let’s create a streaming DataFrame that represents text data received 
from a server listening on localhost:, and transform the DataFrame to 
calculate word counts.
-
-{% highlight scala %}
 // Create DataFrame representing the stream of input lines from connection to 
localhost:
 val lines = spark.readStream
   .format("socket")
@@ -65,30 +88,12 @@ val words = lines.as[String].flatMap(_.split(" "))
 val wordCounts = words.groupBy("value").count()
 {% endhighlight %}
 
-This `lines` DataFrame represents an unbounded table containing the streaming 
text data. This table contains one column of strings named “value”, and 
each line in the streaming text data becomes a row in the table. Note, that 
this is not currently receiving any data as we are just setting up the 
transformation, and have not yet started it. Next, we have converted the 
DataFrame to a  Dataset of String using `.as(Encoders.STRING())`, so that we 
can apply the `flatMap` operation to split each line into multiple words. The 
resultant `words` Dataset contains all the words. Finally, we have defined the 
`wordCounts` DataFrame by grouping by the unique values in the Dataset and 
counting them. Note that this is a streaming DataFrame which represents the 
running word counts of the stream.
+This `lines` DataFrame represents an unbounded table containing the streaming 
text data. This table contains one column of strings named “value”, and 
each line in the streaming text data becomes a row in the table. Note, that 
this is not currently receiving any data as we are just setting up the 
transformation, and have not yet started it. Next, we have converted the 
DataFrame to a  Dataset of String using `.as[String]`, so that we can apply the 
`flatMap` operation to split each line 

spark git commit: [MINOR][SQL][STREAMING][DOCS] Fix minor typos, punctuations and grammar

2016-07-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 21a6dd2ae -> 6caa22050


[MINOR][SQL][STREAMING][DOCS] Fix minor typos, punctuations and grammar

## What changes were proposed in this pull request?

Minor fixes correcting some typos, punctuation, and grammar.
Adding more anchors for easy navigation.
Fixing minor issues with code snippets.

## How was this patch tested?

`jekyll serve`

Author: Ahmed Mahran 

Closes #14234 from ahmed-mahran/b-struct-streaming-docs.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6caa2205
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6caa2205
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6caa2205

Branch: refs/heads/master
Commit: 6caa22050e221cf14e2db0544fd2766dd1102bda
Parents: 21a6dd2
Author: Ahmed Mahran 
Authored: Tue Jul 19 12:01:54 2016 +0100
Committer: Sean Owen 
Committed: Tue Jul 19 12:01:54 2016 +0100

--
 docs/structured-streaming-programming-guide.md | 154 +---
 1 file changed, 71 insertions(+), 83 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6caa2205/docs/structured-streaming-programming-guide.md
--
diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index 3ef39e4..aac8817 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -22,14 +22,49 @@ Let’s say you want to maintain a running word count of 
text data received from
 
 
 
+{% highlight scala %}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.SparkSession
+
+val spark = SparkSession
+  .builder
+  .appName("StructuredNetworkWordCount")
+  .getOrCreate()
+  
+import spark.implicits._
+{% endhighlight %}
 
 
 
 
+{% highlight java %}
+import org.apache.spark.api.java.function.FlatMapFunction;
+import org.apache.spark.sql.*;
+import org.apache.spark.sql.streaming.StreamingQuery;
+
+import java.util.Arrays;
+import java.util.Iterator;
+
+SparkSession spark = SparkSession
+.builder()
+.appName("JavaStructuredNetworkWordCount")
+.getOrCreate();
+{% endhighlight %}
 
 
 
 
+{% highlight python %}
+from pyspark.sql import SparkSession
+from pyspark.sql.functions import explode
+from pyspark.sql.functions import split
+
+spark = SparkSession\
+.builder()\
+.appName("StructuredNetworkWordCount")\
+.getOrCreate()
+{% endhighlight %}
+
 
 
 
@@ -39,18 +74,6 @@ Next, let’s create a streaming DataFrame that represents 
text data received fr
 
 
 {% highlight scala %}
-import org.apache.spark.sql.functions._
-import org.apache.spark.sql.SparkSession
-
-val spark = SparkSession
-  .builder
-  .appName("StructuredNetworkWordCount")
-  .getOrCreate()
-{% endhighlight %}
-
-Next, let’s create a streaming DataFrame that represents text data received 
from a server listening on localhost:, and transform the DataFrame to 
calculate word counts.
-
-{% highlight scala %}
 // Create DataFrame representing the stream of input lines from connection to 
localhost:
 val lines = spark.readStream
   .format("socket")
@@ -65,30 +88,12 @@ val words = lines.as[String].flatMap(_.split(" "))
 val wordCounts = words.groupBy("value").count()
 {% endhighlight %}
 
-This `lines` DataFrame represents an unbounded table containing the streaming 
text data. This table contains one column of strings named “value”, and 
each line in the streaming text data becomes a row in the table. Note, that 
this is not currently receiving any data as we are just setting up the 
transformation, and have not yet started it. Next, we have converted the 
DataFrame to a  Dataset of String using `.as(Encoders.STRING())`, so that we 
can apply the `flatMap` operation to split each line into multiple words. The 
resultant `words` Dataset contains all the words. Finally, we have defined the 
`wordCounts` DataFrame by grouping by the unique values in the Dataset and 
counting them. Note that this is a streaming DataFrame which represents the 
running word counts of the stream.
+This `lines` DataFrame represents an unbounded table containing the streaming 
text data. This table contains one column of strings named “value”, and 
each line in the streaming text data becomes a row in the table. Note, that 
this is not currently receiving any data as we are just setting up the 
transformation, and have not yet started it. Next, we have converted the 
DataFrame to a  Dataset of String using `.as[String]`, so that we can apply the 
`flatMap` operation to split each line into multiple words. The resultant 
`words` Dataset contains all the words. Finally, we have defined the 
`wordCounts` 

spark git commit: [SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant definition and inherited from the parent

2016-07-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 556a9437a -> 21a6dd2ae


[SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant definition 
and inherited from the parent

https://issues.apache.org/jira/browse/SPARK-16535

## What changes were proposed in this pull request?

When I scanned through the pom.xml files of the sub projects, I found the warning 
below and attached a screenshot:
```
Definition of groupId is redundant, because it's inherited from the parent
```
![screen shot 2016-07-13 at 3 13 11 
pm](https://cloud.githubusercontent.com/assets/3925641/16823121/744f893e-4916-11e6-8a52-042f83b9db4e.png)

I've tried to remove some of the lines with the redundant `groupId` definition, such as
```
<groupId>org.apache.spark</groupId>
```
and the build on my local machine is still ok.
As I just found, Maven `3.3.9` is the version being used in 
Spark 2.x, and Maven 3 supports versionless parent elements: Maven 3 removes the need 
to specify the parent version in sub modules, which is great (as of Maven 3.1).

ref: http://stackoverflow.com/questions/3157240/maven-3-worth-it/3166762#3166762

## How was this patch tested?

I've tested by rebuilding the project, and the build succeeded.

Author: Xin Ren 

Closes #14189 from keypointt/SPARK-16535.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/21a6dd2a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/21a6dd2a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/21a6dd2a

Branch: refs/heads/master
Commit: 21a6dd2aef65a23d92f93c43fa731c0505250363
Parents: 556a943
Author: Xin Ren 
Authored: Tue Jul 19 11:59:46 2016 +0100
Committer: Sean Owen 
Committed: Tue Jul 19 11:59:46 2016 +0100

--
 assembly/pom.xml  | 1 -
 common/network-common/pom.xml | 1 -
 common/network-shuffle/pom.xml| 1 -
 common/network-yarn/pom.xml   | 1 -
 common/sketch/pom.xml | 1 -
 common/tags/pom.xml   | 1 -
 common/unsafe/pom.xml | 1 -
 core/pom.xml  | 1 -
 examples/pom.xml  | 1 -
 external/flume-assembly/pom.xml   | 1 -
 external/flume-sink/pom.xml   | 1 -
 external/flume/pom.xml| 1 -
 external/java8-tests/pom.xml  | 1 -
 external/kafka-0-10-assembly/pom.xml  | 1 -
 external/kafka-0-10/pom.xml   | 1 -
 external/kafka-0-8-assembly/pom.xml   | 1 -
 external/kafka-0-8/pom.xml| 1 -
 external/kinesis-asl-assembly/pom.xml | 1 -
 external/kinesis-asl/pom.xml  | 1 -
 external/spark-ganglia-lgpl/pom.xml   | 1 -
 graphx/pom.xml| 1 -
 launcher/pom.xml  | 1 -
 mllib-local/pom.xml   | 1 -
 mllib/pom.xml | 1 -
 repl/pom.xml  | 1 -
 sql/catalyst/pom.xml  | 1 -
 sql/core/pom.xml  | 1 -
 sql/hive-thriftserver/pom.xml | 1 -
 sql/hive/pom.xml  | 1 -
 streaming/pom.xml | 1 -
 tools/pom.xml | 1 -
 yarn/pom.xml  | 1 -
 32 files changed, 32 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/21a6dd2a/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 1b25b7c..971a62f 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -25,7 +25,6 @@
 ../pom.xml
   
 
-  org.apache.spark
   spark-assembly_2.11
   Spark Project Assembly
   http://spark.apache.org/

http://git-wip-us.apache.org/repos/asf/spark/blob/21a6dd2a/common/network-common/pom.xml
--
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index b1a37e8..81f0c6e 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -26,7 +26,6 @@
 ../../pom.xml
   
 
-  org.apache.spark
   spark-network-common_2.11
   jar
   Spark Project Networking

http://git-wip-us.apache.org/repos/asf/spark/blob/21a6dd2a/common/network-shuffle/pom.xml
--
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 51c06b9..d211bd5 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -26,7 +26,6 @@
 ../../pom.xml
   
 
-  org.apache.spark
   spark-network-shuffle_2.11
   jar
   Spark Project Shuffle Streaming Service

http://git-wip-us.apache.org/repos/asf/spark/blob/21a6dd2a/common/network-yarn/pom.xml
--
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index d43eb71..606ad15 100644
--- 

spark git commit: [MINOR][BUILD] Fix Java Linter `LineLength` errors

2016-07-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 6ee40d2cc -> 556a9437a


[MINOR][BUILD] Fix Java Linter `LineLength` errors

## What changes were proposed in this pull request?

This PR fixes four Java linter `LineLength` errors. They are only `LineLength` 
errors, but we had better remove all Java linter errors before the release.

## How was this patch tested?

After passing Jenkins, verified with `./dev/lint-java`.

Author: Dongjoon Hyun 

Closes #14255 from dongjoon-hyun/minor_java_linter.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/556a9437
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/556a9437
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/556a9437

Branch: refs/heads/master
Commit: 556a9437ac7b55079f5a8a91e669dcc36ca02696
Parents: 6ee40d2
Author: Dongjoon Hyun 
Authored: Tue Jul 19 11:51:43 2016 +0100
Committer: Sean Owen 
Committed: Tue Jul 19 11:51:43 2016 +0100

--
 .../spark/network/shuffle/ExternalShuffleBlockHandler.java | 3 ++-
 .../java/org/apache/spark/network/yarn/YarnShuffleService.java | 2 +-
 .../apache/spark/examples/sql/JavaSQLDataSourceExample.java| 6 --
 3 files changed, 7 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/556a9437/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
--
diff --git 
a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
 
b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
index 1cc0fb6..1270cef 100644
--- 
a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
+++ 
b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
@@ -113,7 +113,8 @@ public class ExternalShuffleBlockHandler extends RpcHandler 
{
   }
 
 } else if (msgObj instanceof RegisterExecutor) {
-  final Timer.Context responseDelayContext = 
metrics.registerExecutorRequestLatencyMillis.time();
+  final Timer.Context responseDelayContext =
+metrics.registerExecutorRequestLatencyMillis.time();
   try {
 RegisterExecutor msg = (RegisterExecutor) msgObj;
 checkAuth(client, msg.appId);

http://git-wip-us.apache.org/repos/asf/spark/blob/556a9437/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
--
diff --git 
a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
 
b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
index df17dac..22e47ac 100644
--- 
a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
+++ 
b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
@@ -131,7 +131,7 @@ public class YarnShuffleService extends AuxiliaryService {
 
 try {
   // In case this NM was killed while there were running spark 
applications, we need to restore
-  // lost state for the existing executors.  We look for an existing file 
in the NM's local dirs.
+  // lost state for the existing executors. We look for an existing file 
in the NM's local dirs.
   // If we don't find one, then we choose a file to use to save the state 
next time.  Even if
   // an application was stopped while the NM was down, we expect yarn to 
call stopApplication()
   // when it comes back

http://git-wip-us.apache.org/repos/asf/spark/blob/556a9437/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
--
diff --git 
a/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
 
b/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
index 2b94b9f..ec02c8b 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
@@ -110,11 +110,13 @@ public class JavaSQLDataSourceExample {
 usersDF.select("name", 
"favorite_color").write().save("namesAndFavColors.parquet");
 // $example off:generic_load_save_functions$
 // $example on:manual_load_options$
-Dataset peopleDF = 
spark.read().format("json").load("examples/src/main/resources/people.json");
+Dataset peopleDF =
+  
spark.read().format("json").load("examples/src/main/resources/people.json");
 peopleDF.select("name", 

spark git commit: [DOC] improve python doc for rdd.histogram and dataframe.join

2016-07-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 1426a0805 -> 6ee40d2cc


[DOC] improve python doc for rdd.histogram and dataframe.join

## What changes were proposed in this pull request?

doc change only

## How was this patch tested?

doc change only

Author: Mortada Mehyar 

Closes #14253 from mortada/histogram_typos.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6ee40d2c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6ee40d2c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6ee40d2c

Branch: refs/heads/master
Commit: 6ee40d2cc5f467c78be662c1639fc3d5b7f796cf
Parents: 1426a08
Author: Mortada Mehyar 
Authored: Mon Jul 18 23:49:47 2016 -0700
Committer: Reynold Xin 
Committed: Mon Jul 18 23:49:47 2016 -0700

--
 python/pyspark/rdd.py   | 18 +-
 python/pyspark/sql/dataframe.py | 10 +-
 2 files changed, 14 insertions(+), 14 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6ee40d2c/python/pyspark/rdd.py
--
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 6afe769..0508235 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -1027,20 +1027,20 @@ class RDD(object):
 
 If your histogram is evenly spaced (e.g. [0, 10, 20, 30]),
 this can be switched from an O(log n) inseration to O(1) per
-element(where n = # buckets).
+element (where n is the number of buckets).
 
-Buckets must be sorted and not contain any duplicates, must be
+Buckets must be sorted, not contain any duplicates, and have
 at least two elements.
 
-If `buckets` is a number, it will generates buckets which are
+If `buckets` is a number, it will generate buckets which are
 evenly spaced between the minimum and maximum of the RDD. For
-example, if the min value is 0 and the max is 100, given buckets
-as 2, the resulting buckets will be [0,50) [50,100]. buckets must
-be at least 1 If the RDD contains infinity, NaN throws an exception
-If the elements in RDD do not vary (max == min) always returns
-a single bucket.
+example, if the min value is 0 and the max is 100, given `buckets`
+as 2, the resulting buckets will be [0,50) [50,100]. `buckets` must
+be at least 1. An exception is raised if the RDD contains infinity.
+If the elements in the RDD do not vary (max == min), a single bucket
+will be used.
 
-It will return a tuple of buckets and histogram.
+The return value is a tuple of buckets and histogram.
 
 >>> rdd = sc.parallelize(range(51))
 >>> rdd.histogram(2)

http://git-wip-us.apache.org/repos/asf/spark/blob/6ee40d2c/python/pyspark/sql/dataframe.py
--
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index adf549d..8ff9403 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -613,16 +613,16 @@ class DataFrame(object):
 def join(self, other, on=None, how=None):
 """Joins with another :class:`DataFrame`, using the given join 
expression.
 
-The following performs a full outer join between ``df1`` and ``df2``.
-
 :param other: Right side of the join
-:param on: a string for join column name, a list of column names,
-, a join expression (Column) or a list of Columns.
-If `on` is a string or a list of string indicating the name of the 
join column(s),
+:param on: a string for the join column name, a list of column names,
+a join expression (Column), or a list of Columns.
+If `on` is a string or a list of strings indicating the name of 
the join column(s),
 the column(s) must exist on both sides, and this performs an 
equi-join.
 :param how: str, default 'inner'.
 One of `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
 
+The following performs a full outer join between ``df1`` and ``df2``.
+
 >>> df.join(df2, df.name == df2.name, 'outer').select(df.name, 
df2.height).collect()
 [Row(name=None, height=80), Row(name=u'Bob', height=85), 
Row(name=u'Alice', height=None)]
 





spark git commit: [DOC] improve python doc for rdd.histogram and dataframe.join

2016-07-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 ef2a6f131 -> 504aa6f7a


[DOC] improve python doc for rdd.histogram and dataframe.join

## What changes were proposed in this pull request?

doc change only

## How was this patch tested?

doc change only

Author: Mortada Mehyar 

Closes #14253 from mortada/histogram_typos.

(cherry picked from commit 6ee40d2cc5f467c78be662c1639fc3d5b7f796cf)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/504aa6f7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/504aa6f7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/504aa6f7

Branch: refs/heads/branch-2.0
Commit: 504aa6f7a87973de0955aa8c124e2a036f8b3369
Parents: ef2a6f1
Author: Mortada Mehyar 
Authored: Mon Jul 18 23:49:47 2016 -0700
Committer: Reynold Xin 
Committed: Mon Jul 18 23:50:01 2016 -0700

--
 python/pyspark/rdd.py   | 18 +-
 python/pyspark/sql/dataframe.py | 10 +-
 2 files changed, 14 insertions(+), 14 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/504aa6f7/python/pyspark/rdd.py
--
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 6afe769..0508235 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -1027,20 +1027,20 @@ class RDD(object):
 
 If your histogram is evenly spaced (e.g. [0, 10, 20, 30]),
 this can be switched from an O(log n) inseration to O(1) per
-element(where n = # buckets).
+element (where n is the number of buckets).
 
-Buckets must be sorted and not contain any duplicates, must be
+Buckets must be sorted, not contain any duplicates, and have
 at least two elements.
 
-If `buckets` is a number, it will generates buckets which are
+If `buckets` is a number, it will generate buckets which are
 evenly spaced between the minimum and maximum of the RDD. For
-example, if the min value is 0 and the max is 100, given buckets
-as 2, the resulting buckets will be [0,50) [50,100]. buckets must
-be at least 1 If the RDD contains infinity, NaN throws an exception
-If the elements in RDD do not vary (max == min) always returns
-a single bucket.
+example, if the min value is 0 and the max is 100, given `buckets`
+as 2, the resulting buckets will be [0,50) [50,100]. `buckets` must
+be at least 1. An exception is raised if the RDD contains infinity.
+If the elements in the RDD do not vary (max == min), a single bucket
+will be used.
 
-It will return a tuple of buckets and histogram.
+The return value is a tuple of buckets and histogram.
 
 >>> rdd = sc.parallelize(range(51))
 >>> rdd.histogram(2)

http://git-wip-us.apache.org/repos/asf/spark/blob/504aa6f7/python/pyspark/sql/dataframe.py
--
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index c7d704a..b9f50ff 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -601,16 +601,16 @@ class DataFrame(object):
 def join(self, other, on=None, how=None):
 """Joins with another :class:`DataFrame`, using the given join 
expression.
 
-The following performs a full outer join between ``df1`` and ``df2``.
-
 :param other: Right side of the join
-:param on: a string for join column name, a list of column names,
-, a join expression (Column) or a list of Columns.
-If `on` is a string or a list of string indicating the name of the 
join column(s),
+:param on: a string for the join column name, a list of column names,
+a join expression (Column), or a list of Columns.
+If `on` is a string or a list of strings indicating the name of 
the join column(s),
 the column(s) must exist on both sides, and this performs an 
equi-join.
 :param how: str, default 'inner'.
 One of `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
 
+The following performs a full outer join between ``df1`` and ``df2``.
+
 >>> df.join(df2, df.name == df2.name, 'outer').select(df.name, 
df2.height).collect()
 [Row(name=None, height=80), Row(name=u'Bob', height=85), 
Row(name=u'Alice', height=None)]
 





spark git commit: [SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example update

2016-07-19 Thread yhuai
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 24ea87519 -> ef2a6f131


[SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example update

## What changes were proposed in this pull request?

This PR moves the last remaining hard-coded Scala example snippet from the SQL 
programming guide into `SparkSqlExample.scala`. It also renames all Scala/Java 
example files so that every "Sql" in the file names is updated to "SQL".

## How was this patch tested?

Manually verified the generated HTML page.

Author: Cheng Lian 

Closes #14245 from liancheng/minor-scala-example-update.

(cherry picked from commit 1426a080528bdb470b5e81300d892af45dd188bf)
Signed-off-by: Yin Huai 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ef2a6f13
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ef2a6f13
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ef2a6f13

Branch: refs/heads/branch-2.0
Commit: ef2a6f1310777bb6ea2b157a873c3785231b104a
Parents: 24ea875
Author: Cheng Lian 
Authored: Mon Jul 18 23:07:59 2016 -0700
Committer: Yin Huai 
Committed: Mon Jul 18 23:08:11 2016 -0700

--
 docs/sql-programming-guide.md   |  57 ++--
 .../examples/sql/JavaSQLDataSourceExample.java  | 217 
 .../spark/examples/sql/JavaSparkSQLExample.java | 336 +++
 .../spark/examples/sql/JavaSparkSqlExample.java | 336 ---
 .../examples/sql/JavaSqlDataSourceExample.java  | 217 
 .../examples/sql/SQLDataSourceExample.scala | 148 
 .../spark/examples/sql/SparkSQLExample.scala| 254 ++
 .../spark/examples/sql/SparkSqlExample.scala| 254 --
 .../examples/sql/SqlDataSourceExample.scala | 148 
 9 files changed, 983 insertions(+), 984 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ef2a6f13/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index a4127da..a88efb7 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -65,14 +65,14 @@ Throughout this document, we will often refer to Scala/Java 
Datasets of `Row`s a
 
 The entry point into all functionality in Spark is the 
[`SparkSession`](api/scala/index.html#org.apache.spark.sql.SparkSession) class. 
To create a basic `SparkSession`, just use `SparkSession.builder()`:
 
-{% include_example init_session 
scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}
+{% include_example init_session 
scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
 
 
 
 
 The entry point into all functionality in Spark is the 
[`SparkSession`](api/java/index.html#org.apache.spark.sql.SparkSession) class. 
To create a basic `SparkSession`, just use `SparkSession.builder()`:
 
-{% include_example init_session 
java/org/apache/spark/examples/sql/JavaSparkSqlExample.java %}
+{% include_example init_session 
java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
 
 
 
@@ -105,7 +105,7 @@ from a Hive table, or from [Spark data 
sources](#data-sources).
 
 As an example, the following creates a DataFrame based on the content of a 
JSON file:
 
-{% include_example create_df 
scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}
+{% include_example create_df 
scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
 
 
 
@@ -114,7 +114,7 @@ from a Hive table, or from [Spark data 
sources](#data-sources).
 
 As an example, the following creates a DataFrame based on the content of a 
JSON file:
 
-{% include_example create_df 
java/org/apache/spark/examples/sql/JavaSparkSqlExample.java %}
+{% include_example create_df 
java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
 
 
 
@@ -155,7 +155,7 @@ Here we include some basic examples of structured data 
processing using Datasets
 
 
 
-{% include_example untyped_ops 
scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}
+{% include_example untyped_ops 
scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
 
 For a complete list of the types of operations that can be performed on a 
Dataset refer to the [API 
Documentation](api/scala/index.html#org.apache.spark.sql.Dataset).
 
@@ -164,7 +164,7 @@ In addition to simple column references and expressions, 
Datasets also have a ri
 
 
 
-{% include_example untyped_ops 
java/org/apache/spark/examples/sql/JavaSparkSqlExample.java %}
+{% include_example untyped_ops 
java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
 
 For a complete list of the types of operations that can be performed on a 
Dataset refer to the [API 
Documentation](api/java/org/apache/spark/sql/Dataset.html).
 
@@ 

spark git commit: [SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example update

2016-07-19 Thread yhuai
Repository: spark
Updated Branches:
  refs/heads/master e5fbb182c -> 1426a0805


[SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example update

## What changes were proposed in this pull request?

This PR moves the last remaining hard-coded Scala example snippet from the SQL 
programming guide into `SparkSqlExample.scala`. It also renames all Scala/Java 
example files so that every "Sql" in the file names is updated to "SQL".

## How was this patch tested?

Manually verified the generated HTML page.

Author: Cheng Lian 

Closes #14245 from liancheng/minor-scala-example-update.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1426a080
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1426a080
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1426a080

Branch: refs/heads/master
Commit: 1426a080528bdb470b5e81300d892af45dd188bf
Parents: e5fbb18
Author: Cheng Lian 
Authored: Mon Jul 18 23:07:59 2016 -0700
Committer: Yin Huai 
Committed: Mon Jul 18 23:07:59 2016 -0700

--
 docs/sql-programming-guide.md   |  57 ++--
 .../examples/sql/JavaSQLDataSourceExample.java  | 217 
 .../spark/examples/sql/JavaSparkSQLExample.java | 336 +++
 .../spark/examples/sql/JavaSparkSqlExample.java | 336 ---
 .../examples/sql/JavaSqlDataSourceExample.java  | 217 
 .../examples/sql/SQLDataSourceExample.scala | 148 
 .../spark/examples/sql/SparkSQLExample.scala| 254 ++
 .../spark/examples/sql/SparkSqlExample.scala| 254 --
 .../examples/sql/SqlDataSourceExample.scala | 148 
 9 files changed, 983 insertions(+), 984 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1426a080/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 4413fdd..71f3ee4 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -65,14 +65,14 @@ Throughout this document, we will often refer to Scala/Java 
Datasets of `Row`s a
 
 The entry point into all functionality in Spark is the 
[`SparkSession`](api/scala/index.html#org.apache.spark.sql.SparkSession) class. 
To create a basic `SparkSession`, just use `SparkSession.builder()`:
 
-{% include_example init_session 
scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}
+{% include_example init_session 
scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
 
 
 
 
 The entry point into all functionality in Spark is the 
[`SparkSession`](api/java/index.html#org.apache.spark.sql.SparkSession) class. 
To create a basic `SparkSession`, just use `SparkSession.builder()`:
 
-{% include_example init_session 
java/org/apache/spark/examples/sql/JavaSparkSqlExample.java %}
+{% include_example init_session 
java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
 
 
 
@@ -105,7 +105,7 @@ from a Hive table, or from [Spark data 
sources](#data-sources).
 
 As an example, the following creates a DataFrame based on the content of a 
JSON file:
 
-{% include_example create_df 
scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}
+{% include_example create_df 
scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
 
 
 
@@ -114,7 +114,7 @@ from a Hive table, or from [Spark data 
sources](#data-sources).
 
 As an example, the following creates a DataFrame based on the content of a 
JSON file:
 
-{% include_example create_df 
java/org/apache/spark/examples/sql/JavaSparkSqlExample.java %}
+{% include_example create_df 
java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
 
 
 
@@ -155,7 +155,7 @@ Here we include some basic examples of structured data 
processing using Datasets
 
 
 
-{% include_example untyped_ops 
scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}
+{% include_example untyped_ops 
scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
 
 For a complete list of the types of operations that can be performed on a 
Dataset refer to the [API 
Documentation](api/scala/index.html#org.apache.spark.sql.Dataset).
 
@@ -164,7 +164,7 @@ In addition to simple column references and expressions, 
Datasets also have a ri
 
 
 
-{% include_example untyped_ops 
java/org/apache/spark/examples/sql/JavaSparkSqlExample.java %}
+{% include_example untyped_ops 
java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
 
 For a complete list of the types of operations that can be performed on a 
Dataset refer to the [API 
Documentation](api/java/org/apache/spark/sql/Dataset.html).
 
@@ -249,13 +249,13 @@ In addition to simple column references and expressions, 
DataFrames also have a
 
 The `sql` function on a