spark git commit: [SPARK-15840][SQL] Add two missing options in documentation and some option related changes

2016-06-11 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 ffbc6b796 -> d494a483a


[SPARK-15840][SQL] Add two missing options in documentation and some option 
related changes

## What changes were proposed in this pull request?

This PR

1. Adds documentation for some missing options, `inferSchema` and 
`mergeSchema`, for Python and Scala (see the usage sketch after this list).

2. Fixes `[[DataFrame]]` to ```:class:`DataFrame` ``` so that it is rendered correctly

  - from
![2016-06-09 9 31 
16](https://cloud.githubusercontent.com/assets/6477701/15929721/8b864734-2e89-11e6-83f6-207527de4ac9.png)

  - to (with class link)
![2016-06-09 9 31 
00](https://cloud.githubusercontent.com/assets/6477701/15929717/8a03d728-2e89-11e6-8a3f-08294964db22.png)

  (Please refer to [the latest documentation](https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/api/python/pyspark.sql.html).)

3. Moves the `mergeSchema` option to `ParquetOptions` and removes the unused 
options `metastoreSchema` and `metastoreTableName`.

  They are not used anymore. Their usages were removed in 
https://github.com/apache/spark/commit/e720dda42e806229ccfd970055c7b8a93eb447bf 
and there are no remaining use cases, as the search below shows:

  ```bash
  grep -r -e METASTORE_SCHEMA -e \"metastoreSchema\" -e \"metastoreTableName\" 
-e METASTORE_TABLE_NAME .
  ```

  ```
  ./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:  private[sql] val METASTORE_SCHEMA = "metastoreSchema"
  ./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:  private[sql] val METASTORE_TABLE_NAME = "metastoreTableName"
  ./sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:    ParquetFileFormat.METASTORE_TABLE_NAME -> TableIdentifier(
  ```

  It only sets `metastoreTableName` in the last case but does not use the table 
name.

4. Sets the correct default values (in the documentation) for the `compression` and 
`columnNameOfCorruptRecord` options:

  - `compression` for ORC: `snappy` (see 
[OrcOptions.scala#L33-L42](https://github.com/apache/spark/blob/3ded5bc4db2badc9ff49554e73421021d854306b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala#L33-L42)).
  - `compression` for Parquet: the value specified in SQLConf (see 
[ParquetOptions.scala#L38-L47](https://github.com/apache/spark/blob/3ded5bc4db2badc9ff49554e73421021d854306b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala#L38-L47)).
  - `columnNameOfCorruptRecord` for JSON: the value specified in SQLConf (see 
[JsonFileFormat.scala#L53-L55](https://github.com/apache/spark/blob/4538443e276597530a27c6922e48503677b13956/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala#L53-L55) and 
[JsonFileFormat.scala#L105-L106](https://github.com/apache/spark/blob/4538443e276597530a27c6922e48503677b13956/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala#L105-L106)).
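
A minimal usage sketch of the reader options documented in (1); the input paths and data are made up for illustration and are not part of this patch:

```scala
// Illustrative only -- the input paths are made up for this sketch.
val csvDF = spark.read
  .option("inferSchema", "true")    // infer column types instead of treating every column as string
  .csv("/tmp/example.csv")

val parquetDF = spark.read
  .option("mergeSchema", "true")    // merge the schemas of all Parquet part-files being read
  .parquet("/tmp/example_parquet")
```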

## How was this patch tested?

Existing tests should cover this.

Author: hyukjinkwon 
Author: Hyukjin Kwon 

Closes #13576 from HyukjinKwon/SPARK-15840.

(cherry picked from commit 9e204c62c6800e03759e04ef68268105d4b86bf2)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d494a483
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d494a483
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d494a483

Branch: refs/heads/branch-2.0
Commit: d494a483aef49766edf9c148dadb5e0c7351ca0d
Parents: ffbc6b7
Author: hyukjinkwon 
Authored: Sat Jun 11 23:20:40 2016 -0700
Committer: Reynold Xin 
Committed: Sat Jun 11 23:20:45 2016 -0700

--
 python/pyspark/sql/readwriter.py| 40 +---
 .../org/apache/spark/sql/DataFrameReader.scala  | 18 ++---
 .../org/apache/spark/sql/DataFrameWriter.scala  | 11 +++---
 .../datasources/parquet/ParquetFileFormat.scala | 19 ++
 .../datasources/parquet/ParquetOptions.scala| 15 +++-
 .../spark/sql/hive/HiveMetastoreCatalog.scala   | 12 ++
 6 files changed, 65 insertions(+), 50 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d494a483/python/pyspark/sql/readwriter.py
--
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 7d1f186..f3182b2 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -209,7 +209,8 @@ class DataFrameReader(object):
 :param columnNameOfCorruptRecord: allows renaming the new field having 
malformed string
   created by ``PERMISSIVE`` mode. This 
overrides
   
``spark.sql.columnNameOfCorruptRecord``. If None is set,
-   

spark git commit: [SPARK-15840][SQL] Add two missing options in documentation and some option related changes

2016-06-11 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master e1f986c7a -> 9e204c62c


[SPARK-15840][SQL] Add two missing options in documentation and some option 
related changes

## What changes were proposed in this pull request?

This PR

1. Adds documentation for some missing options, `inferSchema` and 
`mergeSchema`, for Python and Scala.

2. Fixes `[[DataFrame]]` to ```:class:`DataFrame` ``` so that it is rendered correctly

  - from
![2016-06-09 9 31 
16](https://cloud.githubusercontent.com/assets/6477701/15929721/8b864734-2e89-11e6-83f6-207527de4ac9.png)

  - to (with class link)
![2016-06-09 9 31 
00](https://cloud.githubusercontent.com/assets/6477701/15929717/8a03d728-2e89-11e6-8a3f-08294964db22.png)

  (Please refer to [the latest documentation](https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/api/python/pyspark.sql.html).)

3. Moves the `mergeSchema` option to `ParquetOptions` and removes the unused 
options `metastoreSchema` and `metastoreTableName`.

  They are not used anymore. Their usages were removed in 
https://github.com/apache/spark/commit/e720dda42e806229ccfd970055c7b8a93eb447bf 
and there are no remaining use cases, as the search below shows:

  ```bash
  grep -r -e METASTORE_SCHEMA -e \"metastoreSchema\" -e \"metastoreTableName\" 
-e METASTORE_TABLE_NAME .
  ```

  ```
  ./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:  private[sql] val METASTORE_SCHEMA = "metastoreSchema"
  ./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:  private[sql] val METASTORE_TABLE_NAME = "metastoreTableName"
  ./sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:    ParquetFileFormat.METASTORE_TABLE_NAME -> TableIdentifier(
  ```

  It only sets `metastoreTableName` in the last case but does not use the table 
name.

4. Sets the correct default values (in the documentation) for the `compression` and 
`columnNameOfCorruptRecord` options (a small write sketch follows this list):

  - `compression` for ORC: `snappy` (see 
[OrcOptions.scala#L33-L42](https://github.com/apache/spark/blob/3ded5bc4db2badc9ff49554e73421021d854306b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala#L33-L42)).
  - `compression` for Parquet: the value specified in SQLConf (see 
[ParquetOptions.scala#L38-L47](https://github.com/apache/spark/blob/3ded5bc4db2badc9ff49554e73421021d854306b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala#L38-L47)).
  - `columnNameOfCorruptRecord` for JSON: the value specified in SQLConf (see 
[JsonFileFormat.scala#L53-L55](https://github.com/apache/spark/blob/4538443e276597530a27c6922e48503677b13956/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala#L53-L55) and 
[JsonFileFormat.scala#L105-L106](https://github.com/apache/spark/blob/4538443e276597530a27c6922e48503677b13956/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala#L105-L106)).
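
A small write-side sketch of the `compression` defaults described in (4). The data and output paths are made up, the ORC write assumes a Hive-enabled session, and Parquet falling back to the codec configured in SQLConf is the documented behavior rather than anything set in this sketch:

```scala
// Illustrative only -- the data and output paths are made up for this sketch.
val df = spark.range(10).toDF("id")
df.write.option("compression", "snappy").orc("/tmp/orc_out")  // "snappy" is the documented ORC default
df.write.parquet("/tmp/parquet_out")                          // Parquet falls back to the codec set in SQLConf
```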

## How was this patch tested?

Existing tests should cover this.

Author: hyukjinkwon 
Author: Hyukjin Kwon 

Closes #13576 from HyukjinKwon/SPARK-15840.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9e204c62
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9e204c62
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9e204c62

Branch: refs/heads/master
Commit: 9e204c62c6800e03759e04ef68268105d4b86bf2
Parents: e1f986c
Author: hyukjinkwon 
Authored: Sat Jun 11 23:20:40 2016 -0700
Committer: Reynold Xin 
Committed: Sat Jun 11 23:20:40 2016 -0700

--
 python/pyspark/sql/readwriter.py| 40 +---
 .../org/apache/spark/sql/DataFrameReader.scala  | 18 ++---
 .../org/apache/spark/sql/DataFrameWriter.scala  | 11 +++---
 .../datasources/parquet/ParquetFileFormat.scala | 19 ++
 .../datasources/parquet/ParquetOptions.scala| 15 +++-
 .../spark/sql/hive/HiveMetastoreCatalog.scala   | 12 ++
 6 files changed, 65 insertions(+), 50 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/9e204c62/python/pyspark/sql/readwriter.py
--
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 7d1f186..f3182b2 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -209,7 +209,8 @@ class DataFrameReader(object):
 :param columnNameOfCorruptRecord: allows renaming the new field having 
malformed string
   created by ``PERMISSIVE`` mode. This 
overrides
   
``spark.sql.columnNameOfCorruptRecord``. If None is set,
-  it uses the default value 
``_corrupt_record``.
+  it u

spark git commit: [SPARK-15860] Metrics for codegen size and perf

2016-06-11 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 796dd1514 -> ffbc6b796


[SPARK-15860] Metrics for codegen size and perf

## What changes were proposed in this pull request?

Adds codahale metrics for the codegen source text size and how long it takes to 
compile. The size is particularly interesting, since the JVM does have hard 
limits on how large methods can get.

To simplify, I added the metrics under a statically-initialized source that is 
always registered with SparkEnv.
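
A minimal sketch of how such a statically-registered histogram can be fed; the exact call site is an assumption in this sketch, not necessarily where this patch hooks in:

```scala
import org.apache.spark.metrics.source.CodegenMetrics

// Record the size (in characters) of a piece of generated source text.
// The string below is only a stand-in; the real caller passes the generated Java source.
val generatedSource = "public Object generate(Object[] references) { /* ... */ }"
CodegenMetrics.METRIC_SOURCE_CODE_SIZE.update(generatedSource.length)
```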

## How was this patch tested?

Unit tests

Author: Eric Liang 

Closes #13586 from ericl/spark-15860.

(cherry picked from commit e1f986c7a3fcc3864d53ef99ef7f14fa4d262ac3)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ffbc6b79
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ffbc6b79
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ffbc6b79

Branch: refs/heads/branch-2.0
Commit: ffbc6b796591d3e1f3dcb950335871b7826e6b3b
Parents: 796dd15
Author: Eric Liang 
Authored: Sat Jun 11 23:16:21 2016 -0700
Committer: Reynold Xin 
Committed: Sat Jun 11 23:16:28 2016 -0700

--
 .../apache/spark/metrics/MetricsSystem.scala|  3 +-
 .../spark/metrics/source/StaticSources.scala| 50 
 .../spark/metrics/MetricsSystemSuite.scala  |  8 ++--
 .../expressions/codegen/CodeGenerator.scala |  3 ++
 .../expressions/CodeGenerationSuite.scala   |  9 
 5 files changed, 68 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ffbc6b79/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala
--
diff --git a/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala 
b/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala
index 0fed991..9b16c11 100644
--- a/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala
+++ b/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala
@@ -28,7 +28,7 @@ import org.eclipse.jetty.servlet.ServletContextHandler
 import org.apache.spark.{SecurityManager, SparkConf}
 import org.apache.spark.internal.Logging
 import org.apache.spark.metrics.sink.{MetricsServlet, Sink}
-import org.apache.spark.metrics.source.Source
+import org.apache.spark.metrics.source.{Source, StaticSources}
 import org.apache.spark.util.Utils
 
 /**
@@ -96,6 +96,7 @@ private[spark] class MetricsSystem private (
   def start() {
 require(!running, "Attempting to start a MetricsSystem that is already 
running")
 running = true
+StaticSources.allSources.foreach(registerSource)
 registerSources()
 registerSinks()
 sinks.foreach(_.start)

http://git-wip-us.apache.org/repos/asf/spark/blob/ffbc6b79/core/src/main/scala/org/apache/spark/metrics/source/StaticSources.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/metrics/source/StaticSources.scala 
b/core/src/main/scala/org/apache/spark/metrics/source/StaticSources.scala
new file mode 100644
index 000..6819222
--- /dev/null
+++ b/core/src/main/scala/org/apache/spark/metrics/source/StaticSources.scala
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.metrics.source
+
+import com.codahale.metrics.MetricRegistry
+
+import org.apache.spark.annotation.Experimental
+
+private[spark] object StaticSources {
+  /**
+   * The set of all static sources. These sources may be reported to from any 
class, including
+   * static classes, without requiring reference to a SparkEnv.
+   */
+  val allSources = Seq(CodegenMetrics)
+}
+
+/**
+ * :: Experimental ::
+ * Metrics for code generation.
+ */
+@Experimental
+object CodegenMetrics extends Source {
+  override val sourceName: String = "CodeGenerator"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  /**
+   * Histogram of the length of source code text compiled by CodeGenerator (in 
characters).
+   */
+  val METRIC_S

spark git commit: [SPARK-15860] Metrics for codegen size and perf

2016-06-11 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 3fd2ff4dd -> e1f986c7a


[SPARK-15860] Metrics for codegen size and perf

## What changes were proposed in this pull request?

Adds codahale metrics for the codegen source text size and how long it takes to 
compile. The size is particularly interesting, since the JVM does have hard 
limits on how large methods can get.

To simplify, I added the metrics under a statically-initialized source that is 
always registered with SparkEnv.
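
A hedged sketch of how the new histogram could be checked from a test; the assertion shape is an assumption, not this patch's actual test code:

```scala
import org.apache.spark.metrics.source.CodegenMetrics

val countBefore = CodegenMetrics.METRIC_SOURCE_CODE_SIZE.getCount
// ... run something that triggers code generation and compilation ...
assert(CodegenMetrics.METRIC_SOURCE_CODE_SIZE.getCount >= countBefore)
```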

## How was this patch tested?

Unit tests

Author: Eric Liang 

Closes #13586 from ericl/spark-15860.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e1f986c7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e1f986c7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e1f986c7

Branch: refs/heads/master
Commit: e1f986c7a3fcc3864d53ef99ef7f14fa4d262ac3
Parents: 3fd2ff4
Author: Eric Liang 
Authored: Sat Jun 11 23:16:21 2016 -0700
Committer: Reynold Xin 
Committed: Sat Jun 11 23:16:21 2016 -0700

--
 .../apache/spark/metrics/MetricsSystem.scala|  3 +-
 .../spark/metrics/source/StaticSources.scala| 50 
 .../spark/metrics/MetricsSystemSuite.scala  |  8 ++--
 .../expressions/codegen/CodeGenerator.scala |  3 ++
 .../expressions/CodeGenerationSuite.scala   |  9 
 5 files changed, 68 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e1f986c7/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala
--
diff --git a/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala 
b/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala
index 0fed991..9b16c11 100644
--- a/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala
+++ b/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala
@@ -28,7 +28,7 @@ import org.eclipse.jetty.servlet.ServletContextHandler
 import org.apache.spark.{SecurityManager, SparkConf}
 import org.apache.spark.internal.Logging
 import org.apache.spark.metrics.sink.{MetricsServlet, Sink}
-import org.apache.spark.metrics.source.Source
+import org.apache.spark.metrics.source.{Source, StaticSources}
 import org.apache.spark.util.Utils
 
 /**
@@ -96,6 +96,7 @@ private[spark] class MetricsSystem private (
   def start() {
 require(!running, "Attempting to start a MetricsSystem that is already 
running")
 running = true
+StaticSources.allSources.foreach(registerSource)
 registerSources()
 registerSinks()
 sinks.foreach(_.start)

http://git-wip-us.apache.org/repos/asf/spark/blob/e1f986c7/core/src/main/scala/org/apache/spark/metrics/source/StaticSources.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/metrics/source/StaticSources.scala 
b/core/src/main/scala/org/apache/spark/metrics/source/StaticSources.scala
new file mode 100644
index 000..6819222
--- /dev/null
+++ b/core/src/main/scala/org/apache/spark/metrics/source/StaticSources.scala
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.metrics.source
+
+import com.codahale.metrics.MetricRegistry
+
+import org.apache.spark.annotation.Experimental
+
+private[spark] object StaticSources {
+  /**
+   * The set of all static sources. These sources may be reported to from any 
class, including
+   * static classes, without requiring reference to a SparkEnv.
+   */
+  val allSources = Seq(CodegenMetrics)
+}
+
+/**
+ * :: Experimental ::
+ * Metrics for code generation.
+ */
+@Experimental
+object CodegenMetrics extends Source {
+  override val sourceName: String = "CodeGenerator"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  /**
+   * Histogram of the length of source code text compiled by CodeGenerator (in 
characters).
+   */
+  val METRIC_SOURCE_CODE_SIZE = 
metricRegistry.histogram(MetricRegistry.name("sourceCodeSize"))
+
+  /**
+   * Histogra

spark git commit: Revert "[SPARK-14851][CORE] Support radix sort with nullable longs"

2016-06-11 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 7e2bfff20 -> 796dd1514


Revert "[SPARK-14851][CORE] Support radix sort with nullable longs"

This reverts commit beb75300455a4f92000b69e740256102d9f2d472.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/796dd151
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/796dd151
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/796dd151

Branch: refs/heads/branch-2.0
Commit: 796dd15142c00e96d2d7180f7909055a3eb1dfdf
Parents: 7e2bfff
Author: Reynold Xin 
Authored: Sat Jun 11 15:49:39 2016 -0700
Committer: Reynold Xin 
Committed: Sat Jun 11 15:49:39 2016 -0700

--
 .../util/collection/unsafe/sort/RadixSort.java  | 24 -
 .../unsafe/sort/UnsafeExternalSorter.java   | 11 ++--
 .../unsafe/sort/UnsafeInMemorySorter.java   | 56 
 .../unsafe/sort/UnsafeExternalSorterSuite.java  | 26 -
 .../unsafe/sort/UnsafeInMemorySorterSuite.java  |  2 +-
 .../collection/unsafe/sort/RadixSortSuite.scala |  4 +-
 .../sql/execution/UnsafeExternalRowSorter.java  | 20 ++-
 .../sql/catalyst/expressions/SortOrder.scala| 40 ++
 .../sql/execution/UnsafeKVExternalSorter.java   | 11 ++--
 .../apache/spark/sql/execution/SortExec.scala   | 12 ++---
 .../spark/sql/execution/SortPrefixUtils.scala   | 32 ---
 .../apache/spark/sql/execution/WindowExec.scala |  4 +-
 .../execution/joins/CartesianProductExec.scala  |  2 +-
 .../apache/spark/sql/execution/SortSuite.scala  | 11 
 .../sql/execution/benchmark/SortBenchmark.scala |  2 +-
 15 files changed, 79 insertions(+), 178 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/796dd151/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java
--
diff --git 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java
 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java
index 4043617..4f3f0de 100644
--- 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java
+++ 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java
@@ -170,13 +170,9 @@ public class RadixSort {
   /**
* Specialization of sort() for key-prefix arrays. In this type of array, 
each record consists
* of two longs, only the second of which is sorted on.
-   *
-   * @param startIndex starting index in the array to sort from. This 
parameter is not supported
-   *in the plain sort() implementation.
*/
   public static int sortKeyPrefixArray(
   LongArray array,
-  int startIndex,
   int numRecords,
   int startByteIndex,
   int endByteIndex,
@@ -186,11 +182,10 @@ public class RadixSort {
 assert endByteIndex <= 7 : "endByteIndex (" + endByteIndex + ") should <= 
7";
 assert endByteIndex > startByteIndex;
 assert numRecords * 4 <= array.size();
-int inIndex = startIndex;
-int outIndex = startIndex + numRecords * 2;
+int inIndex = 0;
+int outIndex = numRecords * 2;
 if (numRecords > 0) {
-  long[][] counts = getKeyPrefixArrayCounts(
-array, startIndex, numRecords, startByteIndex, endByteIndex);
+  long[][] counts = getKeyPrefixArrayCounts(array, numRecords, 
startByteIndex, endByteIndex);
   for (int i = startByteIndex; i <= endByteIndex; i++) {
 if (counts[i] != null) {
   sortKeyPrefixArrayAtByte(
@@ -210,14 +205,13 @@ public class RadixSort {
* getCounts with some added parameters but that seems to hurt in benchmarks.
*/
   private static long[][] getKeyPrefixArrayCounts(
-  LongArray array, int startIndex, int numRecords, int startByteIndex, int 
endByteIndex) {
+  LongArray array, int numRecords, int startByteIndex, int endByteIndex) {
 long[][] counts = new long[8][];
 long bitwiseMax = 0;
 long bitwiseMin = -1L;
-long baseOffset = array.getBaseOffset() + startIndex * 8L;
-long limit = baseOffset + numRecords * 16L;
+long limit = array.getBaseOffset() + numRecords * 16;
 Object baseObject = array.getBaseObject();
-for (long offset = baseOffset; offset < limit; offset += 16) {
+for (long offset = array.getBaseOffset(); offset < limit; offset += 16) {
   long value = Platform.getLong(baseObject, offset + 8);
   bitwiseMax |= value;
   bitwiseMin &= value;
@@ -226,7 +220,7 @@ public class RadixSort {
 for (int i = startByteIndex; i <= endByteIndex; i++) {
   if (((bitsChanged >>> (i * 8)) & 0xff) != 0) {
 counts[i] = new long[256];
-for (long offset = baseOffset; offset < limit; offset += 16) {
+for (long offset = array.getBaseOffset(); offset < limit; offset += 
16) {
   counts[i][(int)((Pl

spark git commit: [SPARK-15807][SQL] Support varargs for dropDuplicates in Dataset/DataFrame

2016-06-11 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 beb753004 -> 7e2bfff20


[SPARK-15807][SQL] Support varargs for dropDuplicates in Dataset/DataFrame

## What changes were proposed in this pull request?
This PR adds a `varargs` variant of the `dropDuplicates` function to 
`Dataset`/`DataFrame`. Currently, `dropDuplicates` accepts only a `Seq` or an `Array`.

**Before**
```scala
scala> val ds = spark.createDataFrame(Seq(("a", 1), ("b", 2), ("a", 2)))
ds: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala> ds.dropDuplicates(Seq("_1", "_2"))
res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, _2: 
int]

scala> ds.dropDuplicates("_1", "_2")
:26: error: overloaded method value dropDuplicates with alternatives:
  (colNames: 
Array[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] 
  (colNames: Seq[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] 

  ()org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 cannot be applied to (String, String)
   ds.dropDuplicates("_1", "_2")
  ^
```

**After**
```scala
scala> val ds = spark.createDataFrame(Seq(("a", 1), ("b", 2), ("a", 2)))
ds: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala> ds.dropDuplicates("_1", "_2")
res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, _2: 
int]
```

## How was this patch tested?

Passes the Jenkins tests with new test cases.

Author: Dongjoon Hyun 

Closes #13545 from dongjoon-hyun/SPARK-15807.

(cherry picked from commit 3fd2ff4dd85633af49865456a52bf0c09c99708b)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7e2bfff2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7e2bfff2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7e2bfff2

Branch: refs/heads/branch-2.0
Commit: 7e2bfff20c7278a20dca857cfd452b96d4d97c1a
Parents: beb7530
Author: Dongjoon Hyun 
Authored: Sat Jun 11 15:47:51 2016 -0700
Committer: Reynold Xin 
Committed: Sat Jun 11 15:47:57 2016 -0700

--
 .../src/main/scala/org/apache/spark/sql/Dataset.scala  | 13 +
 .../scala/org/apache/spark/sql/DataFrameSuite.scala|  4 
 .../test/scala/org/apache/spark/sql/DatasetSuite.scala | 13 +
 3 files changed, 30 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/7e2bfff2/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index 16bbf30..5a67fc7 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1834,6 +1834,19 @@ class Dataset[T] private[sql](
   def dropDuplicates(colNames: Array[String]): Dataset[T] = 
dropDuplicates(colNames.toSeq)
 
   /**
+   * Returns a new [[Dataset]] with duplicate rows removed, considering only
+   * the subset of columns.
+   *
+   * @group typedrel
+   * @since 2.0.0
+   */
+  @scala.annotation.varargs
+  def dropDuplicates(col1: String, cols: String*): Dataset[T] = {
+val colNames: Seq[String] = col1 +: cols
+dropDuplicates(colNames)
+  }
+
+  /**
* Computes statistics for numeric columns, including count, mean, stddev, 
min, and max.
* If no columns are given, this function computes statistics for all 
numerical columns.
*

http://git-wip-us.apache.org/repos/asf/spark/blob/7e2bfff2/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
index a02e48d..6bb0ce9 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
@@ -906,6 +906,10 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
 checkAnswer(
   testData.dropDuplicates(Seq("value2")),
   Seq(Row(2, 1, 2), Row(1, 1, 1)))
+
+checkAnswer(
+  testData.dropDuplicates("key", "value1"),
+  Seq(Row(2, 1, 2), Row(1, 2, 1), Row(1, 1, 1), Row(2, 2, 2)))
   }
 
   test("SPARK-7150 range api") {

http://git-wip-us.apache.org/repos/asf/spark/blob/7e2bfff2/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
index 11b52bd..4536a73 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
+++ b/sql/core/src/test/sca

spark git commit: [SPARK-15807][SQL] Support varargs for dropDuplicates in Dataset/DataFrame

2016-06-11 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master c06c58bbb -> 3fd2ff4dd


[SPARK-15807][SQL] Support varargs for dropDuplicates in Dataset/DataFrame

## What changes were proposed in this pull request?
This PR adds a `varargs` variant of the `dropDuplicates` function to 
`Dataset`/`DataFrame`. Currently, `dropDuplicates` accepts only a `Seq` or an `Array`.

**Before**
```scala
scala> val ds = spark.createDataFrame(Seq(("a", 1), ("b", 2), ("a", 2)))
ds: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala> ds.dropDuplicates(Seq("_1", "_2"))
res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, _2: 
int]

scala> ds.dropDuplicates("_1", "_2")
:26: error: overloaded method value dropDuplicates with alternatives:
  (colNames: 
Array[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] 
  (colNames: Seq[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] 

  ()org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 cannot be applied to (String, String)
   ds.dropDuplicates("_1", "_2")
  ^
```

**After**
```scala
scala> val ds = spark.createDataFrame(Seq(("a", 1), ("b", 2), ("a", 2)))
ds: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala> ds.dropDuplicates("_1", "_2")
res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, _2: 
int]
```

## How was this patch tested?

Passes the Jenkins tests with new test cases.

Author: Dongjoon Hyun 

Closes #13545 from dongjoon-hyun/SPARK-15807.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3fd2ff4d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3fd2ff4d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3fd2ff4d

Branch: refs/heads/master
Commit: 3fd2ff4dd85633af49865456a52bf0c09c99708b
Parents: c06c58b
Author: Dongjoon Hyun 
Authored: Sat Jun 11 15:47:51 2016 -0700
Committer: Reynold Xin 
Committed: Sat Jun 11 15:47:51 2016 -0700

--
 .../src/main/scala/org/apache/spark/sql/Dataset.scala  | 13 +
 .../scala/org/apache/spark/sql/DataFrameSuite.scala|  4 
 .../test/scala/org/apache/spark/sql/DatasetSuite.scala | 13 +
 3 files changed, 30 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3fd2ff4d/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index 16bbf30..5a67fc7 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1834,6 +1834,19 @@ class Dataset[T] private[sql](
   def dropDuplicates(colNames: Array[String]): Dataset[T] = 
dropDuplicates(colNames.toSeq)
 
   /**
+   * Returns a new [[Dataset]] with duplicate rows removed, considering only
+   * the subset of columns.
+   *
+   * @group typedrel
+   * @since 2.0.0
+   */
+  @scala.annotation.varargs
+  def dropDuplicates(col1: String, cols: String*): Dataset[T] = {
+val colNames: Seq[String] = col1 +: cols
+dropDuplicates(colNames)
+  }
+
+  /**
* Computes statistics for numeric columns, including count, mean, stddev, 
min, and max.
* If no columns are given, this function computes statistics for all 
numerical columns.
*

http://git-wip-us.apache.org/repos/asf/spark/blob/3fd2ff4d/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
index a02e48d..6bb0ce9 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
@@ -906,6 +906,10 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
 checkAnswer(
   testData.dropDuplicates(Seq("value2")),
   Seq(Row(2, 1, 2), Row(1, 1, 1)))
+
+checkAnswer(
+  testData.dropDuplicates("key", "value1"),
+  Seq(Row(2, 1, 2), Row(1, 2, 1), Row(1, 1, 1), Row(2, 2, 2)))
   }
 
   test("SPARK-7150 range api") {

http://git-wip-us.apache.org/repos/asf/spark/blob/3fd2ff4d/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
index 11b52bd..4536a73 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
@@ -806,6 +806,19 @@ class DatasetSuite extends QueryTest with 

spark git commit: [SPARK-14851][CORE] Support radix sort with nullable longs

2016-06-11 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 0cf31f0c8 -> beb753004


[SPARK-14851][CORE] Support radix sort with nullable longs

## What changes were proposed in this pull request?

This adds support for radix sort of nullable long fields. When a sort field is 
null and radix sort is enabled, we keep nulls in a separate region of the sort 
buffer so that radix sort does not need to deal with them. This also has 
performance benefits when sorting smaller integer types, since the current 
representation of nulls in two's complement (Long.MIN_VALUE) otherwise forces a 
full-width radix sort.

This strategy for nulls does mean the sort is no longer stable. cc davies
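
A toy sketch of the null-handling strategy described above; `Rec`, `rows`, and `nullsFirst` are made-up names, and `sortBy` stands in for the radix pass over the non-null region:

```scala
// Toy illustration, not the patch's code: keep null keys out of the radix pass entirely,
// then attach them on whichever side the sort order asks for.
case class Rec(key: java.lang.Long, payload: String)

def sortWithNullRegion(rows: Seq[Rec], nullsFirst: Boolean): Seq[Rec] = {
  val (nulls, nonNulls) = rows.partition(_.key == null)
  val sorted = nonNulls.sortBy(_.key.longValue)  // stands in for the radix sort of the non-null region
  if (nullsFirst) nulls ++ sorted else sorted ++ nulls
}
```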

## How was this patch tested?

Existing randomized sort tests for correctness. I also tested some TPCDS 
queries and there does not seem to be any significant regression for non-null 
sorts.

Some test queries (best of 5 runs each).
Before change:
scala> val start = System.nanoTime; spark.range(500).selectExpr("if(id > 5, cast(hash(id) as long), NULL) as h").coalesce(1).orderBy("h").collect(); (System.nanoTime - start) / 1e6
start: Long = 3190437233227987
res3: Double = 4716.471091

After change:
scala> val start = System.nanoTime; spark.range(500).selectExpr("if(id > 5, cast(hash(id) as long), NULL) as h").coalesce(1).orderBy("h").collect(); (System.nanoTime - start) / 1e6
start: Long = 3190367870952791
res4: Double = 2981.143045

Author: Eric Liang 

Closes #13161 from ericl/sc-2998.

(cherry picked from commit c06c582de0c22cfc70c486d23a94c3079ba4)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/beb75300
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/beb75300
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/beb75300

Branch: refs/heads/branch-2.0
Commit: beb75300455a4f92000b69e740256102d9f2d472
Parents: 0cf31f0
Author: Eric Liang 
Authored: Sat Jun 11 15:42:58 2016 -0700
Committer: Reynold Xin 
Committed: Sat Jun 11 15:43:03 2016 -0700

--
 .../util/collection/unsafe/sort/RadixSort.java  | 24 +
 .../unsafe/sort/UnsafeExternalSorter.java   | 11 ++--
 .../unsafe/sort/UnsafeInMemorySorter.java   | 56 
 .../unsafe/sort/UnsafeExternalSorterSuite.java  | 26 -
 .../unsafe/sort/UnsafeInMemorySorterSuite.java  |  2 +-
 .../collection/unsafe/sort/RadixSortSuite.scala |  4 +-
 .../sql/execution/UnsafeExternalRowSorter.java  | 20 +--
 .../sql/catalyst/expressions/SortOrder.scala| 40 --
 .../sql/execution/UnsafeKVExternalSorter.java   | 11 ++--
 .../apache/spark/sql/execution/SortExec.scala   | 12 +++--
 .../spark/sql/execution/SortPrefixUtils.scala   | 32 +++
 .../apache/spark/sql/execution/WindowExec.scala |  4 +-
 .../execution/joins/CartesianProductExec.scala  |  2 +-
 .../apache/spark/sql/execution/SortSuite.scala  | 11 
 .../sql/execution/benchmark/SortBenchmark.scala |  2 +-
 15 files changed, 178 insertions(+), 79 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/beb75300/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java
--
diff --git 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java
 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java
index 4f3f0de..4043617 100644
--- 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java
+++ 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java
@@ -170,9 +170,13 @@ public class RadixSort {
   /**
* Specialization of sort() for key-prefix arrays. In this type of array, 
each record consists
* of two longs, only the second of which is sorted on.
+   *
+   * @param startIndex starting index in the array to sort from. This 
parameter is not supported
+   *in the plain sort() implementation.
*/
   public static int sortKeyPrefixArray(
   LongArray array,
+  int startIndex,
   int numRecords,
   int startByteIndex,
   int endByteIndex,
@@ -182,10 +186,11 @@ public class RadixSort {
 assert endByteIndex <= 7 : "endByteIndex (" + endByteIndex + ") should <= 
7";
 assert endByteIndex > startByteIndex;
 assert numRecords * 4 <= array.size();
-int inIndex = 0;
-int outIndex = numRecords * 2;
+int inIndex = startIndex;
+int outIndex = startIndex + numRecords * 2;
 if (numRecords > 0) {
-  long[][] counts = getKeyPrefixArrayCounts(array, numRecords, 
startByteIndex, endByteIndex);
+  long[][] counts = getKeyPrefixArrayCounts(
+array, startIndex, numRecords, startByteIndex, endByteIndex);
   for (int i = startByteIndex; i <= endByteIndex; i++) {
 if (co

spark git commit: [SPARK-14851][CORE] Support radix sort with nullable longs

2016-06-11 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 75705e8db -> c06c58bbb


[SPARK-14851][CORE] Support radix sort with nullable longs

## What changes were proposed in this pull request?

This adds support for radix sort of nullable long fields. When a sort field is 
null and radix sort is enabled, we keep nulls in a separate region of the sort 
buffer so that radix sort does not need to deal with them. This also has 
performance benefits when sorting smaller integer types, since the current 
representation of nulls in two's complement (Long.MIN_VALUE) otherwise forces a 
full-width radix sort.
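
A small sketch of why a `Long.MIN_VALUE` null sentinel defeats narrow radix passes; the changed-bit mask mirrors the `bitwiseMax`/`bitwiseMin` bookkeeping in `RadixSort.getKeyPrefixArrayCounts` shown in the diff below, but the key values here are made up:

```scala
// With only small non-null keys, just the low bits differ and few byte-wise passes are needed.
val smallKeys = Seq(1L, 2L, 3L)
// A Long.MinValue sentinel for nulls sets the sign bit, so the changed-bit mask spans
// the full width and every byte has to be sorted on.
val withNullSentinel = smallKeys :+ Long.MinValue

def changedBits(keys: Seq[Long]): Long = keys.reduce(_ | _) ^ keys.reduce(_ & _)

println(java.lang.Long.toBinaryString(changedBits(smallKeys)))         // "11" -> only low bits change
println(java.lang.Long.toBinaryString(changedBits(withNullSentinel)))  // sign bit set as well
```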

This strategy for nulls does mean the sort is no longer stable. cc davies

## How was this patch tested?

Existing randomized sort tests for correctness. I also tested some TPCDS 
queries and there does not seem to be any significant regression for non-null 
sorts.

Some test queries (best of 5 runs each).
Before change:
scala> val start = System.nanoTime; spark.range(500).selectExpr("if(id > 5, cast(hash(id) as long), NULL) as h").coalesce(1).orderBy("h").collect(); (System.nanoTime - start) / 1e6
start: Long = 3190437233227987
res3: Double = 4716.471091

After change:
scala> val start = System.nanoTime; spark.range(500).selectExpr("if(id > 5, cast(hash(id) as long), NULL) as h").coalesce(1).orderBy("h").collect(); (System.nanoTime - start) / 1e6
start: Long = 3190367870952791
res4: Double = 2981.143045

Author: Eric Liang 

Closes #13161 from ericl/sc-2998.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c06c58bb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c06c58bb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c06c58bb

Branch: refs/heads/master
Commit: c06c582de0c22cfc70c486d23a94c3079ba4
Parents: 75705e8
Author: Eric Liang 
Authored: Sat Jun 11 15:42:58 2016 -0700
Committer: Reynold Xin 
Committed: Sat Jun 11 15:42:58 2016 -0700

--
 .../util/collection/unsafe/sort/RadixSort.java  | 24 +
 .../unsafe/sort/UnsafeExternalSorter.java   | 11 ++--
 .../unsafe/sort/UnsafeInMemorySorter.java   | 56 
 .../unsafe/sort/UnsafeExternalSorterSuite.java  | 26 -
 .../unsafe/sort/UnsafeInMemorySorterSuite.java  |  2 +-
 .../collection/unsafe/sort/RadixSortSuite.scala |  4 +-
 .../sql/execution/UnsafeExternalRowSorter.java  | 20 +--
 .../sql/catalyst/expressions/SortOrder.scala| 40 --
 .../sql/execution/UnsafeKVExternalSorter.java   | 11 ++--
 .../apache/spark/sql/execution/SortExec.scala   | 12 +++--
 .../spark/sql/execution/SortPrefixUtils.scala   | 32 +++
 .../apache/spark/sql/execution/WindowExec.scala |  4 +-
 .../execution/joins/CartesianProductExec.scala  |  2 +-
 .../apache/spark/sql/execution/SortSuite.scala  | 11 
 .../sql/execution/benchmark/SortBenchmark.scala |  2 +-
 15 files changed, 178 insertions(+), 79 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c06c58bb/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java
--
diff --git 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java
 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java
index 4f3f0de..4043617 100644
--- 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java
+++ 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java
@@ -170,9 +170,13 @@ public class RadixSort {
   /**
* Specialization of sort() for key-prefix arrays. In this type of array, 
each record consists
* of two longs, only the second of which is sorted on.
+   *
+   * @param startIndex starting index in the array to sort from. This 
parameter is not supported
+   *in the plain sort() implementation.
*/
   public static int sortKeyPrefixArray(
   LongArray array,
+  int startIndex,
   int numRecords,
   int startByteIndex,
   int endByteIndex,
@@ -182,10 +186,11 @@ public class RadixSort {
 assert endByteIndex <= 7 : "endByteIndex (" + endByteIndex + ") should <= 
7";
 assert endByteIndex > startByteIndex;
 assert numRecords * 4 <= array.size();
-int inIndex = 0;
-int outIndex = numRecords * 2;
+int inIndex = startIndex;
+int outIndex = startIndex + numRecords * 2;
 if (numRecords > 0) {
-  long[][] counts = getKeyPrefixArrayCounts(array, numRecords, 
startByteIndex, endByteIndex);
+  long[][] counts = getKeyPrefixArrayCounts(
+array, startIndex, numRecords, startByteIndex, endByteIndex);
   for (int i = startByteIndex; i <= endByteIndex; i++) {
 if (counts[i] != null) {
   sortKeyPrefixArrayAtByte(
@@ -205,13 +210,14 @@ public class RadixSort {
   

spark git commit: [SPARK-15856][SQL] Revert API breaking changes made in SQLContext.range

2016-06-11 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 304ec5de3 -> 0cf31f0c8


[SPARK-15856][SQL] Revert API breaking changes made in SQLContext.range

## What changes were proposed in this pull request?

It's easy for users to call `range(...).as[Long]` to get a typed Dataset, so the 
change isn't worth an API break. This PR reverts it.
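
A minimal sketch of the point above, assuming a `spark-shell` session where `spark` and `sqlContext` are predefined:

```scala
import org.apache.spark.sql.{DataFrame, Dataset}
import spark.implicits._

// With this revert, SQLContext.range returns a DataFrame again...
val df: DataFrame = sqlContext.range(0, 10)
// ...and callers who want a typed Dataset are still one cast away, as noted above.
val ds: Dataset[Long] = df.as[Long]
```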

## How was this patch tested?

N/A

Author: Wenchen Fan 

Closes #13605 from cloud-fan/range.

(cherry picked from commit 75705e8dbb51ac91ffc7012fa67f072494c13832)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0cf31f0c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0cf31f0c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0cf31f0c

Branch: refs/heads/branch-2.0
Commit: 0cf31f0c8486ac3f8efca84bcfec75c2d0dd738a
Parents: 304ec5d
Author: Wenchen Fan 
Authored: Sat Jun 11 15:28:40 2016 -0700
Committer: Reynold Xin 
Committed: Sat Jun 11 15:28:45 2016 -0700

--
 .../scala/org/apache/spark/sql/SQLContext.scala | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0cf31f0c/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
index 23f2b6e..6fcc9bb 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
@@ -609,51 +609,51 @@ class SQLContext private[sql](val sparkSession: 
SparkSession)
 
   /**
* :: Experimental ::
-   * Creates a [[Dataset]] with a single [[LongType]] column named `id`, 
containing elements
+   * Creates a [[DataFrame]] with a single [[LongType]] column named `id`, 
containing elements
* in a range from 0 to `end` (exclusive) with step value 1.
*
-   * @since 2.0.0
-   * @group dataset
+   * @since 1.4.1
+   * @group dataframe
*/
   @Experimental
-  def range(end: Long): Dataset[java.lang.Long] = sparkSession.range(end)
+  def range(end: Long): DataFrame = sparkSession.range(end).toDF()
 
   /**
* :: Experimental ::
-   * Creates a [[Dataset]] with a single [[LongType]] column named `id`, 
containing elements
+   * Creates a [[DataFrame]] with a single [[LongType]] column named `id`, 
containing elements
* in a range from `start` to `end` (exclusive) with step value 1.
*
-   * @since 2.0.0
-   * @group dataset
+   * @since 1.4.0
+   * @group dataframe
*/
   @Experimental
-  def range(start: Long, end: Long): Dataset[java.lang.Long] = 
sparkSession.range(start, end)
+  def range(start: Long, end: Long): DataFrame = sparkSession.range(start, 
end).toDF()
 
   /**
* :: Experimental ::
-   * Creates a [[Dataset]] with a single [[LongType]] column named `id`, 
containing elements
+   * Creates a [[DataFrame]] with a single [[LongType]] column named `id`, 
containing elements
* in a range from `start` to `end` (exclusive) with a step value.
*
* @since 2.0.0
-   * @group dataset
+   * @group dataframe
*/
   @Experimental
-  def range(start: Long, end: Long, step: Long): Dataset[java.lang.Long] = {
-sparkSession.range(start, end, step)
+  def range(start: Long, end: Long, step: Long): DataFrame = {
+sparkSession.range(start, end, step).toDF()
   }
 
   /**
* :: Experimental ::
-   * Creates a [[Dataset]] with a single [[LongType]] column named `id`, 
containing elements
-   * in a range from `start` to `end` (exclusive) with a step value, with 
partition number
+   * Creates a [[DataFrame]] with a single [[LongType]] column named `id`, 
containing elements
+   * in an range from `start` to `end` (exclusive) with an step value, with 
partition number
* specified.
*
-   * @since 2.0.0
-   * @group dataset
+   * @since 1.4.0
+   * @group dataframe
*/
   @Experimental
-  def range(start: Long, end: Long, step: Long, numPartitions: Int): 
Dataset[java.lang.Long] = {
-sparkSession.range(start, end, step, numPartitions)
+  def range(start: Long, end: Long, step: Long, numPartitions: Int): DataFrame 
= {
+sparkSession.range(start, end, step, numPartitions).toDF()
   }
 
   /**





spark git commit: [SPARK-15856][SQL] Revert API breaking changes made in SQLContext.range

2016-06-11 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 5bb4564cd -> 75705e8db


[SPARK-15856][SQL] Revert API breaking changes made in SQLContext.range

## What changes were proposed in this pull request?

It's easy for users to call `range(...).as[Long]` to get a typed Dataset, so the 
change isn't worth an API break. This PR reverts it.

## How was this patch tested?

N/A

Author: Wenchen Fan 

Closes #13605 from cloud-fan/range.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/75705e8d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/75705e8d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/75705e8d

Branch: refs/heads/master
Commit: 75705e8dbb51ac91ffc7012fa67f072494c13832
Parents: 5bb4564
Author: Wenchen Fan 
Authored: Sat Jun 11 15:28:40 2016 -0700
Committer: Reynold Xin 
Committed: Sat Jun 11 15:28:40 2016 -0700

--
 .../scala/org/apache/spark/sql/SQLContext.scala | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/75705e8d/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
index 23f2b6e..6fcc9bb 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
@@ -609,51 +609,51 @@ class SQLContext private[sql](val sparkSession: 
SparkSession)
 
   /**
* :: Experimental ::
-   * Creates a [[Dataset]] with a single [[LongType]] column named `id`, 
containing elements
+   * Creates a [[DataFrame]] with a single [[LongType]] column named `id`, 
containing elements
* in a range from 0 to `end` (exclusive) with step value 1.
*
-   * @since 2.0.0
-   * @group dataset
+   * @since 1.4.1
+   * @group dataframe
*/
   @Experimental
-  def range(end: Long): Dataset[java.lang.Long] = sparkSession.range(end)
+  def range(end: Long): DataFrame = sparkSession.range(end).toDF()
 
   /**
* :: Experimental ::
-   * Creates a [[Dataset]] with a single [[LongType]] column named `id`, 
containing elements
+   * Creates a [[DataFrame]] with a single [[LongType]] column named `id`, 
containing elements
* in a range from `start` to `end` (exclusive) with step value 1.
*
-   * @since 2.0.0
-   * @group dataset
+   * @since 1.4.0
+   * @group dataframe
*/
   @Experimental
-  def range(start: Long, end: Long): Dataset[java.lang.Long] = 
sparkSession.range(start, end)
+  def range(start: Long, end: Long): DataFrame = sparkSession.range(start, 
end).toDF()
 
   /**
* :: Experimental ::
-   * Creates a [[Dataset]] with a single [[LongType]] column named `id`, 
containing elements
+   * Creates a [[DataFrame]] with a single [[LongType]] column named `id`, 
containing elements
* in a range from `start` to `end` (exclusive) with a step value.
*
* @since 2.0.0
-   * @group dataset
+   * @group dataframe
*/
   @Experimental
-  def range(start: Long, end: Long, step: Long): Dataset[java.lang.Long] = {
-sparkSession.range(start, end, step)
+  def range(start: Long, end: Long, step: Long): DataFrame = {
+sparkSession.range(start, end, step).toDF()
   }
 
   /**
* :: Experimental ::
-   * Creates a [[Dataset]] with a single [[LongType]] column named `id`, 
containing elements
-   * in a range from `start` to `end` (exclusive) with a step value, with 
partition number
+   * Creates a [[DataFrame]] with a single [[LongType]] column named `id`, 
containing elements
+   * in an range from `start` to `end` (exclusive) with an step value, with 
partition number
* specified.
*
-   * @since 2.0.0
-   * @group dataset
+   * @since 1.4.0
+   * @group dataframe
*/
   @Experimental
-  def range(start: Long, end: Long, step: Long, numPartitions: Int): 
Dataset[java.lang.Long] = {
-sparkSession.range(start, end, step, numPartitions)
+  def range(start: Long, end: Long, step: Long, numPartitions: Int): DataFrame 
= {
+sparkSession.range(start, end, step, numPartitions).toDF()
   }
 
   /**





spark git commit: [SPARK-15881] Update microbenchmark results for WideSchemaBenchmark

2016-06-11 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 4c7b208ab -> 304ec5de3


[SPARK-15881] Update microbenchmark results for WideSchemaBenchmark

## What changes were proposed in this pull request?

These were not updated after performance improvements. To make updating them 
easier, I also moved the results from inline comments out into a file, which is 
auto-generated when the benchmark is re-run.

Author: Eric Liang 

Closes #13607 from ericl/sc-3538.

(cherry picked from commit 5bb4564cd47c8bf06409287e0de4ec45609970b2)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/304ec5de
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/304ec5de
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/304ec5de

Branch: refs/heads/branch-2.0
Commit: 304ec5de34a998f83db5e565b80622184d68e7f7
Parents: 4c7b208
Author: Eric Liang 
Authored: Sat Jun 11 15:26:08 2016 -0700
Committer: Reynold Xin 
Committed: Sat Jun 11 15:26:13 2016 -0700

--
 project/SparkBuild.scala|   2 +-
 .../benchmarks/WideSchemaBenchmark-results.txt  |  93 +++
 sql/core/src/test/resources/log4j.properties|   2 +-
 .../benchmark/WideSchemaBenchmark.scala | 260 ++-
 4 files changed, 123 insertions(+), 234 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/304ec5de/project/SparkBuild.scala
--
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 304288a..2f7da31 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -833,7 +833,7 @@ object TestSettings {
 javaOptions in Test += "-Dspark.ui.enabled=false",
 javaOptions in Test += "-Dspark.ui.showConsoleProgress=false",
 javaOptions in Test += "-Dspark.unsafe.exceptionOnMemoryLeak=true",
-javaOptions in Test += "-Dsun.io.serialization.extendedDebugInfo=true",
+javaOptions in Test += "-Dsun.io.serialization.extendedDebugInfo=false",
 javaOptions in Test += "-Dderby.system.durability=test",
 javaOptions in Test ++= 
System.getProperties.asScala.filter(_._1.startsWith("spark"))
   .map { case (k,v) => s"-D$k=$v" }.toSeq,

http://git-wip-us.apache.org/repos/asf/spark/blob/304ec5de/sql/core/benchmarks/WideSchemaBenchmark-results.txt
--
diff --git a/sql/core/benchmarks/WideSchemaBenchmark-results.txt 
b/sql/core/benchmarks/WideSchemaBenchmark-results.txt
new file mode 100644
index 000..ea6a661
--- /dev/null
+++ b/sql/core/benchmarks/WideSchemaBenchmark-results.txt
@@ -0,0 +1,93 @@
+OpenJDK 64-Bit Server VM 1.8.0_66-internal-b17 on Linux 4.2.0-36-generic
+Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
+parsing large select:Best/Avg Time(ms)Rate(M/s)   Per 
Row(ns)   Relative
+
+1 select expressions 3 /5  0.0 
2967064.0   1.0X
+100 select expressions  11 /   12  0.0
11369518.0   0.3X
+2500 select expressions243 /  250  0.0   
242561004.0   0.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_66-internal-b17 on Linux 4.2.0-36-generic
+Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
+many column field r/w:   Best/Avg Time(ms)Rate(M/s)   Per 
Row(ns)   Relative
+
+1 cols x 10 rows (read in-mem)  28 /   40  3.6 
278.8   1.0X
+1 cols x 10 rows (exec in-mem)  28 /   42  3.5 
284.0   1.0X
+1 cols x 10 rows (read parquet) 23 /   35  4.4 
228.8   1.2X
+1 cols x 10 rows (write parquet)   163 /  182  0.6
1633.0   0.2X
+100 cols x 1000 rows (read in-mem)  27 /   39  3.7 
266.9   1.0X
+100 cols x 1000 rows (exec in-mem)  48 /   79  2.1 
481.7   0.6X
+100 cols x 1000 rows (read parquet) 25 /   36  3.9 
254.3   1.1X
+100 cols x 1000 rows (write parquet)   182 /  196  0.5
1819.5   0.2X
+2500 cols x 40 rows (read in-mem)  280 /  315  0.4
2797.1   0.1X
+2500 cols x 40 rows (exec in-mem)  606 /  638  0.2
6064.3   0.0X
+2500 cols x 40 rows (read parquet) 836 /  843  0.1
8356.4   0.0X
+2500 cols x 40 rows (write parquet)490 /  522  0.2
4900.6   0.1X
+
+OpenJDK 64-Bit Server VM 1.8.0_66-internal-b17 on 

spark git commit: [SPARK-15881] Update microbenchmark results for WideSchemaBenchmark

2016-06-11 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master cb5d933d8 -> 5bb4564cd


[SPARK-15881] Update microbenchmark results for WideSchemaBenchmark

## What changes were proposed in this pull request?

These were not updated after performance improvements. To make updating them 
easier, I also moved the results from inline comments out into a file, which is 
auto-generated when the benchmark is re-run.

Author: Eric Liang 

Closes #13607 from ericl/sc-3538.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5bb4564c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5bb4564c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5bb4564c

Branch: refs/heads/master
Commit: 5bb4564cd47c8bf06409287e0de4ec45609970b2
Parents: cb5d933
Author: Eric Liang 
Authored: Sat Jun 11 15:26:08 2016 -0700
Committer: Reynold Xin 
Committed: Sat Jun 11 15:26:08 2016 -0700

--
 project/SparkBuild.scala|   2 +-
 .../benchmarks/WideSchemaBenchmark-results.txt  |  93 +++
 sql/core/src/test/resources/log4j.properties|   2 +-
 .../benchmark/WideSchemaBenchmark.scala | 260 ++-
 4 files changed, 123 insertions(+), 234 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5bb4564c/project/SparkBuild.scala
--
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 304288a..2f7da31 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -833,7 +833,7 @@ object TestSettings {
 javaOptions in Test += "-Dspark.ui.enabled=false",
 javaOptions in Test += "-Dspark.ui.showConsoleProgress=false",
 javaOptions in Test += "-Dspark.unsafe.exceptionOnMemoryLeak=true",
-javaOptions in Test += "-Dsun.io.serialization.extendedDebugInfo=true",
+javaOptions in Test += "-Dsun.io.serialization.extendedDebugInfo=false",
 javaOptions in Test += "-Dderby.system.durability=test",
 javaOptions in Test ++= 
System.getProperties.asScala.filter(_._1.startsWith("spark"))
   .map { case (k,v) => s"-D$k=$v" }.toSeq,

http://git-wip-us.apache.org/repos/asf/spark/blob/5bb4564c/sql/core/benchmarks/WideSchemaBenchmark-results.txt
--
diff --git a/sql/core/benchmarks/WideSchemaBenchmark-results.txt 
b/sql/core/benchmarks/WideSchemaBenchmark-results.txt
new file mode 100644
index 000..ea6a661
--- /dev/null
+++ b/sql/core/benchmarks/WideSchemaBenchmark-results.txt
@@ -0,0 +1,93 @@
+OpenJDK 64-Bit Server VM 1.8.0_66-internal-b17 on Linux 4.2.0-36-generic
+Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
+parsing large select:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+1 select expressions                             3 /    5          0.0     2967064.0       1.0X
+100 select expressions                          11 /   12          0.0    11369518.0       0.3X
+2500 select expressions                        243 /  250          0.0   242561004.0       0.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_66-internal-b17 on Linux 4.2.0-36-generic
+Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
+many column field r/w:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+1 cols x 100000 rows (read in-mem)              28 /   40          3.6         278.8       1.0X
+1 cols x 100000 rows (exec in-mem)              28 /   42          3.5         284.0       1.0X
+1 cols x 100000 rows (read parquet)             23 /   35          4.4         228.8       1.2X
+1 cols x 100000 rows (write parquet)           163 /  182          0.6        1633.0       0.2X
+100 cols x 1000 rows (read in-mem)              27 /   39          3.7         266.9       1.0X
+100 cols x 1000 rows (exec in-mem)              48 /   79          2.1         481.7       0.6X
+100 cols x 1000 rows (read parquet)             25 /   36          3.9         254.3       1.1X
+100 cols x 1000 rows (write parquet)           182 /  196          0.5        1819.5       0.2X
+2500 cols x 40 rows (read in-mem)              280 /  315          0.4        2797.1       0.1X
+2500 cols x 40 rows (exec in-mem)              606 /  638          0.2        6064.3       0.0X
+2500 cols x 40 rows (read parquet)             836 /  843          0.1        8356.4       0.0X
+2500 cols x 40 rows (write parquet)            490 /  522          0.2        4900.6       0.1X
+
+OpenJDK 64-Bit Server VM 1.8.0_66-internal-b17 on Linux 4.2.0-36-generic
+Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
+wide shallowly nested struct field r/w:

spark git commit: [SPARK-15585][SQL] Add doc for turning off quotations

2016-06-11 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master ad102af16 -> cb5d933d8


[SPARK-15585][SQL] Add doc for turning off quotations

## What changes were proposed in this pull request?
This PR adds documentation for turning off quotations, because this behavior differs from `com.databricks.spark.csv`.

## How was this patch tested?
Checked the behavior when an empty string is set in the CSV options.

Author: Takeshi YAMAMURO 

Closes #13616 from maropu/SPARK-15585-2.
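
For illustration (a minimal sketch, not part of this patch; the input path is hypothetical), turning quoting off means passing an empty string rather than `null` for the `quote` option:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("csv-quote-off").getOrCreate()

// An empty string disables quotation handling; passing null would fall back to
// the default quote character `"` (unlike com.databricks.spark.csv).
val df = spark.read
  .option("quote", "")
  .csv("/tmp/input.csv")  // hypothetical path
```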


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cb5d933d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cb5d933d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cb5d933d

Branch: refs/heads/master
Commit: cb5d933d86ac4afd947874f1f1c31c7154cb8249
Parents: ad102af
Author: Takeshi YAMAMURO 
Authored: Sat Jun 11 15:12:21 2016 -0700
Committer: Reynold Xin 
Committed: Sat Jun 11 15:12:21 2016 -0700

--
 python/pyspark/sql/readwriter.py  |  6 --
 .../main/scala/org/apache/spark/sql/DataFrameReader.scala |  4 +++-
 .../spark/sql/execution/datasources/csv/CSVSuite.scala| 10 ++
 3 files changed, 17 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/cb5d933d/python/pyspark/sql/readwriter.py
--
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 9208a52..7d1f186 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -320,7 +320,8 @@ class DataFrameReader(object):
  it uses the default value, ``UTF-8``.
 :param quote: sets the single character used for escaping quoted 
values where the
   separator can be part of the value. If None is set, it 
uses the default
-  value, ``"``.
+  value, ``"``. If you would like to turn off quotations, 
you need to set an
+  empty string.
 :param escape: sets the single character used for escaping quotes 
inside an already
quoted value. If None is set, it uses the default 
value, ``\``.
 :param comment: sets the single character used for skipping lines 
beginning with this
@@ -804,7 +805,8 @@ class DataFrameWriter(object):
 set, it uses the default value, ``,``.
 :param quote: sets the single character used for escaping quoted 
values where the
   separator can be part of the value. If None is set, it 
uses the default
-  value, ``"``.
+  value, ``"``. If you would like to turn off quotations, 
you need to set an
+  empty string.
 :param escape: sets the single character used for escaping quotes 
inside an already
quoted value. If None is set, it uses the default 
value, ``\``
 :param escapeQuotes: A flag indicating whether values containing 
quotes should always

http://git-wip-us.apache.org/repos/asf/spark/blob/cb5d933d/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
index b248583..bb5fa2b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
@@ -370,7 +370,9 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
* `encoding` (default `UTF-8`): decodes the CSV files by the given 
encoding
* type.
* `quote` (default `"`): sets the single character used for escaping 
quoted values where
-   * the separator can be part of the value.
+   * the separator can be part of the value. If you would like to turn off 
quotations, you need to
   * set not `null` but an empty string. This behaviour is different from
+   * `com.databricks.spark.csv`.
* `escape` (default `\`): sets the single character used for escaping 
quotes inside
* an already quoted value.
* `comment` (default empty string): sets the single character used for 
skipping lines

http://git-wip-us.apache.org/repos/asf/spark/blob/cb5d933d/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
index bc95446..f170065 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datas

spark git commit: [SPARK-15585][SQL] Add doc for turning off quotations

2016-06-11 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 8cf33fb8a -> 4c7b208ab


[SPARK-15585][SQL] Add doc for turning off quotations

## What changes were proposed in this pull request?
This PR adds documentation for turning off quotations, because this behavior differs from `com.databricks.spark.csv`.

## How was this patch tested?
Checked the behavior when an empty string is set in the CSV options.

Author: Takeshi YAMAMURO 

Closes #13616 from maropu/SPARK-15585-2.

(cherry picked from commit cb5d933d86ac4afd947874f1f1c31c7154cb8249)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4c7b208a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4c7b208a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4c7b208a

Branch: refs/heads/branch-2.0
Commit: 4c7b208ab6a6ae17fa137627c90256d757ad433f
Parents: 8cf33fb
Author: Takeshi YAMAMURO 
Authored: Sat Jun 11 15:12:21 2016 -0700
Committer: Reynold Xin 
Committed: Sat Jun 11 15:12:27 2016 -0700

--
 python/pyspark/sql/readwriter.py  |  6 --
 .../main/scala/org/apache/spark/sql/DataFrameReader.scala |  4 +++-
 .../spark/sql/execution/datasources/csv/CSVSuite.scala| 10 ++
 3 files changed, 17 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4c7b208a/python/pyspark/sql/readwriter.py
--
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 9208a52..7d1f186 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -320,7 +320,8 @@ class DataFrameReader(object):
  it uses the default value, ``UTF-8``.
 :param quote: sets the single character used for escaping quoted 
values where the
   separator can be part of the value. If None is set, it 
uses the default
-  value, ``"``.
+  value, ``"``. If you would like to turn off quotations, 
you need to set an
+  empty string.
 :param escape: sets the single character used for escaping quotes 
inside an already
quoted value. If None is set, it uses the default 
value, ``\``.
 :param comment: sets the single character used for skipping lines 
beginning with this
@@ -804,7 +805,8 @@ class DataFrameWriter(object):
 set, it uses the default value, ``,``.
 :param quote: sets the single character used for escaping quoted 
values where the
   separator can be part of the value. If None is set, it 
uses the default
-  value, ``"``.
+  value, ``"``. If you would like to turn off quotations, 
you need to set an
+  empty string.
 :param escape: sets the single character used for escaping quotes 
inside an already
quoted value. If None is set, it uses the default 
value, ``\``
 :param escapeQuotes: A flag indicating whether values containing 
quotes should always

http://git-wip-us.apache.org/repos/asf/spark/blob/4c7b208a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
index b248583..bb5fa2b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
@@ -370,7 +370,9 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
* `encoding` (default `UTF-8`): decodes the CSV files by the given 
encoding
* type.
* `quote` (default `"`): sets the single character used for escaping 
quoted values where
-   * the separator can be part of the value.
+   * the separator can be part of the value. If you would like to turn off 
quotations, you need to
   * set not `null` but an empty string. This behaviour is different from
+   * `com.databricks.spark.csv`.
* `escape` (default `\`): sets the single character used for escaping 
quotes inside
* an already quoted value.
* `comment` (default empty string): sets the single character used for 
skipping lines

http://git-wip-us.apache.org/repos/asf/spark/blob/4c7b208a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSui

spark git commit: [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents

2016-06-11 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 4c29c55f2 -> 8cf33fb8a


[SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents

## What changes were proposed in this pull request?

This issue fixes all broken links in the Spark 2.0 preview MLlib documents. It also contains some editorial changes.

**Fix broken links**
  * mllib-data-types.md
  * mllib-decision-tree.md
  * mllib-ensembles.md
  * mllib-feature-extraction.md
  * mllib-pmml-model-export.md
  * mllib-statistics.md

**Fix malformed section header and scala coding style**
  * mllib-linear-methods.md

**Replace indirect forward links with direct one**
  * ml-classification-regression.md

## How was this patch tested?

Manual tests (with `cd docs; jekyll build`.)

Author: Dongjoon Hyun 

Closes #13608 from dongjoon-hyun/SPARK-15883.

(cherry picked from commit ad102af169c7344b30d3b84aa16452fcdc22542c)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8cf33fb8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8cf33fb8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8cf33fb8

Branch: refs/heads/branch-2.0
Commit: 8cf33fb8a945e8f76833f68fc99b1ad5dee13641
Parents: 4c29c55
Author: Dongjoon Hyun 
Authored: Sat Jun 11 12:55:38 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 11 12:55:48 2016 +0100

--
 docs/ml-classification-regression.md |  4 ++--
 docs/mllib-data-types.md | 16 ++--
 docs/mllib-decision-tree.md  |  6 +++---
 docs/mllib-ensembles.md  |  6 +++---
 docs/mllib-feature-extraction.md |  2 +-
 docs/mllib-linear-methods.md | 10 +-
 docs/mllib-pmml-model-export.md  |  2 +-
 docs/mllib-statistics.md |  8 
 8 files changed, 25 insertions(+), 29 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8cf33fb8/docs/ml-classification-regression.md
--
diff --git a/docs/ml-classification-regression.md 
b/docs/ml-classification-regression.md
index 88457d4..d7e5521 100644
--- a/docs/ml-classification-regression.md
+++ b/docs/ml-classification-regression.md
@@ -815,7 +815,7 @@ The main differences between this API and the [original 
MLlib ensembles API](mll
 ## Random Forests
 
 [Random forests](http://en.wikipedia.org/wiki/Random_forest)
-are ensembles of [decision trees](ml-decision-tree.html).
+are ensembles of [decision 
trees](ml-classification-regression.html#decision-trees).
 Random forests combine many decision trees in order to reduce the risk of 
overfitting.
 The `spark.ml` implementation supports random forests for binary and 
multiclass classification and for regression,
 using both continuous and categorical features.
@@ -896,7 +896,7 @@ All output columns are optional; to exclude an output 
column, set its correspond
 ## Gradient-Boosted Trees (GBTs)
 
 [Gradient-Boosted Trees (GBTs)](http://en.wikipedia.org/wiki/Gradient_boosting)
-are ensembles of [decision trees](ml-decision-tree.html).
+are ensembles of [decision 
trees](ml-classification-regression.html#decision-trees).
 GBTs iteratively train decision trees in order to minimize a loss function.
 The `spark.ml` implementation supports GBTs for binary classification and for 
regression,
 using both continuous and categorical features.

http://git-wip-us.apache.org/repos/asf/spark/blob/8cf33fb8/docs/mllib-data-types.md
--
diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md
index 2ffe0f1..ef56aeb 100644
--- a/docs/mllib-data-types.md
+++ b/docs/mllib-data-types.md
@@ -33,7 +33,7 @@ implementations: 
[`DenseVector`](api/scala/index.html#org.apache.spark.mllib.lin
 using the factory methods implemented in
 [`Vectors`](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) to 
create local vectors.
 
-Refer to the [`Vector` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and [`Vectors` 
Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors) for 
details on the API.
+Refer to the [`Vector` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and [`Vectors` 
Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) for 
details on the API.
 
 {% highlight scala %}
 import org.apache.spark.mllib.linalg.{Vector, Vectors}
@@ -199,7 +199,7 @@ After loading, the feature indices are converted to 
zero-based.
 
[`MLUtils.loadLibSVMFile`](api/scala/index.html#org.apache.spark.mllib.util.MLUtils$)
 reads training
 examples stored in LIBSVM format.
 
-Refer to the [`MLUtils` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.util.MLUtils) for details on 
the API.
+Refer to the [`MLUtils` Scala 
docs](api/scala

spark git commit: [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents

2016-06-11 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 3761330dd -> ad102af16


[SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents

## What changes were proposed in this pull request?

This issue fixes all broken links in the Spark 2.0 preview MLlib documents. It also contains some editorial changes.

**Fix broken links**
  * mllib-data-types.md
  * mllib-decision-tree.md
  * mllib-ensembles.md
  * mllib-feature-extraction.md
  * mllib-pmml-model-export.md
  * mllib-statistics.md

**Fix malformed section header and scala coding style**
  * mllib-linear-methods.md

**Replace indirect forward links with direct one**
  * ml-classification-regression.md

## How was this patch tested?

Manual tests (with `cd docs; jekyll build`.)

Author: Dongjoon Hyun 

Closes #13608 from dongjoon-hyun/SPARK-15883.
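
For reference (a sketch, not part of this documentation-only change), the `Vectors` factory methods that the corrected links point to are used along these lines:

```scala
// Sketch: creating MLlib local vectors with the Vectors factory methods
// referenced by the fixed links in mllib-data-types.md.
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Dense vector (1.0, 0.0, 3.0)
val dv: Vector = Vectors.dense(1.0, 0.0, 3.0)
// The same vector in sparse form: size 3, non-zeros at indices 0 and 2
val sv: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))
```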


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ad102af1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ad102af1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ad102af1

Branch: refs/heads/master
Commit: ad102af169c7344b30d3b84aa16452fcdc22542c
Parents: 3761330
Author: Dongjoon Hyun 
Authored: Sat Jun 11 12:55:38 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 11 12:55:38 2016 +0100

--
 docs/ml-classification-regression.md |  4 ++--
 docs/mllib-data-types.md | 16 ++--
 docs/mllib-decision-tree.md  |  6 +++---
 docs/mllib-ensembles.md  |  6 +++---
 docs/mllib-feature-extraction.md |  2 +-
 docs/mllib-linear-methods.md | 10 +-
 docs/mllib-pmml-model-export.md  |  2 +-
 docs/mllib-statistics.md |  8 
 8 files changed, 25 insertions(+), 29 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ad102af1/docs/ml-classification-regression.md
--
diff --git a/docs/ml-classification-regression.md 
b/docs/ml-classification-regression.md
index 88457d4..d7e5521 100644
--- a/docs/ml-classification-regression.md
+++ b/docs/ml-classification-regression.md
@@ -815,7 +815,7 @@ The main differences between this API and the [original 
MLlib ensembles API](mll
 ## Random Forests
 
 [Random forests](http://en.wikipedia.org/wiki/Random_forest)
-are ensembles of [decision trees](ml-decision-tree.html).
+are ensembles of [decision 
trees](ml-classification-regression.html#decision-trees).
 Random forests combine many decision trees in order to reduce the risk of 
overfitting.
 The `spark.ml` implementation supports random forests for binary and 
multiclass classification and for regression,
 using both continuous and categorical features.
@@ -896,7 +896,7 @@ All output columns are optional; to exclude an output 
column, set its correspond
 ## Gradient-Boosted Trees (GBTs)
 
 [Gradient-Boosted Trees (GBTs)](http://en.wikipedia.org/wiki/Gradient_boosting)
-are ensembles of [decision trees](ml-decision-tree.html).
+are ensembles of [decision 
trees](ml-classification-regression.html#decision-trees).
 GBTs iteratively train decision trees in order to minimize a loss function.
 The `spark.ml` implementation supports GBTs for binary classification and for 
regression,
 using both continuous and categorical features.

http://git-wip-us.apache.org/repos/asf/spark/blob/ad102af1/docs/mllib-data-types.md
--
diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md
index 2ffe0f1..ef56aeb 100644
--- a/docs/mllib-data-types.md
+++ b/docs/mllib-data-types.md
@@ -33,7 +33,7 @@ implementations: 
[`DenseVector`](api/scala/index.html#org.apache.spark.mllib.lin
 using the factory methods implemented in
 [`Vectors`](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) to 
create local vectors.
 
-Refer to the [`Vector` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and [`Vectors` 
Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors) for 
details on the API.
+Refer to the [`Vector` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and [`Vectors` 
Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) for 
details on the API.
 
 {% highlight scala %}
 import org.apache.spark.mllib.linalg.{Vector, Vectors}
@@ -199,7 +199,7 @@ After loading, the feature indices are converted to 
zero-based.
 
[`MLUtils.loadLibSVMFile`](api/scala/index.html#org.apache.spark.mllib.util.MLUtils$)
 reads training
 examples stored in LIBSVM format.
 
-Refer to the [`MLUtils` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.util.MLUtils) for details on 
the API.
+Refer to the [`MLUtils` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.util.MLUtils$) for details on 
the API.
 
 {% highlight scala %}
 imp

spark git commit: [SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache"

2016-06-11 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 f0fa0a894 -> 4c29c55f2


[SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache"

## What changes were proposed in this pull request?

Use new Spark logo including "Apache" (now, with crushed PNGs). Remove old 
unreferenced logo files.

## How was this patch tested?

Manual check of generated HTML site and Spark UI. I searched for references to 
the deleted files to make sure they were not used.

Author: Sean Owen 

Closes #13609 from srowen/SPARK-15879.

(cherry picked from commit 3761330dd0151d7369d7fba4d4c344e9863990ef)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4c29c55f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4c29c55f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4c29c55f

Branch: refs/heads/branch-2.0
Commit: 4c29c55f22d57c5fbadd0b759155fbab4b07a70a
Parents: f0fa0a8
Author: Sean Owen 
Authored: Sat Jun 11 12:46:07 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 11 12:46:21 2016 +0100

--
 .../spark/ui/static/spark-logo-77x50px-hd.png   | Bin 3536 -> 4182 bytes
 .../org/apache/spark/ui/static/spark_logo.png   | Bin 14233 -> 0 bytes
 docs/img/incubator-logo.png | Bin 11651 -> 0 bytes
 docs/img/spark-logo-100x40px.png| Bin 3635 -> 0 bytes
 docs/img/spark-logo-77x40px-hd.png  | Bin 1904 -> 0 bytes
 docs/img/spark-logo-77x50px-hd.png  | Bin 3536 -> 0 bytes
 docs/img/spark-logo-hd.png  | Bin 13512 -> 16418 bytes
 7 files changed, 0 insertions(+), 0 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png
--
diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png
index 6c5f099..ffe2550 100644
Binary files 
a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
and 
b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
differ

http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png
--
diff --git a/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png 
b/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png
deleted file mode 100644
index 4b18734..000
Binary files 
a/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png and 
/dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/docs/img/incubator-logo.png
--
diff --git a/docs/img/incubator-logo.png b/docs/img/incubator-logo.png
deleted file mode 100644
index 33ca7f6..000
Binary files a/docs/img/incubator-logo.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/docs/img/spark-logo-100x40px.png
--
diff --git a/docs/img/spark-logo-100x40px.png b/docs/img/spark-logo-100x40px.png
deleted file mode 100644
index 54c3187..000
Binary files a/docs/img/spark-logo-100x40px.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/docs/img/spark-logo-77x40px-hd.png
--
diff --git a/docs/img/spark-logo-77x40px-hd.png 
b/docs/img/spark-logo-77x40px-hd.png
deleted file mode 100644
index 270402f..000
Binary files a/docs/img/spark-logo-77x40px-hd.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/docs/img/spark-logo-77x50px-hd.png
--
diff --git a/docs/img/spark-logo-77x50px-hd.png 
b/docs/img/spark-logo-77x50px-hd.png
deleted file mode 100644
index 6c5f099..000
Binary files a/docs/img/spark-logo-77x50px-hd.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/docs/img/spark-logo-hd.png
--
diff --git a/docs/img/spark-logo-hd.png b/docs/img/spark-logo-hd.png
index 1381e30..e4508e7 100644
Binary files a/docs/img/spark-logo-hd.png and b/docs/img/spark-logo-hd.png 
differ


spark git commit: [SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache"

2016-06-11 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 7504bc73f -> 3761330dd


[SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache"

## What changes were proposed in this pull request?

Use new Spark logo including "Apache" (now, with crushed PNGs). Remove old 
unreferenced logo files.

## How was this patch tested?

Manual check of generated HTML site and Spark UI. I searched for references to 
the deleted files to make sure they were not used.

Author: Sean Owen 

Closes #13609 from srowen/SPARK-15879.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3761330d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3761330d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3761330d

Branch: refs/heads/master
Commit: 3761330dd0151d7369d7fba4d4c344e9863990ef
Parents: 7504bc7
Author: Sean Owen 
Authored: Sat Jun 11 12:46:07 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 11 12:46:07 2016 +0100

--
 .../spark/ui/static/spark-logo-77x50px-hd.png   | Bin 3536 -> 4182 bytes
 .../org/apache/spark/ui/static/spark_logo.png   | Bin 14233 -> 0 bytes
 docs/img/incubator-logo.png | Bin 11651 -> 0 bytes
 docs/img/spark-logo-100x40px.png| Bin 3635 -> 0 bytes
 docs/img/spark-logo-77x40px-hd.png  | Bin 1904 -> 0 bytes
 docs/img/spark-logo-77x50px-hd.png  | Bin 3536 -> 0 bytes
 docs/img/spark-logo-hd.png  | Bin 13512 -> 16418 bytes
 7 files changed, 0 insertions(+), 0 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png
--
diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png
index 6c5f099..ffe2550 100644
Binary files 
a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
and 
b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
differ

http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png
--
diff --git a/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png 
b/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png
deleted file mode 100644
index 4b18734..000
Binary files 
a/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png and 
/dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/docs/img/incubator-logo.png
--
diff --git a/docs/img/incubator-logo.png b/docs/img/incubator-logo.png
deleted file mode 100644
index 33ca7f6..000
Binary files a/docs/img/incubator-logo.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/docs/img/spark-logo-100x40px.png
--
diff --git a/docs/img/spark-logo-100x40px.png b/docs/img/spark-logo-100x40px.png
deleted file mode 100644
index 54c3187..000
Binary files a/docs/img/spark-logo-100x40px.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/docs/img/spark-logo-77x40px-hd.png
--
diff --git a/docs/img/spark-logo-77x40px-hd.png 
b/docs/img/spark-logo-77x40px-hd.png
deleted file mode 100644
index 270402f..000
Binary files a/docs/img/spark-logo-77x40px-hd.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/docs/img/spark-logo-77x50px-hd.png
--
diff --git a/docs/img/spark-logo-77x50px-hd.png 
b/docs/img/spark-logo-77x50px-hd.png
deleted file mode 100644
index 6c5f099..000
Binary files a/docs/img/spark-logo-77x50px-hd.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/docs/img/spark-logo-hd.png
--
diff --git a/docs/img/spark-logo-hd.png b/docs/img/spark-logo-hd.png
index 1381e30..e4508e7 100644
Binary files a/docs/img/spark-logo-hd.png and b/docs/img/spark-logo-hd.png 
differ

