spark git commit: [SPARK-18458][CORE] Fix signed integer overflow problem at an expression in RadixSort.java
Repository: spark Updated Branches: refs/heads/master 856e00420 -> d93b65524

[SPARK-18458][CORE] Fix signed integer overflow problem at an expression in RadixSort.java

## What changes were proposed in this pull request?

This PR avoids cases where the result of an expression becomes negative due to signed integer overflow (e.g. 0x10?? * 8 < 0). It casts the operands to `long` before executing the calculation, so the expression is evaluated in `long` arithmetic and the result stays positive.

## How was this patch tested?

Manually executed query82 of TPC-DS at 100 TB scale.

Author: Kazuaki Ishizaki

Closes #15907 from kiszk/SPARK-18458.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d93b6552 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d93b6552 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d93b6552 Branch: refs/heads/master Commit: d93b6552473468df297a08c0bef9ea0bf0f5c13a Parents: 856e004 Author: Kazuaki Ishizaki Authored: Sat Nov 19 21:50:20 2016 -0800 Committer: Reynold Xin Committed: Sat Nov 19 21:50:20 2016 -0800

-- .../util/collection/unsafe/sort/RadixSort.java | 48 ++-- .../unsafe/sort/UnsafeInMemorySorter.java | 2 +- .../collection/unsafe/sort/RadixSortSuite.scala | 28 ++-- 3 files changed, 40 insertions(+), 38 deletions(-) --

http://git-wip-us.apache.org/repos/asf/spark/blob/d93b6552/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java --

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java index 4043617..3dd3184 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java @@ -17,6 +17,8 @@ package org.apache.spark.util.collection.unsafe.sort; +import com.google.common.primitives.Ints; + import org.apache.spark.unsafe.Platform; import org.apache.spark.unsafe.array.LongArray; @@ -40,14 +42,14 @@ public class RadixSort { * of always copying the data back to position zero for efficiency. */ public static int sort( - LongArray array, int numRecords, int startByteIndex, int endByteIndex, + LongArray array, long numRecords, int startByteIndex, int endByteIndex, boolean desc, boolean signed) { assert startByteIndex >= 0 : "startByteIndex (" + startByteIndex + ") should >= 0"; assert endByteIndex <= 7 : "endByteIndex (" + endByteIndex + ") should <= 7"; assert endByteIndex > startByteIndex; assert numRecords * 2 <= array.size(); -int inIndex = 0; -int outIndex = numRecords; +long inIndex = 0; +long outIndex = numRecords; if (numRecords > 0) { long[][] counts = getCounts(array, numRecords, startByteIndex, endByteIndex); for (int i = startByteIndex; i <= endByteIndex; i++) { @@ -55,13 +57,13 @@ public class RadixSort { sortAtByte( array, numRecords, counts[i], i, inIndex, outIndex, desc, signed && i == endByteIndex); - int tmp = inIndex; + long tmp = inIndex; inIndex = outIndex; outIndex = tmp; } } } -return inIndex; +return Ints.checkedCast(inIndex); } /** @@ -78,14 +80,14 @@ public class RadixSort { * @param signed whether this is a signed (two's complement) sort (only applies to last byte).
*/ private static void sortAtByte( - LongArray array, int numRecords, long[] counts, int byteIdx, int inIndex, int outIndex, + LongArray array, long numRecords, long[] counts, int byteIdx, long inIndex, long outIndex, boolean desc, boolean signed) { assert counts.length == 256; long[] offsets = transformCountsToOffsets( - counts, numRecords, array.getBaseOffset() + outIndex * 8, 8, desc, signed); + counts, numRecords, array.getBaseOffset() + outIndex * 8L, 8, desc, signed); Object baseObject = array.getBaseObject(); -long baseOffset = array.getBaseOffset() + inIndex * 8; -long maxOffset = baseOffset + numRecords * 8; +long baseOffset = array.getBaseOffset() + inIndex * 8L; +long maxOffset = baseOffset + numRecords * 8L; for (long offset = baseOffset; offset < maxOffset; offset += 8) { long value = Platform.getLong(baseObject, offset); int bucket = (int)((value >>> (byteIdx * 8)) & 0xff); @@ -106,13 +108,13 @@ public class RadixSort { * significant byte. If the byte does not need sorting the array will be null. */ private static long[][]
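For readers unfamiliar with the overflow pattern being fixed above, here is a minimal, self-contained sketch (in Scala, not the patched Java code itself; the constant is only illustrative since the value in the description is elided) of why a product of two 32-bit ints can wrap to a negative number, and why widening one operand fixes it:

```
object OverflowSketch {
  def main(args: Array[String]): Unit = {
    val numRecords: Int = 0x10000000          // 268,435,456, comfortably within Int range
    val intProduct: Int = numRecords * 8      // 2^31 does not fit in an Int: wraps around
    val longProduct: Long = numRecords * 8L   // the Long literal promotes the whole expression
    println(s"Int arithmetic:  $intProduct")  // prints -2147483648
    println(s"Long arithmetic: $longProduct") // prints 2147483648
  }
}
```

The patch applies the same idea by using `8L` in the offset arithmetic and, per the diff, Guava's `Ints.checkedCast` when narrowing the final `long` index back to an `int`.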
spark git commit: [SPARK-18505][SQL] Simplify AnalyzeColumnCommand
Repository: spark Updated Branches: refs/heads/master e5f5c29e0 -> 6f7ff7509

[SPARK-18505][SQL] Simplify AnalyzeColumnCommand

## What changes were proposed in this pull request?

I'm spending more time at the design & code level on the cost-based optimizer now, and have found a number of issues related to maintainability and compatibility that I would like to address. This is a small pull request to clean up AnalyzeColumnCommand:

1. Removed warning on duplicated columns. Warnings in log messages are useless since most users that run SQL don't see them.
2. Removed the nested updateStats function by just inlining it.
3. Renamed a few functions to better reflect what they do.
4. Removed the factory apply method for ColumnStatStruct. It is a bad pattern to use an apply method that returns an instantiation of a class that is not of the same type (ColumnStatStruct.apply used to return CreateNamedStruct).
5. Renamed ColumnStatStruct to just AnalyzeColumnCommand.
6. Added more documentation explaining some of the non-obvious return types and code blocks.

In follow-up pull requests, I'd like to address the following:

1. Get rid of the Map[String, ColumnStat] map, since internally we should be using Attribute to reference columns, rather than strings.
2. Decouple the fields exposed by ColumnStat from the internals of Spark SQL's execution path. Currently the two are coupled because ColumnStat takes in an InternalRow.
3. Correctness: Remove the code path that stores statistics in the catalog using the base64 encoding of the UnsafeRow format, which is not stable across Spark versions.
4. Clearly document the data representation stored in the catalog for statistics.

## How was this patch tested?

Affected test cases have been updated.

Author: Reynold Xin <r...@databricks.com>

Closes #15933 from rxin/SPARK-18505.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6f7ff750 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6f7ff750 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6f7ff750 Branch: refs/heads/master Commit: 6f7ff75091154fed7649ea6d79e887aad9fbde6a Parents: e5f5c29 Author: Reynold Xin <r...@databricks.com> Authored: Fri Nov 18 16:34:11 2016 -0800 Committer: Reynold Xin <r...@databricks.com> Committed: Fri Nov 18 16:34:11 2016 -0800 -- .../command/AnalyzeColumnCommand.scala | 115 +++ .../spark/sql/StatisticsColumnSuite.scala | 2 +- .../org/apache/spark/sql/StatisticsTest.scala | 7 +- .../spark/sql/hive/HiveExternalCatalog.scala| 4 +- .../spark/sql/hive/client/HiveClientImpl.scala | 2 +- 5 files changed, 74 insertions(+), 56 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6f7ff750/sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala index 6141fab..7fc57d0 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala @@ -17,8 +17,7 @@ package org.apache.spark.sql.execution.command -import scala.collection.mutable - +import org.apache.spark.internal.Logging import org.apache.spark.sql._ import org.apache.spark.sql.catalyst.TableIdentifier import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases @@ -44,13 +43,16 @@ case class AnalyzeColumnCommand( val tableIdentWithDB = TableIdentifier(tableIdent.table, Some(db)) val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdentWithDB)) -relation match { +// Compute total size +val (catalogTable: CatalogTable, sizeInBytes: Long) = relation match { case catalogRel: CatalogRelation => -updateStats(catalogRel.catalogTable, +// This is a Hive serde format table +(catalogRel.catalogTable, AnalyzeTableCommand.calculateTotalSize(sessionState, catalogRel.catalogTable)) case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined => -updateStats(logicalRel.catalogTable.get, +// This is a data source format table +(logicalRel.catalogTable.get, AnalyzeTableCommand.calculateTotalSize(sessionState, logicalRel.catalogTable.get)) case otherRelation => @@ -58,45 +60,45 @@ case class AnalyzeColumnCommand( s"${otherRelation.nodeName}.") } -def updateStats(catalogTable: CatalogTable, newTotalSize: Long)
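To illustrate the factory-`apply` concern from point 4 of the description above, here is a hypothetical sketch (the names below are illustrative, not the actual Spark classes): an `apply` on a companion-style object that returns a value of a different type reads misleadingly at call sites, because `Foo(...)` normally means "construct a Foo", while an explicitly named factory makes the return type obvious.

```
final case class NamedStruct(fields: Seq[String])

object ColumnStatStructLike {
  // Anti-pattern: ColumnStatStructLike(...) does not build a ColumnStatStructLike at all.
  def apply(columns: Seq[String]): NamedStruct =
    NamedStruct(columns.map(c => s"stat_$c"))
}

object ColumnStatStructs {
  // Clearer: the method name and return type tell the reader what actually comes back.
  def namedStructFor(columns: Seq[String]): NamedStruct =
    NamedStruct(columns.map(c => s"stat_$c"))
}
```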
spark git commit: [SPARK-18457][SQL] ORC and other columnar formats using HiveShim read all columns when doing a simple count
Repository: spark Updated Branches: refs/heads/branch-2.1 5912c19e7 -> ec622eb7e

[SPARK-18457][SQL] ORC and other columnar formats using HiveShim read all columns when doing a simple count

## What changes were proposed in this pull request?

When reading zero columns (e.g., count(*)) from ORC or any other format that uses HiveShim, actually set the read column list to empty for Hive to use.

## How was this patch tested?

Query correctness is handled by existing unit tests. I'm happy to add more if anyone can point out some case that is not covered. Reduction in data read can be verified in the UI when built with a recent version of Hadoop say:

```
build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -Phive -DskipTests clean package
```

However the default Hadoop 2.2 that is used for unit tests does not report actual bytes read and instead just full file sizes (see FileScanRDD.scala line 80). Therefore I don't think there is a good way to add a unit test for this. I tested with the following setup using above build options

```
case class OrcData(intField: Long, stringField: String)
spark.range(1,100).map(i => OrcData(i, s"part-$i")).toDF().write.format("orc").save("orc_test")
sql(
  s"""CREATE EXTERNAL TABLE orc_test(
     |  intField LONG,
     |  stringField STRING
     |)
     |STORED AS ORC
     |LOCATION '${System.getProperty("user.dir") + "/orc_test"}'
   """.stripMargin)
```

## Results

query | Spark 2.0.2 | this PR
---|---|---
`sql("select count(*) from orc_test").collect`|4.4 MB|199.4 KB
`sql("select intField from orc_test").collect`|743.4 KB|743.4 KB
`sql("select * from orc_test").collect`|4.4 MB|4.4 MB

Author: Andrew Ray

Closes #15898 from aray/sql-orc-no-col.

(cherry picked from commit 795e9fc9213cb9941ae131aadcafddb94bde5f74) Signed-off-by: Reynold Xin

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ec622eb7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ec622eb7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ec622eb7 Branch: refs/heads/branch-2.1 Commit: ec622eb7e1ffd0775c9ca4683d1032ca8d41654a Parents: 5912c19 Author: Andrew Ray Authored: Fri Nov 18 11:19:49 2016 -0800 Committer: Reynold Xin Committed: Fri Nov 18 11:19:59 2016 -0800

-- .../org/apache/spark/sql/hive/HiveShim.scala| 6 ++--- .../spark/sql/hive/orc/OrcQuerySuite.scala | 25 +++- 2 files changed, 27 insertions(+), 4 deletions(-) --

http://git-wip-us.apache.org/repos/asf/spark/blob/ec622eb7/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala --

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala index 0d2a765..9e98948 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala @@ -69,13 +69,13 @@ private[hive] object HiveShim { } /* - * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null or empty + * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null */ def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) { -if (ids != null && ids.nonEmpty) { +if (ids != null) { ColumnProjectionUtils.appendReadColumns(conf, ids.asJava) } -if (names != null && names.nonEmpty) { +if (names != null) { appendReadColumnNames(conf, names) } }

http://git-wip-us.apache.org/repos/asf/spark/blob/ec622eb7/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala --

diff --git
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala index ecb5972..a628977 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala @@ -20,11 +20,13 @@ package org.apache.spark.sql.hive.orc import java.nio.charset.StandardCharsets import java.sql.Timestamp +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.hive.ql.io.orc.{OrcStruct, SparkOrcNewRecordReader} import org.scalatest.BeforeAndAfterAll import org.apache.spark.sql._ import org.apache.spark.sql.catalyst.TableIdentifier -import org.apache.spark.sql.execution.datasources.LogicalRelation +import org.apache.spark.sql.execution.datasources.{LogicalRelation, RecordReaderIterator} import
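The subtlety the fix relies on is that an *empty* read-column list is meaningful: it tells the reader to materialize zero columns, whereas an absent list means "read everything". A rough sketch of that distinction, using hypothetical names rather than the Hive `ColumnProjectionUtils` API:

```
object ProjectionSketch {
  final case class ScanRequest(readColumnIds: Option[Seq[Int]])

  // Treating Some(Nil) like None silently falls back to "read every column",
  // which is exactly what made count(*) over ORC scan the whole file.
  def columnsToRead(request: ScanRequest, allColumnIds: Seq[Int]): Seq[Int] =
    request.readColumnIds match {
      case None      => allColumnIds // projection never configured: read every column
      case Some(ids) => ids          // Some(Nil) means a zero-column scan, e.g. count(*)
    }

  // columnsToRead(ScanRequest(Some(Nil)), Seq(0, 1, 2)) == Seq()         -> only row count needed
  // columnsToRead(ScanRequest(None),      Seq(0, 1, 2)) == Seq(0, 1, 2)  -> full scan
}
```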
spark git commit: [SPARK-18457][SQL] ORC and other columnar formats using HiveShim read all columns when doing a simple count
Repository: spark Updated Branches: refs/heads/master 51baca221 -> 795e9fc92

[SPARK-18457][SQL] ORC and other columnar formats using HiveShim read all columns when doing a simple count

## What changes were proposed in this pull request?

When reading zero columns (e.g., count(*)) from ORC or any other format that uses HiveShim, actually set the read column list to empty for Hive to use.

## How was this patch tested?

Query correctness is handled by existing unit tests. I'm happy to add more if anyone can point out some case that is not covered. Reduction in data read can be verified in the UI when built with a recent version of Hadoop say:

```
build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -Phive -DskipTests clean package
```

However the default Hadoop 2.2 that is used for unit tests does not report actual bytes read and instead just full file sizes (see FileScanRDD.scala line 80). Therefore I don't think there is a good way to add a unit test for this. I tested with the following setup using above build options

```
case class OrcData(intField: Long, stringField: String)
spark.range(1,100).map(i => OrcData(i, s"part-$i")).toDF().write.format("orc").save("orc_test")
sql(
  s"""CREATE EXTERNAL TABLE orc_test(
     |  intField LONG,
     |  stringField STRING
     |)
     |STORED AS ORC
     |LOCATION '${System.getProperty("user.dir") + "/orc_test"}'
   """.stripMargin)
```

## Results

query | Spark 2.0.2 | this PR
---|---|---
`sql("select count(*) from orc_test").collect`|4.4 MB|199.4 KB
`sql("select intField from orc_test").collect`|743.4 KB|743.4 KB
`sql("select * from orc_test").collect`|4.4 MB|4.4 MB

Author: Andrew Ray

Closes #15898 from aray/sql-orc-no-col.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/795e9fc9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/795e9fc9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/795e9fc9 Branch: refs/heads/master Commit: 795e9fc9213cb9941ae131aadcafddb94bde5f74 Parents: 51baca2 Author: Andrew Ray Authored: Fri Nov 18 11:19:49 2016 -0800 Committer: Reynold Xin Committed: Fri Nov 18 11:19:49 2016 -0800

-- .../org/apache/spark/sql/hive/HiveShim.scala| 6 ++--- .../spark/sql/hive/orc/OrcQuerySuite.scala | 25 +++- 2 files changed, 27 insertions(+), 4 deletions(-) --

http://git-wip-us.apache.org/repos/asf/spark/blob/795e9fc9/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala --

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala index 0d2a765..9e98948 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala @@ -69,13 +69,13 @@ private[hive] object HiveShim { } /* - * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null or empty + * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null */ def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) { -if (ids != null && ids.nonEmpty) { +if (ids != null) { ColumnProjectionUtils.appendReadColumns(conf, ids.asJava) } -if (names != null && names.nonEmpty) { +if (names != null) { appendReadColumnNames(conf, names) } }

http://git-wip-us.apache.org/repos/asf/spark/blob/795e9fc9/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala --

diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala index ecb5972..a628977 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala @@ -20,11 +20,13 @@ package org.apache.spark.sql.hive.orc import java.nio.charset.StandardCharsets import java.sql.Timestamp +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.hive.ql.io.orc.{OrcStruct, SparkOrcNewRecordReader} import org.scalatest.BeforeAndAfterAll import org.apache.spark.sql._ import org.apache.spark.sql.catalyst.TableIdentifier -import org.apache.spark.sql.execution.datasources.LogicalRelation +import org.apache.spark.sql.execution.datasources.{LogicalRelation, RecordReaderIterator} import org.apache.spark.sql.hive.{HiveUtils, MetastoreRelation} import org.apache.spark.sql.hive.test.TestHive._ import
spark git commit: [SPARK-18462] Fix ClassCastException in SparkListenerDriverAccumUpdates event
Repository: spark Updated Branches: refs/heads/branch-2.1 fc466be4f -> e8b1955e2 [SPARK-18462] Fix ClassCastException in SparkListenerDriverAccumUpdates event ## What changes were proposed in this pull request? This patch fixes a `ClassCastException: java.lang.Integer cannot be cast to java.lang.Long` error which could occur in the HistoryServer while trying to process a deserialized `SparkListenerDriverAccumUpdates` event. The problem stems from how `jackson-module-scala` handles primitive type parameters (see https://github.com/FasterXML/jackson-module-scala/wiki/FAQ#deserializing-optionint-and-other-primitive-challenges for more details). This was causing a problem where our code expected a field to be deserialized as a `(Long, Long)` tuple but we got an `(Int, Int)` tuple instead. This patch hacks around this issue by registering a custom `Converter` with Jackson in order to deserialize the tuples as `(Object, Object)` and perform the appropriate casting. ## How was this patch tested? New regression tests in `SQLListenerSuite`. Author: Josh RosenCloses #15922 from JoshRosen/SPARK-18462. (cherry picked from commit d9dd979d170f44383a9a87f892f2486ddb3cca7d) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e8b1955e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e8b1955e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e8b1955e Branch: refs/heads/branch-2.1 Commit: e8b1955e20a966da9a95f75320680cbab1096540 Parents: fc466be Author: Josh Rosen Authored: Thu Nov 17 18:45:15 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 17 18:45:22 2016 -0800 -- .../spark/sql/execution/ui/SQLListener.scala| 39 - .../sql/execution/ui/SQLListenerSuite.scala | 44 +++- 2 files changed, 80 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e8b1955e/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala index 60f1343..5daf215 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala @@ -19,6 +19,11 @@ package org.apache.spark.sql.execution.ui import scala.collection.mutable +import com.fasterxml.jackson.databind.JavaType +import com.fasterxml.jackson.databind.`type`.TypeFactory +import com.fasterxml.jackson.databind.annotation.JsonDeserialize +import com.fasterxml.jackson.databind.util.Converter + import org.apache.spark.{JobExecutionStatus, SparkConf} import org.apache.spark.annotation.DeveloperApi import org.apache.spark.internal.Logging @@ -43,9 +48,41 @@ case class SparkListenerSQLExecutionEnd(executionId: Long, time: Long) extends SparkListenerEvent @DeveloperApi -case class SparkListenerDriverAccumUpdates(executionId: Long, accumUpdates: Seq[(Long, Long)]) +case class SparkListenerDriverAccumUpdates( +executionId: Long, +@JsonDeserialize(contentConverter = classOf[LongLongTupleConverter]) +accumUpdates: Seq[(Long, Long)]) extends SparkListenerEvent +/** + * Jackson [[Converter]] for converting an (Int, Int) tuple into a (Long, Long) tuple. 
+ * + * This is necessary due to limitations in how Jackson's scala module deserializes primitives; + * see the "Deserializing Option[Int] and other primitive challenges" section in + * https://github.com/FasterXML/jackson-module-scala/wiki/FAQ for a discussion of this issue and + * SPARK-18462 for the specific problem that motivated this conversion. + */ +private class LongLongTupleConverter extends Converter[(Object, Object), (Long, Long)] { + + override def convert(in: (Object, Object)): (Long, Long) = { +def toLong(a: Object): Long = a match { + case i: java.lang.Integer => i.intValue() + case l: java.lang.Long => l.longValue() +} +(toLong(in._1), toLong(in._2)) + } + + override def getInputType(typeFactory: TypeFactory): JavaType = { +val objectType = typeFactory.uncheckedSimpleType(classOf[Object]) +typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], Array(objectType, objectType)) + } + + override def getOutputType(typeFactory: TypeFactory): JavaType = { +val longType = typeFactory.uncheckedSimpleType(classOf[Long]) +typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], Array(longType, longType)) + } +} + class SQLHistoryListenerFactory extends
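For context on the failure mode, here is a self-contained sketch of the underlying jackson-module-scala behavior described above (assumes jackson-module-scala on the classpath; the exact point of failure can vary by Jackson version): because scala.Long erases to Object in a generic position, Jackson materializes small JSON numbers as java.lang.Integer boxes, and the first unboxing to Long throws the ClassCastException this patch works around.

```
import com.fasterxml.jackson.core.`type`.TypeReference
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

object TupleErasureSketch {
  def main(args: Array[String]): Unit = {
    val mapper = new ObjectMapper()
    mapper.registerModule(DefaultScalaModule)

    // Statically Seq[(Long, Long)], but erasure leaves Jackson with (Object, Object),
    // so the values 1 and 2 come back boxed as java.lang.Integer.
    val parsed = mapper.readValue("[[1, 2]]", new TypeReference[Seq[(Long, Long)]] {})

    // Forcing the first element to unbox as a Long blows up:
    // java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
    val first: Long = parsed.head._1
    println(first)
  }
}
```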
spark git commit: [SPARK-18462] Fix ClassCastException in SparkListenerDriverAccumUpdates event
Repository: spark Updated Branches: refs/heads/master ce13c2672 -> d9dd979d1 [SPARK-18462] Fix ClassCastException in SparkListenerDriverAccumUpdates event ## What changes were proposed in this pull request? This patch fixes a `ClassCastException: java.lang.Integer cannot be cast to java.lang.Long` error which could occur in the HistoryServer while trying to process a deserialized `SparkListenerDriverAccumUpdates` event. The problem stems from how `jackson-module-scala` handles primitive type parameters (see https://github.com/FasterXML/jackson-module-scala/wiki/FAQ#deserializing-optionint-and-other-primitive-challenges for more details). This was causing a problem where our code expected a field to be deserialized as a `(Long, Long)` tuple but we got an `(Int, Int)` tuple instead. This patch hacks around this issue by registering a custom `Converter` with Jackson in order to deserialize the tuples as `(Object, Object)` and perform the appropriate casting. ## How was this patch tested? New regression tests in `SQLListenerSuite`. Author: Josh RosenCloses #15922 from JoshRosen/SPARK-18462. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d9dd979d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d9dd979d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d9dd979d Branch: refs/heads/master Commit: d9dd979d170f44383a9a87f892f2486ddb3cca7d Parents: ce13c26 Author: Josh Rosen Authored: Thu Nov 17 18:45:15 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 17 18:45:15 2016 -0800 -- .../spark/sql/execution/ui/SQLListener.scala| 39 - .../sql/execution/ui/SQLListenerSuite.scala | 44 +++- 2 files changed, 80 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d9dd979d/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala index 60f1343..5daf215 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala @@ -19,6 +19,11 @@ package org.apache.spark.sql.execution.ui import scala.collection.mutable +import com.fasterxml.jackson.databind.JavaType +import com.fasterxml.jackson.databind.`type`.TypeFactory +import com.fasterxml.jackson.databind.annotation.JsonDeserialize +import com.fasterxml.jackson.databind.util.Converter + import org.apache.spark.{JobExecutionStatus, SparkConf} import org.apache.spark.annotation.DeveloperApi import org.apache.spark.internal.Logging @@ -43,9 +48,41 @@ case class SparkListenerSQLExecutionEnd(executionId: Long, time: Long) extends SparkListenerEvent @DeveloperApi -case class SparkListenerDriverAccumUpdates(executionId: Long, accumUpdates: Seq[(Long, Long)]) +case class SparkListenerDriverAccumUpdates( +executionId: Long, +@JsonDeserialize(contentConverter = classOf[LongLongTupleConverter]) +accumUpdates: Seq[(Long, Long)]) extends SparkListenerEvent +/** + * Jackson [[Converter]] for converting an (Int, Int) tuple into a (Long, Long) tuple. 
+ * + * This is necessary due to limitations in how Jackson's scala module deserializes primitives; + * see the "Deserializing Option[Int] and other primitive challenges" section in + * https://github.com/FasterXML/jackson-module-scala/wiki/FAQ for a discussion of this issue and + * SPARK-18462 for the specific problem that motivated this conversion. + */ +private class LongLongTupleConverter extends Converter[(Object, Object), (Long, Long)] { + + override def convert(in: (Object, Object)): (Long, Long) = { +def toLong(a: Object): Long = a match { + case i: java.lang.Integer => i.intValue() + case l: java.lang.Long => l.longValue() +} +(toLong(in._1), toLong(in._2)) + } + + override def getInputType(typeFactory: TypeFactory): JavaType = { +val objectType = typeFactory.uncheckedSimpleType(classOf[Object]) +typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], Array(objectType, objectType)) + } + + override def getOutputType(typeFactory: TypeFactory): JavaType = { +val longType = typeFactory.uncheckedSimpleType(classOf[Long]) +typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], Array(longType, longType)) + } +} + class SQLHistoryListenerFactory extends SparkHistoryListenerFactory { override def createListeners(conf: SparkConf, sparkUI: SparkUI): Seq[SparkListener] = {
spark git commit: [SPARK-18464][SQL] support old table which doesn't store schema in metastore
Repository: spark Updated Branches: refs/heads/branch-2.1 6a3cbbc03 -> 014fceee0

[SPARK-18464][SQL] support old table which doesn't store schema in metastore

## What changes were proposed in this pull request?

Before Spark 2.1, users could create an external data source table without a schema, and we would infer the table schema at runtime. In Spark 2.1, we decided to infer the schema when the table was created, so that we don't need to infer it again and again at runtime. This is a good improvement, but we should still respect and support old tables which don't store the table schema in the metastore.

## How was this patch tested?

Regression test.

Author: Wenchen Fan

Closes #15900 from cloud-fan/hive-catalog.

(cherry picked from commit 07b3f045cd6f79b92bc86b3b1b51d3d5e6bd37ce) Signed-off-by: Reynold Xin

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/014fceee Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/014fceee Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/014fceee Branch: refs/heads/branch-2.1 Commit: 014fceee04c69d7944c74b3794e821e4d1003dd0 Parents: 6a3cbbc Author: Wenchen Fan Authored: Thu Nov 17 00:00:38 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 17 00:00:47 2016 -0800

-- .../spark/sql/execution/command/tables.scala| 8 ++- .../spark/sql/hive/HiveExternalCatalog.scala| 5 + .../spark/sql/hive/HiveMetastoreCatalog.scala | 4 +++- .../sql/hive/MetastoreDataSourcesSuite.scala| 22 4 files changed, 37 insertions(+), 2 deletions(-) --

http://git-wip-us.apache.org/repos/asf/spark/blob/014fceee/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala index 119e732..7049e53 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala @@ -431,7 +431,13 @@ case class DescribeTableCommand( describeSchema(catalog.lookupRelation(table).schema, result) } else { val metadata = catalog.getTableMetadata(table) - describeSchema(metadata.schema, result) + if (metadata.schema.isEmpty) { +// In older version(prior to 2.1) of Spark, the table schema can be empty and should be +// inferred at runtime. We should still support it. +describeSchema(catalog.lookupRelation(metadata.identifier).schema, result) + } else { +describeSchema(metadata.schema, result) + } describePartitionInfo(metadata, result)

http://git-wip-us.apache.org/repos/asf/spark/blob/014fceee/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala index cbd00da..8433058 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala @@ -1023,6 +1023,11 @@ object HiveExternalCatalog { // After SPARK-6024, we removed this flag. // Although we are not using `spark.sql.sources.schema` any more, we need to still support.
DataType.fromJson(schema.get).asInstanceOf[StructType] +} else if (props.filterKeys(_.startsWith(DATASOURCE_SCHEMA_PREFIX)).isEmpty) { + // If there is no schema information in table properties, it means the schema of this table + // was empty when saving into metastore, which is possible in older version(prior to 2.1) of + // Spark. We should respect it. + new StructType() } else { val numSchemaParts = props.get(DATASOURCE_SCHEMA_NUMPARTS) if (numSchemaParts.isDefined) { http://git-wip-us.apache.org/repos/asf/spark/blob/014fceee/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala index 8e5fc88..edbde5d 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala @@ -64,7 +64,9 @@ private[hive] class
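A condensed sketch of the fallback described above, with hypothetical types standing in for CatalogTable/StructType: if none of the schema-related table properties are present, the stored schema is treated as empty and the real schema is inferred later, at lookup time.

```
object SchemaFallbackSketch {
  val DATASOURCE_SCHEMA_PREFIX = "spark.sql.sources.schema"

  final case class TableSchema(fields: Seq[(String, String)]) {
    def isEmpty: Boolean = fields.isEmpty
  }

  // Reading the schema back from catalog properties: a pre-2.1 table may have stored nothing.
  def schemaFromProperties(
      props: Map[String, String],
      parseStored: Map[String, String] => TableSchema): TableSchema =
    if (props.keys.exists(_.startsWith(DATASOURCE_SCHEMA_PREFIX))) parseStored(props)
    else TableSchema(Nil) // no schema was ever written; defer to runtime inference

  // DESCRIBE-style consumer: fall back to inference only when the stored schema is empty.
  def effectiveSchema(stored: TableSchema, inferAtRuntime: () => TableSchema): TableSchema =
    if (stored.isEmpty) inferAtRuntime() else stored
}
```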
spark git commit: [SPARK-18464][SQL] support old table which doesn't store schema in metastore
Repository: spark Updated Branches: refs/heads/master 170eeb345 -> 07b3f045c

[SPARK-18464][SQL] support old table which doesn't store schema in metastore

## What changes were proposed in this pull request?

Before Spark 2.1, users could create an external data source table without a schema, and we would infer the table schema at runtime. In Spark 2.1, we decided to infer the schema when the table was created, so that we don't need to infer it again and again at runtime. This is a good improvement, but we should still respect and support old tables which don't store the table schema in the metastore.

## How was this patch tested?

Regression test.

Author: Wenchen Fan

Closes #15900 from cloud-fan/hive-catalog.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/07b3f045 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/07b3f045 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/07b3f045 Branch: refs/heads/master Commit: 07b3f045cd6f79b92bc86b3b1b51d3d5e6bd37ce Parents: 170eeb3 Author: Wenchen Fan Authored: Thu Nov 17 00:00:38 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 17 00:00:38 2016 -0800

-- .../spark/sql/execution/command/tables.scala| 8 ++- .../spark/sql/hive/HiveExternalCatalog.scala| 5 + .../spark/sql/hive/HiveMetastoreCatalog.scala | 4 +++- .../sql/hive/MetastoreDataSourcesSuite.scala| 22 4 files changed, 37 insertions(+), 2 deletions(-) --

http://git-wip-us.apache.org/repos/asf/spark/blob/07b3f045/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala index 119e732..7049e53 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala @@ -431,7 +431,13 @@ case class DescribeTableCommand( describeSchema(catalog.lookupRelation(table).schema, result) } else { val metadata = catalog.getTableMetadata(table) - describeSchema(metadata.schema, result) + if (metadata.schema.isEmpty) { +// In older version(prior to 2.1) of Spark, the table schema can be empty and should be +// inferred at runtime. We should still support it. +describeSchema(catalog.lookupRelation(metadata.identifier).schema, result) + } else { +describeSchema(metadata.schema, result) + } describePartitionInfo(metadata, result)

http://git-wip-us.apache.org/repos/asf/spark/blob/07b3f045/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala index cbd00da..8433058 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala @@ -1023,6 +1023,11 @@ object HiveExternalCatalog { // After SPARK-6024, we removed this flag. // Although we are not using `spark.sql.sources.schema` any more, we need to still support. DataType.fromJson(schema.get).asInstanceOf[StructType] +} else if (props.filterKeys(_.startsWith(DATASOURCE_SCHEMA_PREFIX)).isEmpty) { + // If there is no schema information in table properties, it means the schema of this table + // was empty when saving into metastore, which is possible in older version(prior to 2.1) of + // Spark. We should respect it.
+ new StructType() } else { val numSchemaParts = props.get(DATASOURCE_SCHEMA_NUMPARTS) if (numSchemaParts.isDefined) { http://git-wip-us.apache.org/repos/asf/spark/blob/07b3f045/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala index 8e5fc88..edbde5d 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala @@ -64,7 +64,9 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log val dataSource = DataSource( sparkSession, -
spark git commit: [YARN][DOC] Increasing NodeManager's heap size with External Shuffle Service
Repository: spark Updated Branches: refs/heads/branch-2.1 3d4756d56 -> 523abfe19

[YARN][DOC] Increasing NodeManager's heap size with External Shuffle Service

## What changes were proposed in this pull request?

Suggest that users increase the `NodeManager` heap size if the External Shuffle Service is enabled, as the `NM` can spend a lot of time doing GC, making shuffle operations a bottleneck because `Shuffle Read blocked time` is bumped up. Because of GC, the `NodeManager` can also use an enormous amount of CPU, and cluster performance will suffer. I have seen a NodeManager using 5-13 GB of RAM and up to 2700% CPU with the `spark_shuffle` service on.

## How was this patch tested?

Added step 5: ![shuffle_service](https://cloud.githubusercontent.com/assets/15244468/20355499/2fec0fde-ac2a-11e6-8f8b-1c80daf71be1.png)

Author: Artur Sukhenko

Closes #15906 from Devian-ua/nmHeapSize.

(cherry picked from commit 55589987be89ff78dadf44498352fbbd811a206e) Signed-off-by: Reynold Xin

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/523abfe1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/523abfe1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/523abfe1 Branch: refs/heads/branch-2.1 Commit: 523abfe19caa11747133877b0c8319c68ac66e56 Parents: 3d4756d Author: Artur Sukhenko Authored: Wed Nov 16 15:08:01 2016 -0800 Committer: Reynold Xin Committed: Wed Nov 16 15:08:10 2016 -0800

-- docs/running-on-yarn.md | 2 ++ 1 file changed, 2 insertions(+) --

http://git-wip-us.apache.org/repos/asf/spark/blob/523abfe1/docs/running-on-yarn.md --

diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md index cd18808..fe0221c 100644 --- a/docs/running-on-yarn.md +++ b/docs/running-on-yarn.md @@ -559,6 +559,8 @@ pre-packaged distribution. 1. In the `yarn-site.xml` on each node, add `spark_shuffle` to `yarn.nodemanager.aux-services`, then set `yarn.nodemanager.aux-services.spark_shuffle.class` to `org.apache.spark.network.yarn.YarnShuffleService`. +1. Increase `NodeManager's` heap size by setting `YARN_HEAPSIZE` (1000 by default) in `etc/hadoop/yarn-env.sh` +to avoid garbage collection issues during shuffle. 1. Restart all `NodeManager`s in your cluster. The following extra configuration options are available when the shuffle service is running on YARN:
spark git commit: [YARN][DOC] Increasing NodeManager's heap size with External Shuffle Service
Repository: spark Updated Branches: refs/heads/master 2ca8ae9aa -> 55589987b

[YARN][DOC] Increasing NodeManager's heap size with External Shuffle Service

## What changes were proposed in this pull request?

Suggest that users increase the `NodeManager` heap size if the External Shuffle Service is enabled, as the `NM` can spend a lot of time doing GC, making shuffle operations a bottleneck because `Shuffle Read blocked time` is bumped up. Because of GC, the `NodeManager` can also use an enormous amount of CPU, and cluster performance will suffer. I have seen a NodeManager using 5-13 GB of RAM and up to 2700% CPU with the `spark_shuffle` service on.

## How was this patch tested?

Added step 5: ![shuffle_service](https://cloud.githubusercontent.com/assets/15244468/20355499/2fec0fde-ac2a-11e6-8f8b-1c80daf71be1.png)

Author: Artur Sukhenko

Closes #15906 from Devian-ua/nmHeapSize.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/55589987 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/55589987 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/55589987 Branch: refs/heads/master Commit: 55589987be89ff78dadf44498352fbbd811a206e Parents: 2ca8ae9 Author: Artur Sukhenko Authored: Wed Nov 16 15:08:01 2016 -0800 Committer: Reynold Xin Committed: Wed Nov 16 15:08:01 2016 -0800

-- docs/running-on-yarn.md | 2 ++ 1 file changed, 2 insertions(+) --

http://git-wip-us.apache.org/repos/asf/spark/blob/55589987/docs/running-on-yarn.md --

diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md index cd18808..fe0221c 100644 --- a/docs/running-on-yarn.md +++ b/docs/running-on-yarn.md @@ -559,6 +559,8 @@ pre-packaged distribution. 1. In the `yarn-site.xml` on each node, add `spark_shuffle` to `yarn.nodemanager.aux-services`, then set `yarn.nodemanager.aux-services.spark_shuffle.class` to `org.apache.spark.network.yarn.YarnShuffleService`. +1. Increase `NodeManager's` heap size by setting `YARN_HEAPSIZE` (1000 by default) in `etc/hadoop/yarn-env.sh` +to avoid garbage collection issues during shuffle. 1. Restart all `NodeManager`s in your cluster. The following extra configuration options are available when the shuffle service is running on YARN:
[3/3] spark-website git commit: Add CloudSort news entry.
Add CloudSort news entry. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/8781cd3c Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/8781cd3c Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/8781cd3c Branch: refs/heads/asf-site Commit: 8781cd3c4b6e58c131b62ee251be50dec6939106 Parents: c693f2a Author: Reynold XinAuthored: Tue Nov 15 22:32:03 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 15 22:32:03 2016 -0800 -- ...1-15-spark-wins-cloudsort-100tb-benchmark.md | 22 ++ site/community.html | 6 +- site/documentation.html | 11 +- site/downloads.html | 6 +- site/examples.html | 6 +- site/faq.html | 6 +- site/graphx/index.html | 6 +- site/index.html | 6 +- site/mailing-lists.html | 8 +- site/mllib/index.html | 6 +- site/news/amp-camp-2013-registration-ope.html | 6 +- .../news/announcing-the-first-spark-summit.html | 6 +- .../news/fourth-spark-screencast-published.html | 6 +- site/news/index.html| 26 ++- site/news/nsdi-paper.html | 6 +- site/news/one-month-to-spark-summit-2015.html | 6 +- .../proposals-open-for-spark-summit-east.html | 6 +- ...registration-open-for-spark-summit-east.html | 6 +- .../news/run-spark-and-shark-on-amazon-emr.html | 6 +- site/news/spark-0-6-1-and-0-5-2-released.html | 6 +- site/news/spark-0-6-2-released.html | 6 +- site/news/spark-0-7-0-released.html | 6 +- site/news/spark-0-7-2-released.html | 6 +- site/news/spark-0-7-3-released.html | 6 +- site/news/spark-0-8-0-released.html | 6 +- site/news/spark-0-8-1-released.html | 6 +- site/news/spark-0-9-0-released.html | 6 +- site/news/spark-0-9-1-released.html | 8 +- site/news/spark-0-9-2-released.html | 8 +- site/news/spark-1-0-0-released.html | 6 +- site/news/spark-1-0-1-released.html | 6 +- site/news/spark-1-0-2-released.html | 6 +- site/news/spark-1-1-0-released.html | 8 +- site/news/spark-1-1-1-released.html | 6 +- site/news/spark-1-2-0-released.html | 6 +- site/news/spark-1-2-1-released.html | 6 +- site/news/spark-1-2-2-released.html | 8 +- site/news/spark-1-3-0-released.html | 6 +- site/news/spark-1-4-0-released.html | 6 +- site/news/spark-1-4-1-released.html | 6 +- site/news/spark-1-5-0-released.html | 6 +- site/news/spark-1-5-1-released.html | 6 +- site/news/spark-1-5-2-released.html | 6 +- site/news/spark-1-6-0-released.html | 6 +- site/news/spark-1-6-1-released.html | 6 +- site/news/spark-1-6-2-released.html | 6 +- site/news/spark-1-6-3-released.html | 6 +- site/news/spark-2-0-0-released.html | 6 +- site/news/spark-2-0-1-released.html | 6 +- site/news/spark-2-0-2-released.html | 6 +- site/news/spark-2.0.0-preview.html | 6 +- .../spark-accepted-into-apache-incubator.html | 6 +- site/news/spark-and-shark-in-the-news.html | 8 +- site/news/spark-becomes-tlp.html| 6 +- site/news/spark-featured-in-wired.html | 6 +- .../spark-mailing-lists-moving-to-apache.html | 6 +- site/news/spark-meetups.html| 6 +- site/news/spark-screencasts-published.html | 6 +- site/news/spark-summit-2013-is-a-wrap.html | 6 +- site/news/spark-summit-2014-videos-posted.html | 6 +- site/news/spark-summit-2015-videos-posted.html | 6 +- site/news/spark-summit-agenda-posted.html | 6 +- .../spark-summit-east-2015-videos-posted.html | 8 +- .../spark-summit-east-2016-cfp-closing.html | 6 +- site/news/spark-summit-east-agenda-posted.html | 6 +- .../news/spark-summit-europe-agenda-posted.html | 6 +- site/news/spark-summit-europe.html | 6 +- .../spark-summit-june-2016-agenda-posted.html | 6 +- site/news/spark-tips-from-quantifind.html | 6 
+- .../spark-user-survey-and-powered-by-page.html | 6 +- site/news/spark-version-0-6-0-released.html | 6 +- .../spark-wins-cloudsort-100tb-benchmark.html | 218 +++ ...-wins-daytona-gray-sort-100tb-benchmark.html | 6 +- .../strata-exercises-now-available-online.html | 6 +-
[2/3] spark-website git commit: Add CloudSort news entry.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/news/spark-mailing-lists-moving-to-apache.html -- diff --git a/site/news/spark-mailing-lists-moving-to-apache.html b/site/news/spark-mailing-lists-moving-to-apache.html index 2c10518..45d067b 100644 --- a/site/news/spark-mailing-lists-moving-to-apache.html +++ b/site/news/spark-mailing-lists-moving-to-apache.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/news/spark-meetups.html -- diff --git a/site/news/spark-meetups.html b/site/news/spark-meetups.html index 5dc78fa..5e2eadc 100644 --- a/site/news/spark-meetups.html +++ b/site/news/spark-meetups.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/news/spark-screencasts-published.html -- diff --git a/site/news/spark-screencasts-published.html b/site/news/spark-screencasts-published.html index 829ce81..5b57d16 100644 --- a/site/news/spark-screencasts-published.html +++ b/site/news/spark-screencasts-published.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/news/spark-summit-2013-is-a-wrap.html -- diff --git a/site/news/spark-summit-2013-is-a-wrap.html b/site/news/spark-summit-2013-is-a-wrap.html index d068281..ba84c36 100644 --- a/site/news/spark-summit-2013-is-a-wrap.html +++ b/site/news/spark-summit-2013-is-a-wrap.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/news/spark-summit-2014-videos-posted.html -- diff --git a/site/news/spark-summit-2014-videos-posted.html b/site/news/spark-summit-2014-videos-posted.html index 4b6133f..dffeacd 100644 --- a/site/news/spark-summit-2014-videos-posted.html +++ b/site/news/spark-summit-2014-videos-posted.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/news/spark-summit-2015-videos-posted.html -- diff --git a/site/news/spark-summit-2015-videos-posted.html b/site/news/spark-summit-2015-videos-posted.html index f211d33..32aecea 100644 --- a/site/news/spark-summit-2015-videos-posted.html +++ b/site/news/spark-summit-2015-videos-posted.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 
2.0.0 released - (Jul 26, 2016) -
[1/3] spark-website git commit: Add CloudSort news entry.
Repository: spark-website Updated Branches: refs/heads/asf-site c693f2a7d -> 8781cd3c4 http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/releases/spark-release-1-2-1.html -- diff --git a/site/releases/spark-release-1-2-1.html b/site/releases/spark-release-1-2-1.html index f2a8c60..22e3a1e 100644 --- a/site/releases/spark-release-1-2-1.html +++ b/site/releases/spark-release-1-2-1.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/releases/spark-release-1-2-2.html -- diff --git a/site/releases/spark-release-1-2-2.html b/site/releases/spark-release-1-2-2.html index 2fc7a38..c70ceee 100644 --- a/site/releases/spark-release-1-2-2.html +++ b/site/releases/spark-release-1-2-2.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/releases/spark-release-1-3-0.html -- diff --git a/site/releases/spark-release-1-3-0.html b/site/releases/spark-release-1-3-0.html index 5bf1840..9e47334 100644 --- a/site/releases/spark-release-1-3-0.html +++ b/site/releases/spark-release-1-3-0.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive @@ -191,7 +191,7 @@ To download Spark 1.3 visit the downloads page. Spark Core -Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports https://issues.apache.org/jira/browse/SPARK-5430;>multi level aggregation trees to help speed up expensive reduce operations. https://issues.apache.org/jira/browse/SPARK-5063;>Improved error reporting has been added for certain gotcha operations. Sparks Jetty dependency is https://issues.apache.org/jira/browse/SPARK-3996;>now shaded to help avoid conflicts with user programs. Spark now supports https://issues.apache.org/jira/browse/SPARK-3883;>SSL encryption for some communication endpoints. Finaly, realtime https://issues.apache.org/jira/browse/SPARK-3428;>GC metrics and https://issues.apache.org/jira/browse/SPARK-4874;>record counts have been added to the UI. +Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports https://issues.apache.org/jira/browse/SPARK-5430;>multi level aggregation trees to help speed up expensive reduce operations. https://issues.apache.org/jira/browse/SPARK-5063;>Improved error reporting has been added for certain gotcha operations. Sparks Jetty dependency is https://issues.apache.org/jira/browse/SPARK-3996;>now shaded to help avoid conflicts with user programs. Spark now supports https://issues.apache.org/jira/browse/SPARK-3883;>SSL encryption for some communication endpoints. Finaly, realtime https://issues.apache.org/jira/browse/SPARK-3428;>GC metrics and https://issues.apache.org/jira/browse/SPARK-4874;>record counts have been added to the UI. 
DataFrame API Spark 1.3 adds a new DataFrames API that provides powerful and convenient operators when working with structured datasets. The DataFrame is an evolution of the base RDD API that includes named fields along with schema information. It's easy to construct a DataFrame from sources such as Hive tables, JSON data, a JDBC database, or any implementation of Spark's new data source API. Data frames will become a common interchange format between Spark components and when importing and exporting data to other systems. Data frames are supported in Python, Scala, and Java. @@ -203,7 +203,7 @@ In this release Spark MLlib introduces several new algorithms: latent Dirichlet allocation (LDA) for https://issues.apache.org/jira/browse/SPARK-1405;>topic modeling,
spark git commit: [SPARK-18377][SQL] warehouse path should be a static conf
Repository: spark Updated Branches: refs/heads/master 4b35d13ba -> 4ac9759f8

[SPARK-18377][SQL] warehouse path should be a static conf

## What changes were proposed in this pull request?

It is odd that every session can set its own warehouse path at runtime; we should forbid this and make the warehouse path a static conf.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan

Closes #15825 from cloud-fan/warehouse.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4ac9759f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4ac9759f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4ac9759f Branch: refs/heads/master Commit: 4ac9759f807d217b6f67badc6d5f6b7138eb92d2 Parents: 4b35d13 Author: Wenchen Fan Authored: Tue Nov 15 20:24:36 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 15 20:24:36 2016 -0800

-- .../sql/catalyst/catalog/SessionCatalog.scala | 9 +- .../org/apache/spark/sql/internal/SQLConf.scala | 12 +- .../apache/spark/sql/internal/SharedState.scala | 32 +-- .../spark/sql/execution/command/DDLSuite.scala | 193 +++ .../spark/sql/internal/SQLConfSuite.scala | 16 +- .../org/apache/spark/sql/hive/HiveUtils.scala | 4 +- .../spark/sql/hive/execution/HiveDDLSuite.scala | 85 7 files changed, 142 insertions(+), 209 deletions(-) --

http://git-wip-us.apache.org/repos/asf/spark/blob/4ac9759f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index c8b61d8..19a8fcd 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala @@ -83,14 +83,7 @@ class SessionCatalog( // check whether the temporary table or function exists, then, if not, operate on // the corresponding item in the current database. @GuardedBy("this") - protected var currentDb = { -val defaultName = DEFAULT_DATABASE -val defaultDbDefinition = - CatalogDatabase(defaultName, "default database", conf.warehousePath, Map()) -// Initialize default database if it doesn't already exist -createDatabase(defaultDbDefinition, ignoreIfExists = true) -formatDatabaseName(defaultName) - } + protected var currentDb = formatDatabaseName(DEFAULT_DATABASE) /** * Format table name, taking into account case sensitivity.
http://git-wip-us.apache.org/repos/asf/spark/blob/4ac9759f/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 6372936..b2a50c6 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -56,11 +56,6 @@ object SQLConf { } - val WAREHOUSE_PATH = SQLConfigBuilder("spark.sql.warehouse.dir") -.doc("The default location for managed databases and tables.") -.stringConf -.createWithDefault(Utils.resolveURI("spark-warehouse").toString) - val OPTIMIZER_MAX_ITERATIONS = SQLConfigBuilder("spark.sql.optimizer.maxIterations") .internal() .doc("The max number of iterations the optimizer and analyzer runs.") @@ -806,7 +801,7 @@ private[sql] class SQLConf extends Serializable with CatalystConf with Logging { def variableSubstituteDepth: Int = getConf(VARIABLE_SUBSTITUTE_DEPTH) - def warehousePath: String = new Path(getConf(WAREHOUSE_PATH)).toString + def warehousePath: String = new Path(getConf(StaticSQLConf.WAREHOUSE_PATH)).toString def ignoreCorruptFiles: Boolean = getConf(IGNORE_CORRUPT_FILES) @@ -951,6 +946,11 @@ object StaticSQLConf { } } + val WAREHOUSE_PATH = buildConf("spark.sql.warehouse.dir") +.doc("The default location for managed databases and tables.") +.stringConf +.createWithDefault(Utils.resolveURI("spark-warehouse").toString) + val CATALOG_IMPLEMENTATION = buildConf("spark.sql.catalogImplementation") .internal() .stringConf http://git-wip-us.apache.org/repos/asf/spark/blob/4ac9759f/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
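As a usage note, here is a sketch of what "static conf" means in practice (it assumes Spark 2.1+ behavior and uses an arbitrary local path): the warehouse location is fixed when the shared state is created, so it has to be supplied before the first SparkSession is built, and per-session overrides at runtime are what this change forbids.

```
import org.apache.spark.sql.SparkSession

object WarehouseConfSketch {
  def main(args: Array[String]): Unit = {
    // Static confs are fixed when the shared state is created, so set them at build time.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("warehouse-conf-sketch")
      .config("spark.sql.warehouse.dir", "/tmp/my-spark-warehouse") // hypothetical path
      .getOrCreate()

    println(spark.conf.get("spark.sql.warehouse.dir"))

    // After this change, overriding it at runtime is expected to be rejected
    // because it is a static config:
    // spark.conf.set("spark.sql.warehouse.dir", "/tmp/somewhere-else")

    spark.stop()
  }
}
```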
spark git commit: [SPARK-18377][SQL] warehouse path should be a static conf
Repository: spark Updated Branches: refs/heads/branch-2.1 175c47864 -> 436ae201f [SPARK-18377][SQL] warehouse path should be a static conf ## What changes were proposed in this pull request? it's weird that every session can set its own warehouse path at runtime, we should forbid it and make it a static conf. ## How was this patch tested? existing tests. Author: Wenchen FanCloses #15825 from cloud-fan/warehouse. (cherry picked from commit 4ac9759f807d217b6f67badc6d5f6b7138eb92d2) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/436ae201 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/436ae201 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/436ae201 Branch: refs/heads/branch-2.1 Commit: 436ae201f825c02b9720805ada8c0dca496a1ac5 Parents: 175c478 Author: Wenchen Fan Authored: Tue Nov 15 20:24:36 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 15 20:24:45 2016 -0800 -- .../sql/catalyst/catalog/SessionCatalog.scala | 9 +- .../org/apache/spark/sql/internal/SQLConf.scala | 12 +- .../apache/spark/sql/internal/SharedState.scala | 32 +-- .../spark/sql/execution/command/DDLSuite.scala | 193 +++ .../spark/sql/internal/SQLConfSuite.scala | 16 +- .../org/apache/spark/sql/hive/HiveUtils.scala | 4 +- .../spark/sql/hive/execution/HiveDDLSuite.scala | 85 7 files changed, 142 insertions(+), 209 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/436ae201/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index c8b61d8..19a8fcd 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala @@ -83,14 +83,7 @@ class SessionCatalog( // check whether the temporary table or function exists, then, if not, operate on // the corresponding item in the current database. @GuardedBy("this") - protected var currentDb = { -val defaultName = DEFAULT_DATABASE -val defaultDbDefinition = - CatalogDatabase(defaultName, "default database", conf.warehousePath, Map()) -// Initialize default database if it doesn't already exist -createDatabase(defaultDbDefinition, ignoreIfExists = true) -formatDatabaseName(defaultName) - } + protected var currentDb = formatDatabaseName(DEFAULT_DATABASE) /** * Format table name, taking into account case sensitivity. 
http://git-wip-us.apache.org/repos/asf/spark/blob/436ae201/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 7b8ed65..7cca9db 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -56,11 +56,6 @@ object SQLConf { } - val WAREHOUSE_PATH = SQLConfigBuilder("spark.sql.warehouse.dir") -.doc("The default location for managed databases and tables.") -.stringConf -.createWithDefault(Utils.resolveURI("spark-warehouse").toString) - val OPTIMIZER_MAX_ITERATIONS = SQLConfigBuilder("spark.sql.optimizer.maxIterations") .internal() .doc("The max number of iterations the optimizer and analyzer runs.") @@ -773,7 +768,7 @@ private[sql] class SQLConf extends Serializable with CatalystConf with Logging { def variableSubstituteDepth: Int = getConf(VARIABLE_SUBSTITUTE_DEPTH) - def warehousePath: String = new Path(getConf(WAREHOUSE_PATH)).toString + def warehousePath: String = new Path(getConf(StaticSQLConf.WAREHOUSE_PATH)).toString def ignoreCorruptFiles: Boolean = getConf(IGNORE_CORRUPT_FILES) @@ -918,6 +913,11 @@ object StaticSQLConf { } } + val WAREHOUSE_PATH = buildConf("spark.sql.warehouse.dir") +.doc("The default location for managed databases and tables.") +.stringConf +.createWithDefault(Utils.resolveURI("spark-warehouse").toString) + val CATALOG_IMPLEMENTATION = buildConf("spark.sql.catalogImplementation") .internal() .stringConf http://git-wip-us.apache.org/repos/asf/spark/blob/436ae201/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
spark git commit: [SPARK-18300][SQL] Do not apply foldable propagation with expand as a child [BRANCH-2.0]
Repository: spark Updated Branches: refs/heads/branch-2.0 e2452c632 -> 8d55886aa [SPARK-18300][SQL] Do not apply foldable propagation with expand as a child [BRANCH-2.0] ## What changes were proposed in this pull request? The `FoldablePropagation` optimizer rule, pulls foldable values out from under an `Expand`. This breaks the `Expand` in two ways: - It rewrites the output attributes of the `Expand`. We explicitly define output attributes for `Expand`, these are (unfortunately) considered as part of the expressions of the `Expand` and can be rewritten. - Expand can actually change the column (it will typically re-use the attributes or the underlying plan). This means that we cannot safely propagate the expressions from under an `Expand`. This PR fixes this and (hopefully) other issues by explicitly whitelisting allowed operators. This is a backport of https://github.com/apache/spark/pull/15857 ## How was this patch tested? Added tests to `FoldablePropagationSuite` and to `SQLQueryTestSuite`. Author: Herman van HovellCloses #15892 from hvanhovell/SPARK-18300-branch-2.0. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8d55886a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8d55886a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8d55886a Branch: refs/heads/branch-2.0 Commit: 8d55886aaa781f3b9f09de1a2d6b422c95dcb4d2 Parents: e2452c6 Author: Herman van Hovell Authored: Tue Nov 15 18:21:26 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 15 18:21:26 2016 -0800 -- .../sql/catalyst/optimizer/Optimizer.scala | 78 +--- .../optimizer/FoldablePropagationSuite.scala| 28 ++- .../resources/sql-tests/inputs/group-by.sql | 3 + .../sql-tests/results/group-by.sql.out | 10 ++- 4 files changed, 88 insertions(+), 31 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8d55886a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index f0992b3..0a28ef4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -646,46 +646,72 @@ object FoldablePropagation extends Rule[LogicalPlan] { } case _ => Nil }) +val replaceFoldable: PartialFunction[Expression, Expression] = { + case a: AttributeReference if foldableMap.contains(a) => foldableMap(a) +} if (foldableMap.isEmpty) { plan } else { var stop = false CleanupAliases(plan.transformUp { -case u: Union => - stop = true - u -case c: Command => - stop = true - c -// For outer join, although its output attributes are derived from its children, they are -// actually different attributes: the output of outer join is not always picked from its -// children, but can also be null. +// A leaf node should not stop the folding process (note that we are traversing up the +// tree, starting at the leaf nodes); so we are allowing it. +case l: LeafNode => + l + +// We can only propagate foldables for a subset of unary nodes. +case u: UnaryNode if !stop && canPropagateFoldables(u) => + u.transformExpressions(replaceFoldable) + +// Allow inner joins. 
We do not allow outer join, although its output attributes are +// derived from its children, they are actually different attributes: the output of outer +// join is not always picked from its children, but can also be null. // TODO(cloud-fan): It seems more reasonable to use new attributes as the output attributes // of outer join. -case j @ Join(_, _, LeftOuter | RightOuter | FullOuter, _) => +case j @ Join(_, _, Inner, _) => + j.transformExpressions(replaceFoldable) + +// We can fold the projections an expand holds. However expand changes the output columns +// and often reuses the underlying attributes; so we cannot assume that a column is still +// foldable after the expand has been applied. +// TODO(hvanhovell): Expand should use new attributes as the output attributes. +case expand: Expand if !stop => + val newExpand = expand.copy(projections = expand.projections.map { projection => +
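A hedged repro sketch of the class of query this protects (modeled on the kind of test added to `FoldablePropagationSuite`/`SQLQueryTestSuite`; the exact queries in the patch may differ): multiple DISTINCT aggregates introduce an `Expand`, and the inner query's columns are all foldable literals, so the optimizer must not propagate them through the `Expand`.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("foldable-propagation-expand")
  .getOrCreate()

// Two DISTINCT aggregates force an Expand; a, b, c are foldable literals.
// With the operator whitelist, FoldablePropagation leaves the Expand's output alone.
spark.sql(
  """SELECT COUNT(DISTINCT b), COUNT(DISTINCT b, c)
    |FROM (SELECT 1 AS a, 2 AS b, 3 AS c) t
    |GROUP BY a""".stripMargin).show()
```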
spark git commit: [SPARK-18232][MESOS] Support CNI
Repository: spark Updated Branches: refs/heads/master 86430cc4e -> d89bfc923 [SPARK-18232][MESOS] Support CNI ## What changes were proposed in this pull request? Adds support for CNI-isolated containers ## How was this patch tested? I launched SparkPi both with and without `spark.mesos.network.name`, and verified the job completed successfully. Author: Michael GummeltCloses #15740 from mgummelt/spark-342-cni. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d89bfc92 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d89bfc92 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d89bfc92 Branch: refs/heads/master Commit: d89bfc92302424406847ac7a9cfca714e6b742fc Parents: 86430cc Author: Michael Gummelt Authored: Mon Nov 14 23:46:54 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 23:46:54 2016 -0800 -- docs/running-on-mesos.md| 27 +++-- .../cluster/mesos/MesosClusterScheduler.scala | 8 +- .../MesosCoarseGrainedSchedulerBackend.scala| 23 ++-- .../MesosFineGrainedSchedulerBackend.scala | 9 +- .../mesos/MesosSchedulerBackendUtil.scala | 120 +-- .../mesos/MesosClusterSchedulerSuite.scala | 26 ...esosCoarseGrainedSchedulerBackendSuite.scala | 19 ++- 7 files changed, 131 insertions(+), 101 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d89bfc92/docs/running-on-mesos.md -- diff --git a/docs/running-on-mesos.md b/docs/running-on-mesos.md index 923d8db..8d5ad12 100644 --- a/docs/running-on-mesos.md +++ b/docs/running-on-mesos.md @@ -368,17 +368,6 @@ See the [configuration page](configuration.html) for information on Spark config - spark.mesos.executor.docker.portmaps - (none) - -Set the list of incoming ports exposed by the Docker image, which was set using -spark.mesos.executor.docker.image. The format of this property is a comma-separated list of -mappings which take the form: - -host_port:container_port[:tcp|:udp] - - - spark.mesos.executor.home driver side SPARK_HOME @@ -505,12 +494,26 @@ See the [configuration page](configuration.html) for information on Spark config Set the maximum number GPU resources to acquire for this job. Note that executors will still launch when no GPU resources are found since this configuration is just a upper limit and not a guaranteed amount. + + + spark.mesos.network.name + (none) + +Attach containers to the given named network. If this job is +launched in cluster mode, also launch the driver in the given named +network. See +http://mesos.apache.org/documentation/latest/cni/;>the Mesos CNI docs +for more details. 
+ spark.mesos.fetcherCache.enable false -If set to `true`, all URIs (example: `spark.executor.uri`, `spark.mesos.uris`) will be cached by the [Mesos fetcher cache](http://mesos.apache.org/documentation/latest/fetcher/) +If set to `true`, all URIs (example: `spark.executor.uri`, +`spark.mesos.uris`) will be cached by the http://mesos.apache.org/documentation/latest/fetcher/;>Mesos +Fetcher Cache http://git-wip-us.apache.org/repos/asf/spark/blob/d89bfc92/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala -- diff --git a/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala b/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala index 8db1d12..f384290 100644 --- a/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala +++ b/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala @@ -531,13 +531,7 @@ private[spark] class MesosClusterScheduler( .setCommand(buildDriverCommand(desc)) .addAllResources(cpuResourcesToUse.asJava) .addAllResources(memResourcesToUse.asJava) - -desc.conf.getOption("spark.mesos.executor.docker.image").foreach { image => - MesosSchedulerBackendUtil.setupContainerBuilderDockerInfo(image, -desc.conf, -taskInfo.getContainerBuilder) -} - +taskInfo.setContainer(MesosSchedulerBackendUtil.containerInfo(desc.conf)) taskInfo.build } http://git-wip-us.apache.org/repos/asf/spark/blob/d89bfc92/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala -- diff --git
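A short usage sketch for the new option documented above: attach executors (and, in cluster mode, the driver) to a named CNI network. The master URL and network name below are placeholders; the same setting can equally be passed to spark-submit via `--conf spark.mesos.network.name=...`.

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("cni-example")
  .setMaster("mesos://zk://zk1.example.com:2181/mesos") // assumed Mesos master URL
  .set("spark.mesos.network.name", "my-cni-network")    // assumed CNI network name
```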
spark-website git commit: Add Big Data Analytics with Spark and Hadoop book.
Repository: spark-website Updated Branches: refs/heads/asf-site 8f5026783 -> 4e10a1ac1 Add Big Data Analytics with Spark and Hadoop book. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/4e10a1ac Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/4e10a1ac Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/4e10a1ac Branch: refs/heads/asf-site Commit: 4e10a1ac10fa773f891422c7c1a3727e47feca8e Parents: 8f50267 Author: Reynold XinAuthored: Mon Nov 14 23:26:06 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 23:26:06 2016 -0800 -- documentation.md| 1 + site/documentation.html | 1 + 2 files changed, 2 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark-website/blob/4e10a1ac/documentation.md -- diff --git a/documentation.md b/documentation.md index 3927264..0ff8ed2 100644 --- a/documentation.md +++ b/documentation.md @@ -168,6 +168,7 @@ Slides, videos and EC2-based exercises from each of these are available online: https://www.packtpub.com/big-data-and-business-intelligence/mastering-apache-spark;>Mastering Apache Spark, by Mike Frampton (Packt Publishing) http://www.apress.com/9781484209653;>Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large Scale Data Analysis, by Mohammed Guller (Apress) https://www.packtpub.com/big-data-and-business-intelligence/large-scale-machine-learning-spark;>Large Scale Machine Learning with Spark, by Md. Rezaul Karim, Md. Mahedi Kaysar (Packt Publishing) + https://www.packtpub.com/big-data-and-business-intelligence/big-data-analytics;>Big Data Analytics with Spark and Hadoop, by Venkat Ankam (Packt Publishing) Examples http://git-wip-us.apache.org/repos/asf/spark-website/blob/4e10a1ac/site/documentation.html -- diff --git a/site/documentation.html b/site/documentation.html index 9414acd..60c1b59 100644 --- a/site/documentation.html +++ b/site/documentation.html @@ -342,6 +342,7 @@ Slides, videos and EC2-based exercises from each of these are available online: https://www.packtpub.com/big-data-and-business-intelligence/mastering-apache-spark;>Mastering Apache Spark, by Mike Frampton (Packt Publishing) http://www.apress.com/9781484209653;>Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large Scale Data Analysis, by Mohammed Guller (Apress) https://www.packtpub.com/big-data-and-business-intelligence/large-scale-machine-learning-spark;>Large Scale Machine Learning with Spark, by Md. Rezaul Karim, Md. Mahedi Kaysar (Packt Publishing) + https://www.packtpub.com/big-data-and-business-intelligence/big-data-analytics;>Big Data Analytics with Spark and Hadoop, by Venkat Ankam (Packt Publishing) Examples - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark-website git commit: Update Maven coordinates.
Repository: spark-website Updated Branches: refs/heads/asf-site 8940afe14 -> 8f5026783 Update Maven coordinates. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/8f502678 Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/8f502678 Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/8f502678 Branch: refs/heads/asf-site Commit: 8f50267839c04dcf325210173b41839568b544ab Parents: 8940afe Author: Reynold XinAuthored: Mon Nov 14 23:14:45 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 23:14:45 2016 -0800 -- downloads.md| 4 ++-- site/downloads.html | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark-website/blob/8f502678/downloads.md -- diff --git a/downloads.md b/downloads.md index 94462bb..0031a05 100644 --- a/downloads.md +++ b/downloads.md @@ -51,7 +51,7 @@ Spark artifacts are [hosted in Maven Central](http://search.maven.org/#search%7C groupId: org.apache.spark artifactId: spark-core_2.11 -version: 2.0.1 +version: 2.0.2 ### Spark Source Code Management If you are interested in working with the newest under-development code or contributing to Apache Spark development, you can also check out the master branch from Git: @@ -59,7 +59,7 @@ If you are interested in working with the newest under-development code or contr # Master development branch git clone git://github.com/apache/spark.git -# 2.0 maintenance branch with stability fixes on top of Spark 2.0.1 +# 2.0 maintenance branch with stability fixes on top of Spark 2.0.2 git clone git://github.com/apache/spark.git -b branch-2.0 Once you've downloaded Spark, you can find instructions for installing and building it on the documentation page. http://git-wip-us.apache.org/repos/asf/spark-website/blob/8f502678/site/downloads.html -- diff --git a/site/downloads.html b/site/downloads.html index d06b5ac..e96a141 100644 --- a/site/downloads.html +++ b/site/downloads.html @@ -235,7 +235,7 @@ You can select and download it above. groupId: org.apache.spark artifactId: spark-core_2.11 -version: 2.0.1 +version: 2.0.2 Spark Source Code Management @@ -244,7 +244,7 @@ version: 2.0.1 # Master development branch git clone git://github.com/apache/spark.git -# 2.0 maintenance branch with stability fixes on top of Spark 2.0.1 +# 2.0 maintenance branch with stability fixes on top of Spark 2.0.2 git clone git://github.com/apache/spark.git -b branch-2.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
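For readers using sbt rather than Maven, the updated coordinates correspond to a build line roughly like the following (with Scala 2.11, `%%` resolves to the `spark-core_2.11` artifact shown above).

```scala
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.2"
```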
spark git commit: [SPARK-18430][SQL] Fixed Exception Messages when Hitting an Invocation Exception of Function Lookup
Repository: spark Updated Branches: refs/heads/master c31def1dd -> 86430cc4e [SPARK-18430][SQL] Fixed Exception Messages when Hitting an Invocation Exception of Function Lookup ### What changes were proposed in this pull request? When the exception is an invocation exception during function lookup, we return a useless/confusing error message: For example, ```Scala df.selectExpr("concat_ws()") ``` Below is the error message we got: ``` null; line 1 pos 0 org.apache.spark.sql.AnalysisException: null; line 1 pos 0 ``` To get the meaningful error message, we need to get the cause. The fix is exactly the same as what we did in https://github.com/apache/spark/pull/12136. After the fix, the message we got is the exception issued in the constuctor of function implementation: ``` requirement failed: concat_ws requires at least one argument.; line 1 pos 0 org.apache.spark.sql.AnalysisException: requirement failed: concat_ws requires at least one argument.; line 1 pos 0 ``` ### How was this patch tested? Added test cases. Author: gatorsmileCloses #15878 from gatorsmile/functionNotFound. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/86430cc4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/86430cc4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/86430cc4 Branch: refs/heads/master Commit: 86430cc4e8dbc65a091a532fc9c5ec12b7be04f4 Parents: c31def1 Author: gatorsmile Authored: Mon Nov 14 21:21:34 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 21:21:34 2016 -0800 -- .../catalyst/analysis/FunctionRegistry.scala| 5 - .../sql-tests/inputs/string-functions.sql | 3 +++ .../sql-tests/results/string-functions.sql.out | 20 3 files changed, 27 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/86430cc4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index b028d07..007cdc1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -446,7 +446,10 @@ object FunctionRegistry { // If there is an apply method that accepts Seq[Expression], use that one. Try(varargCtor.get.newInstance(expressions).asInstanceOf[Expression]) match { case Success(e) => e - case Failure(e) => throw new AnalysisException(e.getMessage) + case Failure(e) => +// the exception is an invocation exception. To get a meaningful message, we need the +// cause. +throw new AnalysisException(e.getCause.getMessage) } } else { // Otherwise, find a constructor method that matches the number of arguments, and use that. 
http://git-wip-us.apache.org/repos/asf/spark/blob/86430cc4/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql b/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql new file mode 100644 index 000..f21981e --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql @@ -0,0 +1,3 @@ +-- Argument number exception +select concat_ws(); +select format_string(); http://git-wip-us.apache.org/repos/asf/spark/blob/86430cc4/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out -- diff --git a/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out b/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out new file mode 100644 index 000..6961e9b --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out @@ -0,0 +1,20 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 2 + + +-- !query 0 +select concat_ws() +-- !query 0 schema +struct<> +-- !query 0 output +org.apache.spark.sql.AnalysisException +requirement failed: concat_ws requires at least one argument.; line 1 pos 7 + + +-- !query 1 +select format_string() +-- !query 1 schema +struct<> +-- !query 1 output +org.apache.spark.sql.AnalysisException +requirement failed: format_string() should take at least 1 argument; line 1 pos 7
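A simplified sketch of the pattern the fix applies (not the Spark source itself): when reflective construction fails, the useful message lives on the cause of the wrapping invocation exception, while `getMessage` on the wrapper is often null.

```scala
import scala.util.{Failure, Success, Try}

// Returns a human-readable error for a failed constructor call, preferring the
// cause's message over the (often null) message of the reflective wrapper.
def errorMessage(construct: () => Any): String =
  Try(construct()) match {
    case Success(_) => "ok"
    case Failure(e) =>
      Option(e.getCause).map(_.getMessage).getOrElse(e.getMessage)
  }
```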
spark git commit: [SPARK-18430][SQL] Fixed Exception Messages when Hitting an Invocation Exception of Function Lookup
Repository: spark Updated Branches: refs/heads/branch-2.1 649c15fae -> a0125fd68 [SPARK-18430][SQL] Fixed Exception Messages when Hitting an Invocation Exception of Function Lookup ### What changes were proposed in this pull request? When the exception is an invocation exception during function lookup, we return a useless/confusing error message: For example, ```Scala df.selectExpr("concat_ws()") ``` Below is the error message we got: ``` null; line 1 pos 0 org.apache.spark.sql.AnalysisException: null; line 1 pos 0 ``` To get the meaningful error message, we need to get the cause. The fix is exactly the same as what we did in https://github.com/apache/spark/pull/12136. After the fix, the message we got is the exception issued in the constuctor of function implementation: ``` requirement failed: concat_ws requires at least one argument.; line 1 pos 0 org.apache.spark.sql.AnalysisException: requirement failed: concat_ws requires at least one argument.; line 1 pos 0 ``` ### How was this patch tested? Added test cases. Author: gatorsmileCloses #15878 from gatorsmile/functionNotFound. (cherry picked from commit 86430cc4e8dbc65a091a532fc9c5ec12b7be04f4) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a0125fd6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a0125fd6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a0125fd6 Branch: refs/heads/branch-2.1 Commit: a0125fd6847d5dbce92dc92cb5b16ee00f0ff6a8 Parents: 649c15f Author: gatorsmile Authored: Mon Nov 14 21:21:34 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 21:21:45 2016 -0800 -- .../catalyst/analysis/FunctionRegistry.scala| 5 - .../sql-tests/inputs/string-functions.sql | 3 +++ .../sql-tests/results/string-functions.sql.out | 20 3 files changed, 27 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a0125fd6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index b028d07..007cdc1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -446,7 +446,10 @@ object FunctionRegistry { // If there is an apply method that accepts Seq[Expression], use that one. Try(varargCtor.get.newInstance(expressions).asInstanceOf[Expression]) match { case Success(e) => e - case Failure(e) => throw new AnalysisException(e.getMessage) + case Failure(e) => +// the exception is an invocation exception. To get a meaningful message, we need the +// cause. +throw new AnalysisException(e.getCause.getMessage) } } else { // Otherwise, find a constructor method that matches the number of arguments, and use that. 
http://git-wip-us.apache.org/repos/asf/spark/blob/a0125fd6/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql b/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql new file mode 100644 index 000..f21981e --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql @@ -0,0 +1,3 @@ +-- Argument number exception +select concat_ws(); +select format_string(); http://git-wip-us.apache.org/repos/asf/spark/blob/a0125fd6/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out -- diff --git a/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out b/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out new file mode 100644 index 000..6961e9b --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out @@ -0,0 +1,20 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 2 + + +-- !query 0 +select concat_ws() +-- !query 0 schema +struct<> +-- !query 0 output +org.apache.spark.sql.AnalysisException +requirement failed: concat_ws requires at least one argument.; line 1 pos 7 + + +-- !query 1 +select format_string() +-- !query 1 schema +struct<> +-- !query 1 output +org.apache.spark.sql.AnalysisException +requirement failed:
spark git commit: [SPARK-18428][DOC] Update docs for GraphX
Repository: spark Updated Branches: refs/heads/branch-2.1 27999b366 -> 649c15fae [SPARK-18428][DOC] Update docs for GraphX ## What changes were proposed in this pull request? 1, Add link of `VertexRDD` and `EdgeRDD` 2, Notify in `Vertex and Edge RDDs` that not all methods are listed 3, `VertexID` -> `VertexId` ## How was this patch tested? No tests, only docs is modified Author: Zheng RuiFengCloses #15875 from zhengruifeng/update_graphop_doc. (cherry picked from commit c31def1ddcbed340bfc071d54fb3dc7945cb525a) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/649c15fa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/649c15fa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/649c15fa Branch: refs/heads/branch-2.1 Commit: 649c15fae423a415cb6165aa0ef6d97ab4949afb Parents: 27999b3 Author: Zheng RuiFeng Authored: Mon Nov 14 21:15:39 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 21:18:35 2016 -0800 -- docs/graphx-programming-guide.md | 68 ++- 1 file changed, 35 insertions(+), 33 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/649c15fa/docs/graphx-programming-guide.md -- diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md index 58671e6..1097cf1 100644 --- a/docs/graphx-programming-guide.md +++ b/docs/graphx-programming-guide.md @@ -11,6 +11,7 @@ description: GraphX graph processing library guide for Spark SPARK_VERSION_SHORT [EdgeRDD]: api/scala/index.html#org.apache.spark.graphx.EdgeRDD +[VertexRDD]: api/scala/index.html#org.apache.spark.graphx.VertexRDD [Edge]: api/scala/index.html#org.apache.spark.graphx.Edge [EdgeTriplet]: api/scala/index.html#org.apache.spark.graphx.EdgeTriplet [Graph]: api/scala/index.html#org.apache.spark.graphx.Graph @@ -89,7 +90,7 @@ with user defined objects attached to each vertex and edge. A directed multigra graph with potentially multiple parallel edges sharing the same source and destination vertex. The ability to support parallel edges simplifies modeling scenarios where there can be multiple relationships (e.g., co-worker and friend) between the same vertices. Each vertex is keyed by a -*unique* 64-bit long identifier (`VertexID`). GraphX does not impose any ordering constraints on +*unique* 64-bit long identifier (`VertexId`). GraphX does not impose any ordering constraints on the vertex identifiers. Similarly, edges have corresponding source and destination vertex identifiers. @@ -130,12 +131,12 @@ class Graph[VD, ED] { } {% endhighlight %} -The classes `VertexRDD[VD]` and `EdgeRDD[ED]` extend and are optimized versions of `RDD[(VertexID, +The classes `VertexRDD[VD]` and `EdgeRDD[ED]` extend and are optimized versions of `RDD[(VertexId, VD)]` and `RDD[Edge[ED]]` respectively. Both `VertexRDD[VD]` and `EdgeRDD[ED]` provide additional functionality built around graph computation and leverage internal optimizations. We discuss the -`VertexRDD` and `EdgeRDD` API in greater detail in the section on [vertex and edge +`VertexRDD`[VertexRDD] and `EdgeRDD`[EdgeRDD] API in greater detail in the section on [vertex and edge RDDs](#vertex_and_edge_rdds) but for now they can be thought of as simply RDDs of the form: -`RDD[(VertexID, VD)]` and `RDD[Edge[ED]]`. +`RDD[(VertexId, VD)]` and `RDD[Edge[ED]]`. 
### Example Property Graph @@ -197,7 +198,7 @@ graph.edges.filter(e => e.srcId > e.dstId).count {% endhighlight %} > Note that `graph.vertices` returns an `VertexRDD[(String, String)]` which > extends -> `RDD[(VertexID, (String, String))]` and so we use the scala `case` expression to deconstruct the +> `RDD[(VertexId, (String, String))]` and so we use the scala `case` expression to deconstruct the > tuple. On the other hand, `graph.edges` returns an `EdgeRDD` containing > `Edge[String]` objects. > We could have also used the case class type constructor as in the following: > {% highlight scala %} @@ -287,7 +288,7 @@ class Graph[VD, ED] { // Change the partitioning heuristic def partitionBy(partitionStrategy: PartitionStrategy): Graph[VD, ED] // Transform vertex and edge attributes == - def mapVertices[VD2](map: (VertexID, VD) => VD2): Graph[VD2, ED] + def mapVertices[VD2](map: (VertexId, VD) => VD2): Graph[VD2, ED] def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2] def mapEdges[ED2](map: (PartitionID, Iterator[Edge[ED]]) => Iterator[ED2]): Graph[VD, ED2] def mapTriplets[ED2](map:
spark git commit: [SPARK-18428][DOC] Update docs for GraphX
Repository: spark Updated Branches: refs/heads/master c07187823 -> c31def1dd [SPARK-18428][DOC] Update docs for GraphX ## What changes were proposed in this pull request? 1, Add link of `VertexRDD` and `EdgeRDD` 2, Notify in `Vertex and Edge RDDs` that not all methods are listed 3, `VertexID` -> `VertexId` ## How was this patch tested? No tests, only docs is modified Author: Zheng RuiFengCloses #15875 from zhengruifeng/update_graphop_doc. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c31def1d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c31def1d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c31def1d Branch: refs/heads/master Commit: c31def1ddcbed340bfc071d54fb3dc7945cb525a Parents: c071878 Author: Zheng RuiFeng Authored: Mon Nov 14 21:15:39 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 21:15:39 2016 -0800 -- docs/graphx-programming-guide.md | 68 ++- 1 file changed, 35 insertions(+), 33 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c31def1d/docs/graphx-programming-guide.md -- diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md index 58671e6..1097cf1 100644 --- a/docs/graphx-programming-guide.md +++ b/docs/graphx-programming-guide.md @@ -11,6 +11,7 @@ description: GraphX graph processing library guide for Spark SPARK_VERSION_SHORT [EdgeRDD]: api/scala/index.html#org.apache.spark.graphx.EdgeRDD +[VertexRDD]: api/scala/index.html#org.apache.spark.graphx.VertexRDD [Edge]: api/scala/index.html#org.apache.spark.graphx.Edge [EdgeTriplet]: api/scala/index.html#org.apache.spark.graphx.EdgeTriplet [Graph]: api/scala/index.html#org.apache.spark.graphx.Graph @@ -89,7 +90,7 @@ with user defined objects attached to each vertex and edge. A directed multigra graph with potentially multiple parallel edges sharing the same source and destination vertex. The ability to support parallel edges simplifies modeling scenarios where there can be multiple relationships (e.g., co-worker and friend) between the same vertices. Each vertex is keyed by a -*unique* 64-bit long identifier (`VertexID`). GraphX does not impose any ordering constraints on +*unique* 64-bit long identifier (`VertexId`). GraphX does not impose any ordering constraints on the vertex identifiers. Similarly, edges have corresponding source and destination vertex identifiers. @@ -130,12 +131,12 @@ class Graph[VD, ED] { } {% endhighlight %} -The classes `VertexRDD[VD]` and `EdgeRDD[ED]` extend and are optimized versions of `RDD[(VertexID, +The classes `VertexRDD[VD]` and `EdgeRDD[ED]` extend and are optimized versions of `RDD[(VertexId, VD)]` and `RDD[Edge[ED]]` respectively. Both `VertexRDD[VD]` and `EdgeRDD[ED]` provide additional functionality built around graph computation and leverage internal optimizations. We discuss the -`VertexRDD` and `EdgeRDD` API in greater detail in the section on [vertex and edge +`VertexRDD`[VertexRDD] and `EdgeRDD`[EdgeRDD] API in greater detail in the section on [vertex and edge RDDs](#vertex_and_edge_rdds) but for now they can be thought of as simply RDDs of the form: -`RDD[(VertexID, VD)]` and `RDD[Edge[ED]]`. +`RDD[(VertexId, VD)]` and `RDD[Edge[ED]]`. 
### Example Property Graph @@ -197,7 +198,7 @@ graph.edges.filter(e => e.srcId > e.dstId).count {% endhighlight %} > Note that `graph.vertices` returns an `VertexRDD[(String, String)]` which > extends -> `RDD[(VertexID, (String, String))]` and so we use the scala `case` expression to deconstruct the +> `RDD[(VertexId, (String, String))]` and so we use the scala `case` expression to deconstruct the > tuple. On the other hand, `graph.edges` returns an `EdgeRDD` containing > `Edge[String]` objects. > We could have also used the case class type constructor as in the following: > {% highlight scala %} @@ -287,7 +288,7 @@ class Graph[VD, ED] { // Change the partitioning heuristic def partitionBy(partitionStrategy: PartitionStrategy): Graph[VD, ED] // Transform vertex and edge attributes == - def mapVertices[VD2](map: (VertexID, VD) => VD2): Graph[VD2, ED] + def mapVertices[VD2](map: (VertexId, VD) => VD2): Graph[VD2, ED] def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2] def mapEdges[ED2](map: (PartitionID, Iterator[Edge[ED]]) => Iterator[ED2]): Graph[VD, ED2] def mapTriplets[ED2](map: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2] @@ -297,18 +298,18 @@ class Graph[VD, ED] { def reverse: Graph[VD, ED] def
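A small GraphX sketch using the corrected `VertexId` spelling from the updated guide; it assumes an existing `SparkContext` named `sc`.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Vertices are keyed by 64-bit VertexId values; edges carry a String attribute.
val vertices: RDD[(VertexId, String)] =
  sc.parallelize(Seq((1L, "alice"), (2L, "bob")))
val edges: RDD[Edge[String]] =
  sc.parallelize(Seq(Edge(1L, 2L, "follows")))

val graph: Graph[String, String] = Graph(vertices, edges)
println(graph.vertices.count()) // 2
```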
[3/3] spark-website git commit: Add 2.0.2 release.
Add 2.0.2 release. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/39b5c3d6 Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/39b5c3d6 Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/39b5c3d6 Branch: refs/heads/asf-site Commit: 39b5c3d65a6a02b97b84cd1e90a08d119ba600e3 Parents: 0bd3631 Author: Reynold XinAuthored: Mon Nov 14 17:47:52 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 17:47:52 2016 -0800 -- _layouts/global.html| 2 +- js/downloads.js | 1 + news/_posts/2016-11-14-spark-2-0-2-released.md | 16 ++ .../_posts/2016-11-14-spark-release-2-0-2.md| 20 ++ site/community.html | 8 +- site/docs/2.0.2/latest | 1 + site/docs/latest| 2 +- site/documentation.html | 8 +- site/downloads.html | 8 +- site/examples.html | 8 +- site/faq.html | 8 +- site/graphx/index.html | 8 +- site/index.html | 8 +- site/js/downloads.js| 1 + site/mailing-lists.html | 8 +- site/mllib/index.html | 8 +- site/news/amp-camp-2013-registration-ope.html | 8 +- .../news/announcing-the-first-spark-summit.html | 8 +- .../news/fourth-spark-screencast-published.html | 8 +- site/news/index.html| 18 +- site/news/nsdi-paper.html | 8 +- site/news/one-month-to-spark-summit-2015.html | 8 +- .../proposals-open-for-spark-summit-east.html | 8 +- ...registration-open-for-spark-summit-east.html | 8 +- .../news/run-spark-and-shark-on-amazon-emr.html | 8 +- site/news/spark-0-6-1-and-0-5-2-released.html | 8 +- site/news/spark-0-6-2-released.html | 8 +- site/news/spark-0-7-0-released.html | 8 +- site/news/spark-0-7-2-released.html | 8 +- site/news/spark-0-7-3-released.html | 8 +- site/news/spark-0-8-0-released.html | 8 +- site/news/spark-0-8-1-released.html | 8 +- site/news/spark-0-9-0-released.html | 8 +- site/news/spark-0-9-1-released.html | 8 +- site/news/spark-0-9-2-released.html | 8 +- site/news/spark-1-0-0-released.html | 8 +- site/news/spark-1-0-1-released.html | 8 +- site/news/spark-1-0-2-released.html | 8 +- site/news/spark-1-1-0-released.html | 8 +- site/news/spark-1-1-1-released.html | 8 +- site/news/spark-1-2-0-released.html | 8 +- site/news/spark-1-2-1-released.html | 8 +- site/news/spark-1-2-2-released.html | 8 +- site/news/spark-1-3-0-released.html | 8 +- site/news/spark-1-4-0-released.html | 8 +- site/news/spark-1-4-1-released.html | 8 +- site/news/spark-1-5-0-released.html | 8 +- site/news/spark-1-5-1-released.html | 8 +- site/news/spark-1-5-2-released.html | 8 +- site/news/spark-1-6-0-released.html | 8 +- site/news/spark-1-6-1-released.html | 8 +- site/news/spark-1-6-2-released.html | 8 +- site/news/spark-1-6-3-released.html | 8 +- site/news/spark-2-0-0-released.html | 8 +- site/news/spark-2-0-1-released.html | 8 +- site/news/spark-2-0-2-released.html | 213 ++ site/news/spark-2.0.0-preview.html | 8 +- .../spark-accepted-into-apache-incubator.html | 8 +- site/news/spark-and-shark-in-the-news.html | 8 +- site/news/spark-becomes-tlp.html| 8 +- site/news/spark-featured-in-wired.html | 8 +- .../spark-mailing-lists-moving-to-apache.html | 8 +- site/news/spark-meetups.html| 8 +- site/news/spark-screencasts-published.html | 8 +- site/news/spark-summit-2013-is-a-wrap.html | 8 +- site/news/spark-summit-2014-videos-posted.html | 8 +- site/news/spark-summit-2015-videos-posted.html | 8 +- site/news/spark-summit-agenda-posted.html | 8 +- .../spark-summit-east-2015-videos-posted.html | 8 +- .../spark-summit-east-2016-cfp-closing.html | 8 +- site/news/spark-summit-east-agenda-posted.html | 8 +- 
.../news/spark-summit-europe-agenda-posted.html | 8 +- site/news/spark-summit-europe.html | 8 +- .../spark-summit-june-2016-agenda-posted.html | 8 +- site/news/spark-tips-from-quantifind.html
[1/3] spark-website git commit: Add 2.0.2 release.
Repository: spark-website Updated Branches: refs/heads/asf-site 0bd363165 -> 39b5c3d65 http://git-wip-us.apache.org/repos/asf/spark-website/blob/39b5c3d6/site/releases/spark-release-1-1-1.html -- diff --git a/site/releases/spark-release-1-1-1.html b/site/releases/spark-release-1-1-1.html index fcf0c91..434c313 100644 --- a/site/releases/spark-release-1-1-1.html +++ b/site/releases/spark-release-1-1-1.html @@ -106,7 +106,7 @@ Documentation - Latest Release (Spark 2.0.1) + Latest Release (Spark 2.0.2) Older Versions and Other Resources @@ -150,6 +150,9 @@ Latest News + Spark 2.0.2 released + (Nov 14, 2016) + Spark 1.6.3 released (Nov 07, 2016) @@ -159,9 +162,6 @@ Spark 2.0.0 released (Jul 26, 2016) - Spark 1.6.2 released - (Jun 25, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/39b5c3d6/site/releases/spark-release-1-2-0.html -- diff --git a/site/releases/spark-release-1-2-0.html b/site/releases/spark-release-1-2-0.html index 0490be7..09e4007 100644 --- a/site/releases/spark-release-1-2-0.html +++ b/site/releases/spark-release-1-2-0.html @@ -106,7 +106,7 @@ Documentation - Latest Release (Spark 2.0.1) + Latest Release (Spark 2.0.2) Older Versions and Other Resources @@ -150,6 +150,9 @@ Latest News + Spark 2.0.2 released + (Nov 14, 2016) + Spark 1.6.3 released (Nov 07, 2016) @@ -159,9 +162,6 @@ Spark 2.0.0 released (Jul 26, 2016) - Spark 1.6.2 released - (Jun 25, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/39b5c3d6/site/releases/spark-release-1-2-1.html -- diff --git a/site/releases/spark-release-1-2-1.html b/site/releases/spark-release-1-2-1.html index c9efc6a..93acc9d 100644 --- a/site/releases/spark-release-1-2-1.html +++ b/site/releases/spark-release-1-2-1.html @@ -106,7 +106,7 @@ Documentation - Latest Release (Spark 2.0.1) + Latest Release (Spark 2.0.2) Older Versions and Other Resources @@ -150,6 +150,9 @@ Latest News + Spark 2.0.2 released + (Nov 14, 2016) + Spark 1.6.3 released (Nov 07, 2016) @@ -159,9 +162,6 @@ Spark 2.0.0 released (Jul 26, 2016) - Spark 1.6.2 released - (Jun 25, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/39b5c3d6/site/releases/spark-release-1-2-2.html -- diff --git a/site/releases/spark-release-1-2-2.html b/site/releases/spark-release-1-2-2.html index d76c619..32d4627 100644 --- a/site/releases/spark-release-1-2-2.html +++ b/site/releases/spark-release-1-2-2.html @@ -106,7 +106,7 @@ Documentation - Latest Release (Spark 2.0.1) + Latest Release (Spark 2.0.2) Older Versions and Other Resources @@ -150,6 +150,9 @@ Latest News + Spark 2.0.2 released + (Nov 14, 2016) + Spark 1.6.3 released (Nov 07, 2016) @@ -159,9 +162,6 @@ Spark 2.0.0 released (Jul 26, 2016) - Spark 1.6.2 released - (Jun 25, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/39b5c3d6/site/releases/spark-release-1-3-0.html -- diff --git a/site/releases/spark-release-1-3-0.html b/site/releases/spark-release-1-3-0.html index 435ed19..45180a7 100644 --- a/site/releases/spark-release-1-3-0.html +++ b/site/releases/spark-release-1-3-0.html @@ -106,7 +106,7 @@ Documentation - Latest Release (Spark 2.0.1) + Latest Release (Spark 2.0.2) Older Versions and Other Resources @@ -150,6 +150,9 @@ Latest News + Spark 2.0.2 released + (Nov 14, 2016) + Spark 1.6.3 released (Nov 07, 2016) @@ -159,9 +162,6 @@ Spark 2.0.0 released (Jul 26, 2016) - Spark 1.6.2 released - (Jun 25, 2016) -
[2/3] spark-website git commit: Add 2.0.2 release.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/39b5c3d6/site/news/spark-2-0-1-released.html -- diff --git a/site/news/spark-2-0-1-released.html b/site/news/spark-2-0-1-released.html index f772398..09e052d 100644 --- a/site/news/spark-2-0-1-released.html +++ b/site/news/spark-2-0-1-released.html @@ -106,7 +106,7 @@ Documentation - Latest Release (Spark 2.0.1) + Latest Release (Spark 2.0.2) Older Versions and Other Resources @@ -150,6 +150,9 @@ Latest News + Spark 2.0.2 released + (Nov 14, 2016) + Spark 1.6.3 released (Nov 07, 2016) @@ -159,9 +162,6 @@ Spark 2.0.0 released (Jul 26, 2016) - Spark 1.6.2 released - (Jun 25, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/39b5c3d6/site/news/spark-2-0-2-released.html -- diff --git a/site/news/spark-2-0-2-released.html b/site/news/spark-2-0-2-released.html new file mode 100644 index 000..0b1ffe7 --- /dev/null +++ b/site/news/spark-2-0-2-released.html @@ -0,0 +1,213 @@ + + + + + + + + + Spark 2.0.2 released | Apache Spark + + + + + + + + + + + + + + + + + var _gaq = _gaq || []; + _gaq.push(['_setAccount', 'UA-32518208-2']); + _gaq.push(['_trackPageview']); + (function() { +var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; +ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; +var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); + })(); + + + function trackOutboundLink(link, category, action) { +try { + _gaq.push(['_trackEvent', category , action]); +} catch(err){} + +setTimeout(function() { + document.location.href = link.href; +}, 100); + } + + + + + + + + +https://code.jquery.com/jquery.js"> + + + + + + + + + + + + Lightning-fast cluster computing + + + + + + + + + + Toggle navigation + + + + + + + + + + Download + + + Libraries + + + SQL and DataFrames + Spark Streaming + MLlib (machine learning) + GraphX (graph) + + https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects;>Third-Party Packages + + + + + Documentation + + + Latest Release (Spark 2.0.2) + Older Versions and Other Resources + + + Examples + + + Community + + + Mailing Lists + Events and Meetups + Project History + https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark;>Powered By + https://cwiki.apache.org/confluence/display/SPARK/Committers;>Project Committers + https://issues.apache.org/jira/browse/SPARK;>Issue Tracker + + + FAQ + + + +http://www.apache.org/; class="dropdown-toggle" data-toggle="dropdown"> + Apache Software Foundation + + http://www.apache.org/;>Apache Homepage + http://www.apache.org/licenses/;>License + http://www.apache.org/foundation/sponsorship.html;>Sponsorship + http://www.apache.org/foundation/thanks.html;>Thanks + http://www.apache.org/security/;>Security + + + + + + + + + + + + Latest News + + + Spark 2.0.2 released + (Nov 14, 2016) + + Spark 1.6.3 released + (Nov 07, 2016) + + Spark 2.0.1 released + (Oct 03, 2016) + + Spark 2.0.0 released + (Jul 26, 2016) + + + Archive + + + +Download Spark + + +Built-in Libraries: + + +SQL and DataFrames +Spark Streaming +MLlib (machine learning) +GraphX (graph) + + https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects;>Third-Party Packages + + + + +Spark 2.0.2 released + + +We are happy to announce the availability of Apache Spark 2.0.2! This maintenance release includes fixes across several areas of Spark, as well as Kafka 0.10 and runtime metrics support for Structured Streaming. 
+ +Visit the release notes to read
spark git commit: [SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide
Repository: spark Updated Branches: refs/heads/branch-2.0 80c1a1f30 -> a719c5128 [SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide Update the python section of the Structured Streaming Guide from .builder() to .builder Validated documentation and successfully running the test example. Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request. 'Builder' object is not callable object hence changed .builder() to .builder Author: Denny LeeCloses #15872 from dennyglee/master. (cherry picked from commit b91a51bb231af321860415075a7f404bc46e0a74) Signed-off-by: Reynold Xin (cherry picked from commit b6e4d3925239836334867d6ebcf22e5a1369cfc0) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a719c512 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a719c512 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a719c512 Branch: refs/heads/branch-2.0 Commit: a719c5128fddb133c5d496b85032e0049506a95c Parents: 80c1a1f Author: Denny Lee Authored: Sun Nov 13 18:10:06 2016 -0800 Committer: Reynold Xin Committed: Sun Nov 13 18:11:59 2016 -0800 -- docs/structured-streaming-programming-guide.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a719c512/docs/structured-streaming-programming-guide.md -- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index be730b8..537aa06 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -59,9 +59,9 @@ from pyspark.sql import SparkSession from pyspark.sql.functions import explode from pyspark.sql.functions import split -spark = SparkSession\ -.builder()\ -.appName("StructuredNetworkWordCount")\ +spark = SparkSession \ +.builder \ +.appName("StructuredNetworkWordCount") \ .getOrCreate() {% endhighlight %} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide
Repository: spark Updated Branches: refs/heads/branch-2.1 6fae4241f -> 0c69224ed [SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide ## What changes were proposed in this pull request? Update the python section of the Structured Streaming Guide from .builder() to .builder ## How was this patch tested? Validated documentation and successfully running the test example. Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request. 'Builder' object is not callable object hence changed .builder() to .builder Author: Denny LeeCloses #15872 from dennyglee/master. (cherry picked from commit b91a51bb231af321860415075a7f404bc46e0a74) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0c69224e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0c69224e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0c69224e Branch: refs/heads/branch-2.1 Commit: 0c69224ed752c25be1545cfe8ba0db8487a70bf2 Parents: 6fae424 Author: Denny Lee Authored: Sun Nov 13 18:10:06 2016 -0800 Committer: Reynold Xin Committed: Sun Nov 13 18:10:16 2016 -0800 -- docs/structured-streaming-programming-guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0c69224e/docs/structured-streaming-programming-guide.md -- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index d838ed3..d254558 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -58,7 +58,7 @@ from pyspark.sql.functions import explode from pyspark.sql.functions import split spark = SparkSession \ -.builder() \ +.builder \ .appName("StructuredNetworkWordCount") \ .getOrCreate() {% endhighlight %} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide
Repository: spark Updated Branches: refs/heads/master 1386fd28d -> b91a51bb2 [SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide ## What changes were proposed in this pull request? Update the python section of the Structured Streaming Guide from .builder() to .builder ## How was this patch tested? Validated documentation and successfully running the test example. Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request. 'Builder' object is not callable object hence changed .builder() to .builder Author: Denny LeeCloses #15872 from dennyglee/master. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b91a51bb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b91a51bb Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b91a51bb Branch: refs/heads/master Commit: b91a51bb231af321860415075a7f404bc46e0a74 Parents: 1386fd2 Author: Denny Lee Authored: Sun Nov 13 18:10:06 2016 -0800 Committer: Reynold Xin Committed: Sun Nov 13 18:10:06 2016 -0800 -- docs/structured-streaming-programming-guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b91a51bb/docs/structured-streaming-programming-guide.md -- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index d838ed3..d254558 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -58,7 +58,7 @@ from pyspark.sql.functions import explode from pyspark.sql.functions import split spark = SparkSession \ -.builder() \ +.builder \ .appName("StructuredNetworkWordCount") \ .getOrCreate() {% endhighlight %} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
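For contrast with the Python fix above: in the Scala API, `SparkSession.builder` is a method, so it can be written with or without parentheses; the "'Builder' object is not callable" error is specific to PySpark, where `builder` is exposed as an attribute. A minimal Scala equivalent of the guide snippet, with an assumed local master:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder        // SparkSession.builder() also compiles in Scala
  .appName("StructuredNetworkWordCount")
  .master("local[*]")                   // assumed master for a local run
  .getOrCreate()
```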
svn commit: r16970 - /dev/spark/spark-2.0.2/ /release/spark/spark-2.0.2/
Author: rxin Date: Fri Nov 11 22:51:28 2016 New Revision: 16970 Log: Artifacts for Spark 2.0.2 Added: release/spark/spark-2.0.2/ - copied from r16969, dev/spark/spark-2.0.2/ Removed: dev/spark/spark-2.0.2/
[26/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/tableToDF.html
new file site/docs/2.0.2/api/R/tableToDF.html (65 lines) -- R help page "Create a SparkDataFrame from a SparkSQL Table" (tableToDF {SparkR}).
  Description: Returns the specified Table as a SparkDataFrame. The Table must have already been registered in the SparkSession.
  Usage: tableToDF(tableName)
  Arguments: tableName - The SparkSQL Table to convert to a SparkDataFrame.
  Value: SparkDataFrame. Note: tableToDF since 2.0.0.
  Examples (not run): sparkR.session(); df <- read.json("path/to/file.json"); createOrReplaceTempView(df, "table"); new_df <- tableToDF("table")

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/tables.html
new file site/docs/2.0.2/api/R/tables.html (62 lines) -- R help page "Tables" (tables {SparkR}).
  Description: Returns a SparkDataFrame containing names of tables in the given database.
  Usage (default S3 method): tables(databaseName = NULL)
  Arguments: databaseName - name of the database.
  Value: a SparkDataFrame. Note: tables since 1.4.0.
  Examples (not run): sparkR.session(); tables("hive")

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/take.html
new file site/docs/2.0.2/api/R/take.html (262 lines) -- R help page "Take the first NUM rows of a SparkDataFrame and return the results as a R data.frame" (take {SparkR}).
  Description: Take the first NUM rows of a SparkDataFrame and return the results as a R data.frame.
  Usage (S4 method for signature 'SparkDataFrame,numeric'): take(x, num)
  Arguments: x - a SparkDataFrame. num - number of rows to take.
  Note: take since 1.4.0. See Also: the other SparkDataFrame functions.
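For illustration only (not part of this commit): a minimal SparkR sketch tying the three help pages above together, assuming an active SparkR session; the JSON path and the view name "people" are hypothetical.

  sparkR.session()
  df <- read.json("path/to/file.json")      # hypothetical input file
  createOrReplaceTempView(df, "people")     # register the SparkDataFrame as a temporary view
  head(tables())                            # SparkDataFrame listing the registered tables
  people <- tableToDF("people")             # look the registered view up again
  take(people, 5)                           # first 5 rows as a local R data.frame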
[21/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/allclasses-frame.html
new file site/docs/2.0.2/api/java/allclasses-frame.html (1119 lines) -- the "All Classes" frame of the Spark 2.0.2 JavaDoc, an alphabetical index of the public API classes. The fragment in this message covers AbsoluteError through CoarseGrainedClusterMessages.RegisterExecutorFailed$, including Accumulable, AccumulableInfo, AccumulableParam, Accumulator, AccumulatorContext, AccumulatorParam (and its Double/Float/Int/Long/String param objects), AccumulatorV2, AFTSurvivalRegression(Model), AggregatedDialect, Aggregator, Algo, ALS(Model), AnalysisException, ArrayType, AssociationRules, AsyncRDDActions, Attribute(Group), BaseRelation, BatchInfo, Binarizer, BinaryClassificationEvaluator/Metrics, BinaryType, BisectingKMeans(Model), BLAS, BlockId, BlockManagerId, the BlockManagerMessages.* family, BlockMatrix, BloomFilter, BooleanType, BoostingStrategy, Broadcast, Bucketizer, ByteType, CalendarIntervalType, Catalog, CategoricalSplit, CheckpointReader, ChiSqSelector(Model), ChiSqTest, CholeskyDecomposition, ClassificationModel, Classifier, ClosureCleaner and the CoarseGrainedClusterMessages.* family.
[44/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/dapply.html
new file site/docs/2.0.2/api/R/dapply.html (290 lines) -- R help page "dapply" (dapply {SparkR}).
  Description: Apply a function to each partition of a SparkDataFrame.
  Usage (S4 method for signature 'SparkDataFrame,'function',structType'): dapply(x, func, schema)
  Arguments: x - A SparkDataFrame. func - A function to be applied to each partition of the SparkDataFrame; func should have only one parameter, to which an R data.frame corresponding to each partition will be passed, and the output of func should be an R data.frame. schema - The schema of the resulting SparkDataFrame after the function is applied; it must match the output of func.
  Note: dapply since 2.0.0. See Also: dapplyCollect and the other SparkDataFrame functions.
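A hedged sketch of the dapply contract described above (not from the commit), assuming df is an existing SparkDataFrame with an integer column a (hypothetical); the declared schema must match the data.frame that func returns.

  schema <- structType(structField("a", "integer"),
                       structField("a_doubled", "double"))
  df2 <- dapply(df, function(pdf) {
    pdf$a_doubled <- pdf$a * 2              # pdf is the R data.frame for one partition
    pdf[, c("a", "a_doubled")]              # column order and types must match 'schema'
  }, schema)
  head(df2)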
[47/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/avg.html
new file site/docs/2.0.2/api/R/avg.html (109 lines) -- R help page "avg" (avg {SparkR}).
  Description: Aggregate function: returns the average of the values in a group.
  Usage (S4 method for signature 'Column'): avg(x); avg(x, ...)
  Arguments: x - Column to compute on or a GroupedData object. ... - additional argument(s) when x is a GroupedData object.
  Note: avg since 1.4.0. See Also: the other agg_funcs (count, countDistinct/n_distinct, first, kurtosis, last, max, mean, min, sd/stddev, skewness, stddev_pop, stddev_samp, sum, sumDistinct, var/variance, var_pop, var_samp).
  Examples (not run): avg(df$c)

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/base64.html
new file site/docs/2.0.2/api/R/base64.html (115 lines) -- R help page "base64" (base64 {SparkR}).
  Description: Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.
  Usage (S4 method for signature 'Column'): base64(x)
  Arguments: x - Column to compute on.
  Note: base64 since 1.5.0. See Also: the other string_funcs (ascii, concat, concat_ws, decode, encode, format_number, format_string, initcap, instr, length, levenshtein, locate, lower, lpad, ltrim, regexp_extract, regexp_replace, reverse, rpad, rtrim, soundex, substring_index, translate, trim, unbase64, upper).
  Examples (not run): base64(df$c)

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/between.html
new file site/docs/2.0.2/api/R/between.html (truncated in this message).
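For illustration (not part of the commit), a small SparkR sketch of the two column functions above, assuming df has a numeric column age and a binary column raw (both hypothetical):

  head(agg(df, avg_age = avg(df$age)))      # aggregate over the whole SparkDataFrame
  head(select(df, base64(df$raw)))          # BASE64-encode a binary column as a string column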
[15/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/Accumulable.html
new file site/docs/2.0.2/api/java/org/apache/spark/Accumulable.html (456 lines) -- JavaDoc for org.apache.spark.Accumulable<R,T> (implements java.io.Serializable; direct known subclass: Accumulator). Deprecated: use AccumulatorV2. Since 2.0.0.
  Class description: A data type that can be accumulated, i.e. has a commutative and associative "add" operation, but where the result type, R, may be different from the element type being added, T. You must define how to add data, and how to merge two of these together. For some data types, such as a counter, these might be the same operation; in that case, you can use the simpler Accumulator. They won't always be the same, though -- e.g., imagine you are accumulating a set: you will add items to the set, and you will union two sets together. Operations are not thread-safe.
  Parameters: id (ID of this accumulator; for internal use only), initialValue (initial value of the accumulator), param (helper object defining how to add elements of type R and T), name (human-readable name for use in Spark's web UI), countFailedValues (whether to accumulate values from failed tasks; set to true for system and time metrics like serialization time or bytes spilled, and false for things with absolute values like number of input rows; for internal metrics only).
  Constructor: Accumulable(R initialValue, AccumulableParam<R,T> param)
  Methods (all deprecated): add(T term) - add more data to this accumulator/accumulable; id(); localValue() - get the current value of this accumulator from within a task (NOT the global value; to get the global value after a completed operation on the dataset, call value()); merge(R term) - merge two accumulable objects together (normally a user will call add instead); name(); setValue(R newValue) - set the accumulator's value; toString(); value() - access the accumulator's current value, only allowed on the driver; zero().
[18/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/index-all.html
new file site/docs/2.0.2/api/java/index-all.html (45389 lines) -- the master "Index" page of the Spark 2.0.2 JavaDoc, organized under "$", A-Z and "_", listing every public class, constructor, field and method. The fragment in this message covers the "$" entries (Scala operator methods such as $colon$bslash, $div$colon, $greater, $greater$eq, $less, $less$eq, $minus$greater, $plus$colon, $plus$eq, $plus$plus and $plus$plus$eq on StructType, Decimal, RDDInfo, DoubleParam, FloatParam, Accumulator and the RDD hierarchy) and the beginning of "A" (abs, absent, AbsoluteError, the accept/acceptIf/acceptMatch/acceptSeq parser helpers on RFormulaParser, accId, Accumulable, ...).
[41/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/factorial.html
new file site/docs/2.0.2/api/R/factorial.html (119 lines) -- R help page "factorial" (factorial {SparkR}).
  Description: Computes the factorial of the given value.
  Usage (S4 method for signature 'Column'): factorial(x)
  Arguments: x - Column to compute on.
  Note: factorial since 1.5.0. See Also: the other math_funcs (acos, asin, atan, atan2, bin, bround, cbrt, ceil/ceiling, conv, corr, cos, cosh, cov, covar_pop, covar_samp, exp, expm1, floor, hex, hypot, log, log10, log1p, log2, pmod, rint, round, shiftLeft, shiftRight, shiftRightUnsigned, sign/signum, sin, sinh, sqrt, tan, tanh, toDegrees, toRadians, unhex).
  Examples (not run): factorial(df$c)

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/filter.html
new file site/docs/2.0.2/api/R/filter.html (288 lines) -- R help page "Filter" (filter {SparkR}).
  Description: Filter the rows of a SparkDataFrame according to a given condition.
  Usage (S4 methods for signature 'SparkDataFrame,characterOrColumn'): filter(x, condition); where(x, condition)
  Arguments: x - A SparkDataFrame to be sorted. condition - The condition to filter on; this may either be a Column expression or a string containing a SQL statement.
  Value: A SparkDataFrame containing only the rows that meet the condition.
  Note: filter since 1.4.0; where since 1.4.0. See Also: the other SparkDataFrame functions.
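A minimal SparkR sketch (not from the commit) of the two equivalent filter forms and of factorial as a column expression, assuming df has an integer column age (hypothetical):

  adults1 <- filter(df, df$age > 21)        # Column-expression condition
  adults2 <- where(df, "age > 21")          # SQL-string condition; same result
  head(select(df, factorial(df$age)))       # factorial() applied column-wise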
[29/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/show.html
new file site/docs/2.0.2/api/R/show.html (269 lines) -- R help page "show" (show {SparkR}).
  Description: Print class and type information of a Spark object.
  Usage (S4 methods for signatures 'SparkDataFrame', 'WindowSpec', 'Column', 'GroupedData'): show(object)
  Arguments: object - a Spark object; can be a SparkDataFrame, Column, GroupedData or WindowSpec.
  Note: show(SparkDataFrame) since 1.4.0; show(WindowSpec) since 2.0.0; show(Column) since 1.4.0; show(GroupedData) since 1.4.0.
  See Also: the other SparkDataFrame functions.
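For illustration (not part of the commit), assuming an existing SparkDataFrame df with a hypothetical column dept:

  show(df)                                  # prints class and type information for the SparkDataFrame
  show(groupBy(df, "dept"))                 # the GroupedData method prints that object's class information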
[36/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/lower.html
new file site/docs/2.0.2/api/R/lower.html (114 lines) -- R help page "lower" (lower {SparkR}).
  Description: Converts a string column to lower case.
  Usage (S4 method for signature 'Column'): lower(x)
  Arguments: x - Column to compute on.
  Note: lower since 1.4.0. See Also: the other string_funcs.
  Examples (not run): lower(df$c)

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/lpad.html
new file site/docs/2.0.2/api/R/lpad.html (121 lines) -- R help page "lpad" (lpad {SparkR}).
  Description: Left-pad the string column with
  Usage (S4 method for signature 'Column,numeric,character'): lpad(x, len, pad)
  Arguments: x - the string Column to be left-padded. len - maximum length of each output result. pad - a character string to be padded with.
  Note: lpad since 1.5.0. See Also: the other string_funcs.
  Examples (not run): lpad(df$c, 6, "#")
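A minimal sketch (not from the commit) of the two string functions above, assuming df has a string column name (hypothetical):

  head(select(df, lower(df$name)))          # lower-case the column
  head(select(df, lpad(df$name, 6, "#")))   # left-pad each value with "#" up to length 6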
[16/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/lib/jquery.js
new file site/docs/2.0.2/api/java/lib/jquery.js (2 lines) -- the minified jQuery v1.8.2 library (jquery.com | jquery.org/license) bundled with the generated JavaDoc.
[10/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/InternalAccumulator.html
new file site/docs/2.0.2/api/java/org/apache/spark/InternalAccumulator.html (469 lines) -- JavaDoc for org.apache.spark.InternalAccumulator: a collection of fields and methods concerned with internal accumulators that represent task level metrics.
  Nested classes: InternalAccumulator.input$, InternalAccumulator.output$, InternalAccumulator.shuffleRead$, InternalAccumulator.shuffleWrite$.
  Constructor: InternalAccumulator().
  Static String methods: DISK_BYTES_SPILLED(), EXECUTOR_DESERIALIZE_TIME(), EXECUTOR_RUN_TIME(), INPUT_METRICS_PREFIX(), JVM_GC_TIME(), MEMORY_BYTES_SPILLED(), METRICS_PREFIX(), OUTPUT_METRICS_PREFIX(), PEAK_EXECUTION_MEMORY(), RESULT_SERIALIZATION_TIME(), RESULT_SIZE(), SHUFFLE_READ_METRICS_PREFIX(), SHUFFLE_WRITE_METRICS_PREFIX(), TEST_ACCUM(), UPDATED_BLOCK_STATUSES().

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/InternalAccumulator.input$.html
new file site/docs/2.0.2/api/java/org/apache/spark/InternalAccumulator.input$.html (truncated in this message).
[39/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/head.html
new file site/docs/2.0.2/api/R/head.html (267 lines) -- R help page "Head" (head {SparkR}).
  Description: Return the first num rows of a SparkDataFrame as a R data.frame. If num is not specified, then head() returns the first 6 rows as with R data.frame.
  Usage (S4 method for signature 'SparkDataFrame'): head(x, num = 6L)
  Arguments: x - a SparkDataFrame. num - the number of rows to return; default is 6.
  Value: A data.frame.
  Note: head since 1.4.0. See Also: the other SparkDataFrame functions.
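For illustration (not part of the commit), assuming an existing SparkDataFrame df:

  head(df)        # first 6 rows as a local R data.frame (the default)
  head(df, 3)     # first 3 rows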
[33/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/printSchema.html
new file site/docs/2.0.2/api/R/printSchema.html (258 lines) -- R help page "Print Schema of a SparkDataFrame" (printSchema {SparkR}).
  Description: Prints out the schema in tree format.
  Usage (S4 method for signature 'SparkDataFrame'): printSchema(x)
  Arguments: x - A SparkDataFrame.
  Note: printSchema since 1.4.0. See Also: the other SparkDataFrame functions.
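A minimal sketch (not from the commit), assuming an existing SparkDataFrame df; the columns in the sample output are hypothetical:

  printSchema(df)
  # root
  #  |-- name: string (nullable = true)
  #  |-- age: integer (nullable = true)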
[50/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/README.md -- diff --git a/site/docs/2.0.2/README.md b/site/docs/2.0.2/README.md new file mode 100644 index 000..ffd3b57 --- /dev/null +++ b/site/docs/2.0.2/README.md @@ -0,0 +1,72 @@ +Welcome to the Spark documentation! + +This readme will walk you through navigating and building the Spark documentation, which is included +here with the Spark source code. You can also find documentation specific to release versions of +Spark at http://spark.apache.org/documentation.html. + +Read on to learn more about viewing documentation in plain text (i.e., markdown) or building the +documentation yourself. Why build it yourself? So that you have the docs that corresponds to +whichever version of Spark you currently have checked out of revision control. + +## Prerequisites +The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, +Python and R. + +You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and +[Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python) +installed. Also install the following libraries: +```sh +$ sudo gem install jekyll jekyll-redirect-from pygments.rb +$ sudo pip install Pygments +# Following is needed only for generating API docs +$ sudo pip install sphinx pypandoc +$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "roxygen2", "testthat", "rmarkdown"), repos="http://cran.stat.ucla.edu/;)' +``` +(Note: If you are on a system with both Ruby 1.9 and Ruby 2.0 you may need to replace gem with gem2.0) + +## Generating the Documentation HTML + +We include the Spark documentation as part of the source (as opposed to using a hosted wiki, such as +the github wiki, as the definitive documentation) to enable the documentation to evolve along with +the source code and be captured by revision control (currently git). This way the code automatically +includes the version of the documentation that is relevant regardless of which version or release +you have checked out or downloaded. + +In this directory you will find textfiles formatted using Markdown, with an ".md" suffix. You can +read those text files directly if you want. Start with index.md. + +Execute `jekyll build` from the `docs/` directory to compile the site. Compiling the site with +Jekyll will create a directory called `_site` containing index.html as well as the rest of the +compiled files. + +$ cd docs +$ jekyll build + +You can modify the default Jekyll build as follows: +```sh +# Skip generating API docs (which takes a while) +$ SKIP_API=1 jekyll build + +# Serve content locally on port 4000 +$ jekyll serve --watch + +# Build the site with extra features used on the live page +$ PRODUCTION=1 jekyll build +``` + +## API Docs (Scaladoc, Sphinx, roxygen2) + +You can build just the Spark scaladoc by running `build/sbt unidoc` from the SPARK_PROJECT_ROOT directory. + +Similarly, you can build just the PySpark docs by running `make html` from the +SPARK_PROJECT_ROOT/python/docs directory. Documentation is only generated for classes that are listed as +public in `__init__.py`. The SparkR docs can be built by running SPARK_PROJECT_ROOT/R/create-docs.sh. + +When you run `jekyll` in the `docs` directory, it will also copy over the scaladoc for the various +Spark subprojects into the `docs` directory (and then also into the `_site` directory). 
We use a +jekyll plugin to run `build/sbt unidoc` before building the site so if you haven't run it (recently) it +may take some time as it generates all of the scaladoc. The jekyll plugin also generates the +PySpark docs using [Sphinx](http://sphinx-doc.org/). + +NOTE: To skip the step of building and copying over the Scala, Python, R API docs, run `SKIP_API=1 +jekyll`. http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api.html -- diff --git a/site/docs/2.0.2/api.html b/site/docs/2.0.2/api.html new file mode 100644 index 000..731bd07 --- /dev/null +++ b/site/docs/2.0.2/api.html @@ -0,0 +1,178 @@ +Spark API Documentation - Spark 2.0.2 Documentation
[27/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/structField.html -- diff --git a/site/docs/2.0.2/api/R/structField.html b/site/docs/2.0.2/api/R/structField.html new file mode 100644 index 000..6325141 --- /dev/null +++ b/site/docs/2.0.2/api/R/structField.html @@ -0,0 +1,84 @@ + +R: structField + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +structField {SparkR}R Documentation + +structField + +Description + +Create a structField object that contains the metadata for a single field in a schema. + + + +Usage + + +structField(x, ...) + +## S3 method for class 'jobj' +structField(x, ...) + +## S3 method for class 'character' +structField(x, type, nullable = TRUE, ...) + + + +Arguments + + +x + +the name of the field. + +... + +additional argument(s) passed to the method. + +type + +The data type of the field + +nullable + +A logical vector indicating whether or not the field is nullable + + + + +Value + +A structField object. + + + +Note + +structField since 1.4.0 + + + +Examples + +## Not run: +##D field1 - structField(a, integer) +##D field2 - structField(c, string) +##D field3 - structField(avg, double) +##D schema - structType(field1, field2, field3) +##D df1 - gapply(df, list(a, c), +##D function(key, x) { y - data.frame(key, mean(x$b), stringsAsFactors = FALSE) }, +##D schema) +## End(Not run) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/structType.html -- diff --git a/site/docs/2.0.2/api/R/structType.html b/site/docs/2.0.2/api/R/structType.html new file mode 100644 index 000..d068b59 --- /dev/null +++ b/site/docs/2.0.2/api/R/structType.html @@ -0,0 +1,75 @@ + +R: structType + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +structType {SparkR}R Documentation + +structType + +Description + +Create a structType object that contains the metadata for a SparkDataFrame. Intended for +use with createDataFrame and toDF. + + + +Usage + + +structType(x, ...) + +## S3 method for class 'jobj' +structType(x, ...) + +## S3 method for class 'structField' +structType(x, ...) + + + +Arguments + + +x + +a structField object (created with the field() function) + +... 
+ +additional structField objects + + + + +Value + +a structType object + + + +Note + +structType since 1.4.0 + + + +Examples + +## Not run: +##D schema - structType(structField(a, integer), structField(c, string), +##D structField(avg, double)) +##D df1 - gapply(df, list(a, c), +##D function(key, x) { y - data.frame(key, mean(x$b), stringsAsFactors = FALSE) }, +##D schema) +## End(Not run) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/subset.html -- diff --git a/site/docs/2.0.2/api/R/subset.html b/site/docs/2.0.2/api/R/subset.html new file mode 100644 index 000..987e20b --- /dev/null +++ b/site/docs/2.0.2/api/R/subset.html @@ -0,0 +1,309 @@ + +R: Subset + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +[[ {SparkR}R Documentation + +Subset + +Description + +Return subsets of SparkDataFrame according to given conditions + + + +Usage + + +## S4 method for signature 'SparkDataFrame,numericOrcharacter' +x[[i]] + +## S4 method for signature 'SparkDataFrame' +x[i, j, ..., drop = F] + +## S4 method for signature 'SparkDataFrame' +subset(x, subset, select, drop = F, ...) + +subset(x, ...) + + + +Arguments + + +x + +a SparkDataFrame. + +i,subset + +(Optional) a logical expression to filter on rows. + +j,select + +expression for the single Column or a list of columns to select from the SparkDataFrame. + +... + +currently not used. + +drop + +if TRUE, a Column will be returned if the resulting dataset has only one column. +Otherwise, a SparkDataFrame will always be returned. + + + + +Value + +A new SparkDataFrame containing only the rows that meet the condition with selected columns. + + + +Note + +[[ since 1.4.0 + +[
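The structField/structType pages quoted above describe how a schema is assembled field by field and then handed to functions such as createDataFrame or gapply, with subset documented alongside them. A minimal SparkR sketch of that flow; the column names and the tiny local data.frame are illustrative assumptions, not taken from the quoted docs:

```r
library(SparkR)
sparkR.session()   # assumes a local Spark installation is available

# Assemble a schema field by field, as on the structField/structType pages
field_key <- structField("key", "string")
field_a   <- structField("a", "integer")
field_b   <- structField("b", "double")
schema    <- structType(field_key, field_a, field_b)

# Hypothetical local data matching that schema
local_df <- data.frame(key = c("x", "y", "z"),
                       a = c(1L, 2L, 3L),
                       b = c(0.5, 1.5, 2.5),
                       stringsAsFactors = FALSE)
df <- createDataFrame(local_df, schema)
printSchema(df)

# subset(), as on the quoted subset page: filter rows, pick columns
head(subset(df, df$a > 1, select = c("key", "b")))
```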
[46/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/collect.html -- diff --git a/site/docs/2.0.2/api/R/collect.html b/site/docs/2.0.2/api/R/collect.html new file mode 100644 index 000..61090a7 --- /dev/null +++ b/site/docs/2.0.2/api/R/collect.html @@ -0,0 +1,268 @@ + +R: Collects all the elements of a SparkDataFrame and coerces... + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +collect {SparkR}R Documentation + +Collects all the elements of a SparkDataFrame and coerces them into an R data.frame. + +Description + +Collects all the elements of a SparkDataFrame and coerces them into an R data.frame. + + + +Usage + + +## S4 method for signature 'SparkDataFrame' +collect(x, stringsAsFactors = FALSE) + +collect(x, ...) + + + +Arguments + + +x + +a SparkDataFrame. + +stringsAsFactors + +(Optional) a logical indicating whether or not string columns +should be converted to factors. FALSE by default. + +... + +further arguments to be passed to or from other methods. + + + + +Note + +collect since 1.4.0 + + + +See Also + +Other SparkDataFrame functions: $, +$,SparkDataFrame-method, $-, +$-,SparkDataFrame-method, +select, select, +select,SparkDataFrame,Column-method, +select,SparkDataFrame,character-method, +select,SparkDataFrame,list-method; +SparkDataFrame-class; [, +[,SparkDataFrame-method, [[, +[[,SparkDataFrame,numericOrcharacter-method, +subset, subset, +subset,SparkDataFrame-method; +agg, agg, agg, +agg,GroupedData-method, +agg,SparkDataFrame-method, +summarize, summarize, +summarize, +summarize,GroupedData-method, +summarize,SparkDataFrame-method; +arrange, arrange, +arrange, +arrange,SparkDataFrame,Column-method, +arrange,SparkDataFrame,character-method, +orderBy,SparkDataFrame,characterOrColumn-method; +as.data.frame, +as.data.frame,SparkDataFrame-method; +attach, +attach,SparkDataFrame-method; +cache, cache, +cache,SparkDataFrame-method; +colnames, colnames, +colnames,SparkDataFrame-method, +colnames-, colnames-, +colnames-,SparkDataFrame-method, +columns, columns, +columns,SparkDataFrame-method, +names, +names,SparkDataFrame-method, +names-, +names-,SparkDataFrame-method; +coltypes, coltypes, +coltypes,SparkDataFrame-method, +coltypes-, coltypes-, +coltypes-,SparkDataFrame,character-method; +count,SparkDataFrame-method, +nrow, nrow, +nrow,SparkDataFrame-method; +createOrReplaceTempView, +createOrReplaceTempView, +createOrReplaceTempView,SparkDataFrame,character-method; +dapplyCollect, dapplyCollect, +dapplyCollect,SparkDataFrame,function-method; +dapply, dapply, +dapply,SparkDataFrame,function,structType-method; +describe, describe, +describe, +describe,SparkDataFrame,ANY-method, +describe,SparkDataFrame,character-method, +describe,SparkDataFrame-method, +summary, summary, +summary,SparkDataFrame-method; +dim, +dim,SparkDataFrame-method; +distinct, distinct, +distinct,SparkDataFrame-method, +unique, +unique,SparkDataFrame-method; +dropDuplicates, +dropDuplicates, +dropDuplicates,SparkDataFrame-method; +dropna, dropna, +dropna,SparkDataFrame-method, +fillna, fillna, +fillna,SparkDataFrame-method, +na.omit, na.omit, +na.omit,SparkDataFrame-method; +drop, drop, +drop, drop,ANY-method, +drop,SparkDataFrame-method; +dtypes, dtypes, +dtypes,SparkDataFrame-method; +except, except, +except,SparkDataFrame,SparkDataFrame-method; 
+explain, explain, +explain,SparkDataFrame-method; +filter, filter, +filter,SparkDataFrame,characterOrColumn-method, +where, where, +where,SparkDataFrame,characterOrColumn-method; +first, first, +first, +first,SparkDataFrame-method, +first,characterOrColumn-method; +gapplyCollect, gapplyCollect, +gapplyCollect, +gapplyCollect,GroupedData-method, +gapplyCollect,SparkDataFrame-method; +gapply, gapply, +gapply, +gapply,GroupedData-method, +gapply,SparkDataFrame-method; +groupBy, groupBy, +groupBy,SparkDataFrame-method, +group_by, group_by, +group_by,SparkDataFrame-method; +head, +head,SparkDataFrame-method; +histogram, +histogram,SparkDataFrame,characterOrColumn-method; +insertInto, insertInto, +insertInto,SparkDataFrame,character-method; +intersect, intersect, +intersect,SparkDataFrame,SparkDataFrame-method; +isLocal, isLocal, +isLocal,SparkDataFrame-method; +join, +join,SparkDataFrame,SparkDataFrame-method; +limit, limit, +limit,SparkDataFrame,numeric-method; +merge, merge, +merge,SparkDataFrame,SparkDataFrame-method; +mutate, mutate, +mutate,SparkDataFrame-method, +transform, transform, +transform,SparkDataFrame-method; +ncol, +ncol,SparkDataFrame-method; +persist, persist, +persist,SparkDataFrame,character-method; +printSchema, printSchema, +printSchema,SparkDataFrame-method; +randomSplit, randomSplit,
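The collect page quoted above is the standard way to pull a SparkDataFrame back to the driver as a plain R data.frame. A short sketch, using R's built-in `faithful` data set purely as example input:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)   # built-in R data set, used only for illustration

# Bring the whole distributed SparkDataFrame back as a local R data.frame
local_all <- collect(df)
str(local_all)

# collect() pulls everything to the driver; for a quick peek, head() is usually enough
head(df, 3)

# Optionally turn string columns into factors on the way back
# collect(df, stringsAsFactors = TRUE)
```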
[20/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/allclasses-noframe.html -- diff --git a/site/docs/2.0.2/api/java/allclasses-noframe.html b/site/docs/2.0.2/api/java/allclasses-noframe.html new file mode 100644 index 000..05ef78b --- /dev/null +++ b/site/docs/2.0.2/api/java/allclasses-noframe.html @@ -0,0 +1,1119 @@ +http://www.w3.org/TR/html4/loose.dtd;> + + + + +All Classes (Spark 2.0.2 JavaDoc) + + + + +All Classes + + +AbsoluteError +Accumulable +AccumulableInfo +AccumulableInfo +AccumulableParam +Accumulator +AccumulatorContext +AccumulatorParam +AccumulatorParam.DoubleAccumulatorParam$ +AccumulatorParam.FloatAccumulatorParam$ +AccumulatorParam.IntAccumulatorParam$ +AccumulatorParam.LongAccumulatorParam$ +AccumulatorParam.StringAccumulatorParam$ +AccumulatorV2 +AFTAggregator +AFTCostFun +AFTSurvivalRegression +AFTSurvivalRegressionModel +AggregatedDialect +AggregatingEdgeContext +Aggregator +Aggregator +Algo +AllJobsCancelled +AllReceiverIds +ALS +ALS +ALS.InBlock$ +ALS.Rating +ALS.Rating$ +ALS.RatingBlock$ +ALSModel +AnalysisException +And +AnyDataType +ApplicationAttemptInfo +ApplicationInfo +ApplicationsListResource +ApplicationStatus +ApplyInPlace +AreaUnderCurve +ArrayType +AskPermissionToCommitOutput +AssociationRules +AssociationRules.Rule +AsyncRDDActions +Attribute +AttributeGroup +AttributeKeys +AttributeType +BaseRelation +BaseRRDD +BatchInfo +BernoulliCellSampler +BernoulliSampler +Binarizer +BinaryAttribute +BinaryClassificationEvaluator +BinaryClassificationMetrics +BinaryLogisticRegressionSummary +BinaryLogisticRegressionTrainingSummary +BinarySample +BinaryType +BinomialBounds +BisectingKMeans +BisectingKMeans +BisectingKMeansModel +BisectingKMeansModel +BisectingKMeansModel.SaveLoadV1_0$ +BLAS +BLAS +BlockId +BlockManagerId +BlockManagerMessages +BlockManagerMessages.BlockManagerHeartbeat +BlockManagerMessages.BlockManagerHeartbeat$ +BlockManagerMessages.GetBlockStatus +BlockManagerMessages.GetBlockStatus$ +BlockManagerMessages.GetExecutorEndpointRef +BlockManagerMessages.GetExecutorEndpointRef$ +BlockManagerMessages.GetLocations +BlockManagerMessages.GetLocations$ +BlockManagerMessages.GetLocationsMultipleBlockIds +BlockManagerMessages.GetLocationsMultipleBlockIds$ +BlockManagerMessages.GetMatchingBlockIds +BlockManagerMessages.GetMatchingBlockIds$ +BlockManagerMessages.GetMemoryStatus$ +BlockManagerMessages.GetPeers +BlockManagerMessages.GetPeers$ +BlockManagerMessages.GetStorageStatus$ +BlockManagerMessages.HasCachedBlocks +BlockManagerMessages.HasCachedBlocks$ +BlockManagerMessages.RegisterBlockManager +BlockManagerMessages.RegisterBlockManager$ +BlockManagerMessages.RemoveBlock +BlockManagerMessages.RemoveBlock$ +BlockManagerMessages.RemoveBroadcast +BlockManagerMessages.RemoveBroadcast$ +BlockManagerMessages.RemoveExecutor +BlockManagerMessages.RemoveExecutor$ +BlockManagerMessages.RemoveRdd +BlockManagerMessages.RemoveRdd$ +BlockManagerMessages.RemoveShuffle +BlockManagerMessages.RemoveShuffle$ +BlockManagerMessages.StopBlockManagerMaster$ +BlockManagerMessages.ToBlockManagerMaster +BlockManagerMessages.ToBlockManagerSlave +BlockManagerMessages.TriggerThreadDump$ +BlockManagerMessages.UpdateBlockInfo +BlockManagerMessages.UpdateBlockInfo$ +BlockMatrix +BlockNotFoundException +BlockStatus +BlockUpdatedInfo +BloomFilter +BloomFilter.Version +BooleanParam +BooleanType +BoostingStrategy +BoundedDouble +BreezeUtil +Broadcast +BroadcastBlockId +Broker +Bucketizer +BufferReleasingInputStream +BytecodeUtils +ByteType 
+CalendarIntervalType +Catalog +CatalogImpl +CatalystScan +CategoricalSplit +CausedBy +CheckpointReader +CheckpointState +ChiSqSelector +ChiSqSelector +ChiSqSelectorModel +ChiSqSelectorModel +ChiSqSelectorModel.SaveLoadV1_0$ +ChiSqTest +ChiSqTest.Method +ChiSqTest.Method$ +ChiSqTest.NullHypothesis$ +ChiSqTestResult +CholeskyDecomposition +ChunkedByteBufferInputStream +ClassificationModel +ClassificationModel +Classifier +CleanAccum +CleanBroadcast +CleanCheckpoint +CleanRDD +CleanShuffle +CleanupTask +CleanupTaskWeakReference +ClosureCleaner +CoarseGrainedClusterMessages +CoarseGrainedClusterMessages.AddWebUIFilter +CoarseGrainedClusterMessages.AddWebUIFilter$ +CoarseGrainedClusterMessages.GetExecutorLossReason +CoarseGrainedClusterMessages.GetExecutorLossReason$ +CoarseGrainedClusterMessages.KillExecutors +CoarseGrainedClusterMessages.KillExecutors$ +CoarseGrainedClusterMessages.KillTask +CoarseGrainedClusterMessages.KillTask$ +CoarseGrainedClusterMessages.LaunchTask +CoarseGrainedClusterMessages.LaunchTask$ +CoarseGrainedClusterMessages.RegisterClusterManager +CoarseGrainedClusterMessages.RegisterClusterManager$ +CoarseGrainedClusterMessages.RegisteredExecutor$ +CoarseGrainedClusterMessages.RegisterExecutor +CoarseGrainedClusterMessages.RegisterExecutor$ +CoarseGrainedClusterMessages.RegisterExecutorFailed
[28/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/spark.lapply.html -- diff --git a/site/docs/2.0.2/api/R/spark.lapply.html b/site/docs/2.0.2/api/R/spark.lapply.html new file mode 100644 index 000..f337327 --- /dev/null +++ b/site/docs/2.0.2/api/R/spark.lapply.html @@ -0,0 +1,96 @@ + +R: Run a function over a list of elements, distributing the... + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +spark.lapply {SparkR}R Documentation + +Run a function over a list of elements, distributing the computations with Spark + +Description + +Run a function over a list of elements, distributing the computations with Spark. Applies a +function in a manner that is similar to doParallel or lapply to elements of a list. +The computations are distributed using Spark. It is conceptually the same as the following code: +lapply(list, func) + + + +Usage + + +spark.lapply(list, func) + + + +Arguments + + +list + +the list of elements + +func + +a function that takes one argument. + + + + +Details + +Known limitations: + + + + variable scoping and capture: compared to R's rich support for variable resolutions, +the distributed nature of SparkR limits how variables are resolved at runtime. All the +variables that are available through lexical scoping are embedded in the closure of the +function and available as read-only variables within the function. The environment variables +should be stored into temporary variables outside the function, and not directly accessed +within the function. + + + loading external packages: In order to use a package, you need to load it inside the +closure. For example, if you rely on the MASS module, here is how you would use it: + + +train - function(hyperparam) { + library(MASS) + lm.ridge("y ~ x+z", data, lambda=hyperparam) + model +} + + + + + +Value + +a list of results (the exact type being determined by the function) + + + +Note + +spark.lapply since 2.0.0 + + + +Examples + +## Not run: +##D sparkR.session() +##D doubled - spark.lapply(1:10, function(x){2 * x}) +## End(Not run) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/spark.naiveBayes.html -- diff --git a/site/docs/2.0.2/api/R/spark.naiveBayes.html b/site/docs/2.0.2/api/R/spark.naiveBayes.html new file mode 100644 index 000..b4d60c2 --- /dev/null +++ b/site/docs/2.0.2/api/R/spark.naiveBayes.html @@ -0,0 +1,143 @@ + +R: Naive Bayes Models + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +spark.naiveBayes {SparkR}R Documentation + +Naive Bayes Models + +Description + +spark.naiveBayes fits a Bernoulli naive Bayes model against a SparkDataFrame. +Users can call summary to print a summary of the fitted model, predict to make +predictions on new data, and write.ml/read.ml to save/load fitted models. +Only categorical data is supported. + + + +Usage + + +spark.naiveBayes(data, formula, ...) + +## S4 method for signature 'NaiveBayesModel' +predict(object, newData) + +## S4 method for signature 'NaiveBayesModel' +summary(object, ...) 
+ +## S4 method for signature 'SparkDataFrame,formula' +spark.naiveBayes(data, formula, + smoothing = 1, ...) + +## S4 method for signature 'NaiveBayesModel,character' +write.ml(object, path, + overwrite = FALSE) + + + +Arguments + + +data + +a SparkDataFrame of observations and labels for model fitting. + +formula + +a symbolic description of the model to be fitted. Currently only a few formula +operators are supported, including '~', '.', ':', '+', and '-'. + +... + +additional argument(s) passed to the method. Currently only smoothing. + +object + +a naive Bayes model fitted by spark.naiveBayes. + +newData + +a SparkDataFrame for testing. + +smoothing + +smoothing parameter. + +path + +the directory where the model is saved + +overwrite + +overwrites or not if the output path already exists. Default is FALSE +which means throw exception if the output path exists. + + + + +Value + +predict returns a SparkDataFrame containing predicted labeled in a column named +prediction + +summary returns a list containing apriori, the label distribution, and +tables, conditional probabilities given the target label. + +spark.naiveBayes returns a fitted naive Bayes model. + + + +Note + +predict(NaiveBayesModel) since 2.0.0 +
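The two pages quoted above cover spark.lapply (distributing an ordinary R function over a list) and spark.naiveBayes (fitting a Bernoulli naive Bayes model on categorical data). A hedged sketch of both; the UCBAdmissions table and the commented-out save path are illustrative choices, not part of the quoted docs:

```r
library(SparkR)
sparkR.session()

# spark.lapply: run an ordinary R function over each element of a list on the cluster
doubled <- spark.lapply(1:10, function(x) { 2 * x })

# spark.naiveBayes: Bernoulli naive Bayes on categorical data.
# UCBAdmissions is a built-in R table used here purely as example input.
admissions <- createDataFrame(as.data.frame(UCBAdmissions))
model <- spark.naiveBayes(admissions, Admit ~ Gender + Dept, smoothing = 1)

summary(model)                        # apriori and conditional probability tables
preds <- predict(model, admissions)   # adds a 'prediction' column
head(select(preds, "Admit", "prediction"))

# write.ml(model, "/tmp/nb-model")    # hypothetical path; use overwrite = TRUE to replace
```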
[49/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/00frame_toc.html -- diff --git a/site/docs/2.0.2/api/R/00frame_toc.html b/site/docs/2.0.2/api/R/00frame_toc.html new file mode 100644 index 000..c35b36d --- /dev/null +++ b/site/docs/2.0.2/api/R/00frame_toc.html @@ -0,0 +1,378 @@ + + + + + +R Documentation of SparkR + + +window.onload = function() { + var imgs = document.getElementsByTagName('img'), i, img; + for (i = 0; i < imgs.length; i++) { +img = imgs[i]; +// center an image if it is the only element of its parent +if (img.parentElement.childElementCount === 1) + img.parentElement.style.textAlign = 'center'; + } +}; + + + + + + + +* { + font-family: "Trebuchet MS", "Lucida Grande", "Lucida Sans Unicode", "Lucida Sans", Arial, sans-serif; + font-size: 14px; +} +body { + padding: 0 5px; + margin: 0 auto; + width: 80%; + max-width: 60em; /* 960px */ +} + +h1, h2, h3, h4, h5, h6 { + color: #666; +} +h1, h2 { + text-align: center; +} +h1 { + font-size: x-large; +} +h2, h3 { + font-size: large; +} +h4, h6 { + font-style: italic; +} +h3 { + border-left: solid 5px #ddd; + padding-left: 5px; + font-variant: small-caps; +} + +p img { + display: block; + margin: auto; +} + +span, code, pre { + font-family: Monaco, "Lucida Console", "Courier New", Courier, monospace; +} +span.acronym {} +span.env { + font-style: italic; +} +span.file {} +span.option {} +span.pkg { + font-weight: bold; +} +span.samp{} + +dt, p code { + background-color: #F7F7F7; +} + + + + + + + + +SparkR + + +AFTSurvivalRegressionModel-class +GeneralizedLinearRegressionModel-class +GroupedData +KMeansModel-class +NaiveBayesModel-class +SparkDataFrame +WindowSpec +abs +acos +add_months +alias +approxCountDistinct +approxQuantile +arrange +array_contains +as.data.frame +ascii +asin +atan +atan2 +attach +avg +base64 +between +bin +bitwiseNOT +bround +cache +cacheTable +cancelJobGroup +cast +cbrt +ceil +clearCache +clearJobGroup +collect +coltypes +column +columnfunctions +columns +concat +concat_ws +conv +corr +cos +cosh +count +countDistinct +cov +covar_pop +crc32 +createDataFrame +createExternalTable +createOrReplaceTempView +crosstab +cume_dist +dapply +dapplyCollect +date_add +date_format +date_sub +datediff +dayofmonth +dayofyear +decode +dense_rank +dim +distinct +drop +dropDuplicates +dropTempTable-deprecated +dropTempView +dtypes +encode +endsWith +except +exp +explain +explode +expm1 +expr +factorial +filter +first +fitted +floor +format_number +format_string +freqItems +from_unixtime +fromutctimestamp +gapply +gapplyCollect +generateAliasesForIntersectedCols +glm +greatest +groupBy +hash +hashCode +head +hex +histogram +hour +hypot +ifelse +initcap +insertInto +install.spark +instr +intersect +is.nan +isLocal +join +kurtosis +lag +last +last_day +lead +least +length +levenshtein +limit +lit +locate +log +log10 +log1p +log2 +lower +lpad +ltrim +match +max +md5 +mean +merge +min +minute +monotonicallyincreasingid +month +months_between +mutate +nafunctions +nanvl +ncol +negate +next_day +nrow +ntile +orderBy +otherwise +over +partitionBy +percent_rank +persist +pivot +pmod +posexplode +predict +print.jobj +print.structField +print.structType +printSchema +quarter +rand +randn +randomSplit +rangeBetween +rank +rbind +read.df +read.jdbc +read.json +read.ml +read.orc +read.parquet +read.text +regexp_extract +regexp_replace +registerTempTable-deprecated +rename +repartition +reverse +rint +round +row_number +rowsBetween +rpad +rtrim +sample +sampleBy +saveAsTable +schema +sd +second +select 
+selectExpr +setJobGroup +setLogLevel +sha1 +sha2 +shiftLeft +shiftRight +shiftRightUnsigned +show +showDF +sign +sin +sinh +size +skewness +sort_array +soundex +spark.glm +spark.kmeans +spark.lapply +spark.naiveBayes +spark.survreg +sparkR.callJMethod +sparkR.callJStatic +sparkR.conf +sparkR.init-deprecated +sparkR.newJObject +sparkR.session +sparkR.session.stop +sparkR.version +sparkRHive.init-deprecated +sparkRSQL.init-deprecated +sparkpartitionid +sql +sqrt +startsWith +stddev_pop +stddev_samp +str +struct +structField +structType +subset +substr +substring_index +sum +sumDistinct +summarize +summary +tableNames +tableToDF +tables +take +tan +tanh +toDegrees +toRadians +to_date +toutctimestamp +translate +trim +unbase64 +uncacheTable +unhex +union +unix_timestamp +unpersist-methods +upper +var +var_pop +var_samp +weekofyear +when +window +windowOrderBy +windowPartitionBy +with +withColumn +write.df +write.jdbc +write.json +write.ml +write.orc +write.parquet +write.text +year + + +Generated with http://yihui.name/knitr;>knitr 1.14 + + + + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/AFTSurvivalRegressionModel-class.html
[34/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/nrow.html -- diff --git a/site/docs/2.0.2/api/R/nrow.html b/site/docs/2.0.2/api/R/nrow.html new file mode 100644 index 000..2626e03 --- /dev/null +++ b/site/docs/2.0.2/api/R/nrow.html @@ -0,0 +1,260 @@ + +R: Returns the number of rows in a SparkDataFrame + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +nrow {SparkR}R Documentation + +Returns the number of rows in a SparkDataFrame + +Description + +Returns the number of rows in a SparkDataFrame + + + +Usage + + +## S4 method for signature 'SparkDataFrame' +count(x) + +## S4 method for signature 'SparkDataFrame' +nrow(x) + + + +Arguments + + +x + +a SparkDataFrame. + + + + +Note + +count since 1.4.0 + +nrow since 1.5.0 + + + +See Also + +Other SparkDataFrame functions: $, +$,SparkDataFrame-method, $-, +$-,SparkDataFrame-method, +select, select, +select,SparkDataFrame,Column-method, +select,SparkDataFrame,character-method, +select,SparkDataFrame,list-method; +SparkDataFrame-class; [, +[,SparkDataFrame-method, [[, +[[,SparkDataFrame,numericOrcharacter-method, +subset, subset, +subset,SparkDataFrame-method; +agg, agg, agg, +agg,GroupedData-method, +agg,SparkDataFrame-method, +summarize, summarize, +summarize, +summarize,GroupedData-method, +summarize,SparkDataFrame-method; +arrange, arrange, +arrange, +arrange,SparkDataFrame,Column-method, +arrange,SparkDataFrame,character-method, +orderBy,SparkDataFrame,characterOrColumn-method; +as.data.frame, +as.data.frame,SparkDataFrame-method; +attach, +attach,SparkDataFrame-method; +cache, cache, +cache,SparkDataFrame-method; +collect, collect, +collect,SparkDataFrame-method; +colnames, colnames, +colnames,SparkDataFrame-method, +colnames-, colnames-, +colnames-,SparkDataFrame-method, +columns, columns, +columns,SparkDataFrame-method, +names, +names,SparkDataFrame-method, +names-, +names-,SparkDataFrame-method; +coltypes, coltypes, +coltypes,SparkDataFrame-method, +coltypes-, coltypes-, +coltypes-,SparkDataFrame,character-method; +createOrReplaceTempView, +createOrReplaceTempView, +createOrReplaceTempView,SparkDataFrame,character-method; +dapplyCollect, dapplyCollect, +dapplyCollect,SparkDataFrame,function-method; +dapply, dapply, +dapply,SparkDataFrame,function,structType-method; +describe, describe, +describe, +describe,SparkDataFrame,ANY-method, +describe,SparkDataFrame,character-method, +describe,SparkDataFrame-method, +summary, summary, +summary,SparkDataFrame-method; +dim, +dim,SparkDataFrame-method; +distinct, distinct, +distinct,SparkDataFrame-method, +unique, +unique,SparkDataFrame-method; +dropDuplicates, +dropDuplicates, +dropDuplicates,SparkDataFrame-method; +dropna, dropna, +dropna,SparkDataFrame-method, +fillna, fillna, +fillna,SparkDataFrame-method, +na.omit, na.omit, +na.omit,SparkDataFrame-method; +drop, drop, +drop, drop,ANY-method, +drop,SparkDataFrame-method; +dtypes, dtypes, +dtypes,SparkDataFrame-method; +except, except, +except,SparkDataFrame,SparkDataFrame-method; +explain, explain, +explain,SparkDataFrame-method; +filter, filter, +filter,SparkDataFrame,characterOrColumn-method, +where, where, +where,SparkDataFrame,characterOrColumn-method; +first, first, +first, +first,SparkDataFrame-method, +first,characterOrColumn-method; +gapplyCollect, gapplyCollect, 
+gapplyCollect, +gapplyCollect,GroupedData-method, +gapplyCollect,SparkDataFrame-method; +gapply, gapply, +gapply, +gapply,GroupedData-method, +gapply,SparkDataFrame-method; +groupBy, groupBy, +groupBy,SparkDataFrame-method, +group_by, group_by, +group_by,SparkDataFrame-method; +head, +head,SparkDataFrame-method; +histogram, +histogram,SparkDataFrame,characterOrColumn-method; +insertInto, insertInto, +insertInto,SparkDataFrame,character-method; +intersect, intersect, +intersect,SparkDataFrame,SparkDataFrame-method; +isLocal, isLocal, +isLocal,SparkDataFrame-method; +join, +join,SparkDataFrame,SparkDataFrame-method; +limit, limit, +limit,SparkDataFrame,numeric-method; +merge, merge, +merge,SparkDataFrame,SparkDataFrame-method; +mutate, mutate, +mutate,SparkDataFrame-method, +transform, transform, +transform,SparkDataFrame-method; +ncol, +ncol,SparkDataFrame-method; +persist, persist, +persist,SparkDataFrame,character-method; +printSchema, printSchema, +printSchema,SparkDataFrame-method; +randomSplit, randomSplit, +randomSplit,SparkDataFrame,numeric-method; +rbind, rbind, +rbind,SparkDataFrame-method; +registerTempTable, +registerTempTable, +registerTempTable,SparkDataFrame,character-method; +rename, rename, +rename,SparkDataFrame-method, +withColumnRenamed, +withColumnRenamed, +withColumnRenamed,SparkDataFrame,character,character-method;
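The nrow page quoted above documents count and nrow as equivalent row counts on a SparkDataFrame. A tiny sketch using the built-in `faithful` data set as stand-in input:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)   # built-in R data set as stand-in input

nrow(df)    # row count, mirroring base R
count(df)   # same number via the SQL-style count()
ncol(df)    # column count, for comparison
```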
[48/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/arrange.html -- diff --git a/site/docs/2.0.2/api/R/arrange.html b/site/docs/2.0.2/api/R/arrange.html new file mode 100644 index 000..e5ac48a --- /dev/null +++ b/site/docs/2.0.2/api/R/arrange.html @@ -0,0 +1,287 @@ + +R: Arrange Rows by Variables + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +arrange {SparkR}R Documentation + +Arrange Rows by Variables + +Description + +Sort a SparkDataFrame by the specified column(s). + + + +Usage + + +## S4 method for signature 'SparkDataFrame,Column' +arrange(x, col, ...) + +## S4 method for signature 'SparkDataFrame,character' +arrange(x, col, ..., decreasing = FALSE) + +## S4 method for signature 'SparkDataFrame,characterOrColumn' +orderBy(x, col, ...) + +arrange(x, col, ...) + + + +Arguments + + +x + +a SparkDataFrame to be sorted. + +col + +a character or Column object indicating the fields to sort on + +... + +additional sorting fields + +decreasing + +a logical argument indicating sorting order for columns when +a character vector is specified for col + + + + +Value + +A SparkDataFrame where all elements are sorted. + + + +Note + +arrange(SparkDataFrame, Column) since 1.4.0 + +arrange(SparkDataFrame, character) since 1.4.0 + +orderBy(SparkDataFrame, characterOrColumn) since 1.4.0 + + + +See Also + +Other SparkDataFrame functions: $, +$,SparkDataFrame-method, $-, +$-,SparkDataFrame-method, +select, select, +select,SparkDataFrame,Column-method, +select,SparkDataFrame,character-method, +select,SparkDataFrame,list-method; +SparkDataFrame-class; [, +[,SparkDataFrame-method, [[, +[[,SparkDataFrame,numericOrcharacter-method, +subset, subset, +subset,SparkDataFrame-method; +agg, agg, agg, +agg,GroupedData-method, +agg,SparkDataFrame-method, +summarize, summarize, +summarize, +summarize,GroupedData-method, +summarize,SparkDataFrame-method; +as.data.frame, +as.data.frame,SparkDataFrame-method; +attach, +attach,SparkDataFrame-method; +cache, cache, +cache,SparkDataFrame-method; +collect, collect, +collect,SparkDataFrame-method; +colnames, colnames, +colnames,SparkDataFrame-method, +colnames-, colnames-, +colnames-,SparkDataFrame-method, +columns, columns, +columns,SparkDataFrame-method, +names, +names,SparkDataFrame-method, +names-, +names-,SparkDataFrame-method; +coltypes, coltypes, +coltypes,SparkDataFrame-method, +coltypes-, coltypes-, +coltypes-,SparkDataFrame,character-method; +count,SparkDataFrame-method, +nrow, nrow, +nrow,SparkDataFrame-method; +createOrReplaceTempView, +createOrReplaceTempView, +createOrReplaceTempView,SparkDataFrame,character-method; +dapplyCollect, dapplyCollect, +dapplyCollect,SparkDataFrame,function-method; +dapply, dapply, +dapply,SparkDataFrame,function,structType-method; +describe, describe, +describe, +describe,SparkDataFrame,ANY-method, +describe,SparkDataFrame,character-method, +describe,SparkDataFrame-method, +summary, summary, +summary,SparkDataFrame-method; +dim, +dim,SparkDataFrame-method; +distinct, distinct, +distinct,SparkDataFrame-method, +unique, +unique,SparkDataFrame-method; +dropDuplicates, +dropDuplicates, +dropDuplicates,SparkDataFrame-method; +dropna, dropna, +dropna,SparkDataFrame-method, +fillna, fillna, +fillna,SparkDataFrame-method, +na.omit, na.omit, +na.omit,SparkDataFrame-method; +drop, 
drop, +drop, drop,ANY-method, +drop,SparkDataFrame-method; +dtypes, dtypes, +dtypes,SparkDataFrame-method; +except, except, +except,SparkDataFrame,SparkDataFrame-method; +explain, explain, +explain,SparkDataFrame-method; +filter, filter, +filter,SparkDataFrame,characterOrColumn-method, +where, where, +where,SparkDataFrame,characterOrColumn-method; +first, first, +first, +first,SparkDataFrame-method, +first,characterOrColumn-method; +gapplyCollect, gapplyCollect, +gapplyCollect, +gapplyCollect,GroupedData-method, +gapplyCollect,SparkDataFrame-method; +gapply, gapply, +gapply, +gapply,GroupedData-method, +gapply,SparkDataFrame-method; +groupBy, groupBy, +groupBy,SparkDataFrame-method, +group_by, group_by, +group_by,SparkDataFrame-method; +head, +head,SparkDataFrame-method; +histogram, +histogram,SparkDataFrame,characterOrColumn-method; +insertInto, insertInto, +insertInto,SparkDataFrame,character-method; +intersect, intersect, +intersect,SparkDataFrame,SparkDataFrame-method; +isLocal, isLocal, +isLocal,SparkDataFrame-method; +join, +join,SparkDataFrame,SparkDataFrame-method; +limit, limit, +limit,SparkDataFrame,numeric-method; +merge, merge, +merge,SparkDataFrame,SparkDataFrame-method; +mutate, mutate, +mutate,SparkDataFrame-method, +transform, transform, +transform,SparkDataFrame-method; +ncol, +ncol,SparkDataFrame-method; +persist,
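The arrange page quoted above accepts either Column objects or column names, with `decreasing` applying only to the character form. A short sketch of the documented variants; `faithful` is just convenient example data:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)   # example data

# Column form: mix ascending and descending sort keys
head(arrange(df, asc(df$waiting), desc(df$eruptions)))

# Character form: 'decreasing' only applies when columns are given as names
head(arrange(df, "waiting", decreasing = TRUE))

# orderBy is the documented alias taking a character or Column
head(orderBy(df, "eruptions"))
```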
[40/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/gapply.html -- diff --git a/site/docs/2.0.2/api/R/gapply.html b/site/docs/2.0.2/api/R/gapply.html new file mode 100644 index 000..03d3587 --- /dev/null +++ b/site/docs/2.0.2/api/R/gapply.html @@ -0,0 +1,348 @@ + +R: gapply + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +gapply {SparkR}R Documentation + +gapply + +Description + +Groups the SparkDataFrame using the specified columns and applies the R function to each +group. + +gapply + + + +Usage + + +## S4 method for signature 'SparkDataFrame' +gapply(x, cols, func, schema) + +gapply(x, ...) + +## S4 method for signature 'GroupedData' +gapply(x, func, schema) + + + +Arguments + + +x + +a SparkDataFrame or GroupedData. + +cols + +grouping columns. + +func + +a function to be applied to each group partition specified by grouping +column of the SparkDataFrame. The function func takes as argument +a key - grouping columns and a data frame - a local R data.frame. +The output of func is a local R data.frame. + +schema + +the schema of the resulting SparkDataFrame after the function is applied. +The schema must match to output of func. It has to be defined for each +output column with preferred output column name and corresponding data type. + +... + +additional argument(s) passed to the method. + + + + +Value + +A SparkDataFrame. + + + +Note + +gapply(SparkDataFrame) since 2.0.0 + +gapply(GroupedData) since 2.0.0 + + + +See Also + +gapplyCollect + +Other SparkDataFrame functions: $, +$,SparkDataFrame-method, $-, +$-,SparkDataFrame-method, +select, select, +select,SparkDataFrame,Column-method, +select,SparkDataFrame,character-method, +select,SparkDataFrame,list-method; +SparkDataFrame-class; [, +[,SparkDataFrame-method, [[, +[[,SparkDataFrame,numericOrcharacter-method, +subset, subset, +subset,SparkDataFrame-method; +agg, agg, agg, +agg,GroupedData-method, +agg,SparkDataFrame-method, +summarize, summarize, +summarize, +summarize,GroupedData-method, +summarize,SparkDataFrame-method; +arrange, arrange, +arrange, +arrange,SparkDataFrame,Column-method, +arrange,SparkDataFrame,character-method, +orderBy,SparkDataFrame,characterOrColumn-method; +as.data.frame, +as.data.frame,SparkDataFrame-method; +attach, +attach,SparkDataFrame-method; +cache, cache, +cache,SparkDataFrame-method; +collect, collect, +collect,SparkDataFrame-method; +colnames, colnames, +colnames,SparkDataFrame-method, +colnames-, colnames-, +colnames-,SparkDataFrame-method, +columns, columns, +columns,SparkDataFrame-method, +names, +names,SparkDataFrame-method, +names-, +names-,SparkDataFrame-method; +coltypes, coltypes, +coltypes,SparkDataFrame-method, +coltypes-, coltypes-, +coltypes-,SparkDataFrame,character-method; +count,SparkDataFrame-method, +nrow, nrow, +nrow,SparkDataFrame-method; +createOrReplaceTempView, +createOrReplaceTempView, +createOrReplaceTempView,SparkDataFrame,character-method; +dapplyCollect, dapplyCollect, +dapplyCollect,SparkDataFrame,function-method; +dapply, dapply, +dapply,SparkDataFrame,function,structType-method; +describe, describe, +describe, +describe,SparkDataFrame,ANY-method, +describe,SparkDataFrame,character-method, +describe,SparkDataFrame-method, +summary, summary, +summary,SparkDataFrame-method; +dim, +dim,SparkDataFrame-method; 
+distinct, distinct, +distinct,SparkDataFrame-method, +unique, +unique,SparkDataFrame-method; +dropDuplicates, +dropDuplicates, +dropDuplicates,SparkDataFrame-method; +dropna, dropna, +dropna,SparkDataFrame-method, +fillna, fillna, +fillna,SparkDataFrame-method, +na.omit, na.omit, +na.omit,SparkDataFrame-method; +drop, drop, +drop, drop,ANY-method, +drop,SparkDataFrame-method; +dtypes, dtypes, +dtypes,SparkDataFrame-method; +except, except, +except,SparkDataFrame,SparkDataFrame-method; +explain, explain, +explain,SparkDataFrame-method; +filter, filter, +filter,SparkDataFrame,characterOrColumn-method, +where, where, +where,SparkDataFrame,characterOrColumn-method; +first, first, +first, +first,SparkDataFrame-method, +first,characterOrColumn-method; +gapplyCollect, gapplyCollect, +gapplyCollect, +gapplyCollect,GroupedData-method, +gapplyCollect,SparkDataFrame-method; +groupBy, groupBy, +groupBy,SparkDataFrame-method, +group_by, group_by, +group_by,SparkDataFrame-method; +head, +head,SparkDataFrame-method; +histogram, +histogram,SparkDataFrame,characterOrColumn-method; +insertInto, insertInto, +insertInto,SparkDataFrame,character-method; +intersect, intersect, +intersect,SparkDataFrame,SparkDataFrame-method; +isLocal, isLocal, +isLocal,SparkDataFrame-method; +join, +join,SparkDataFrame,SparkDataFrame-method; +limit, limit,
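The gapply page quoted above stresses that the schema argument must describe exactly what the grouping function returns. A sketch of a per-group aggregation; the output column names in the schema are choices made for this example:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)   # columns: eruptions, waiting

# The schema must describe exactly what the function below returns
schema <- structType(structField("waiting", "double"),
                     structField("avg_eruptions", "double"))

# Average eruption time for each distinct waiting time
result <- gapply(df, "waiting",
                 function(key, x) {
                   data.frame(key, mean(x$eruptions), stringsAsFactors = FALSE)
                 },
                 schema)
head(arrange(result, "waiting"))
```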
[32/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/read.orc.html -- diff --git a/site/docs/2.0.2/api/R/read.orc.html b/site/docs/2.0.2/api/R/read.orc.html new file mode 100644 index 000..e67fa6a --- /dev/null +++ b/site/docs/2.0.2/api/R/read.orc.html @@ -0,0 +1,46 @@ + +R: Create a SparkDataFrame from an ORC file. + + + + +read.orc {SparkR}R Documentation + +Create a SparkDataFrame from an ORC file. + +Description + +Loads an ORC file, returning the result as a SparkDataFrame. + + + +Usage + + +read.orc(path) + + + +Arguments + + +path + +Path of file to read. + + + + +Value + +SparkDataFrame + + + +Note + +read.orc since 2.0.0 + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/read.parquet.html -- diff --git a/site/docs/2.0.2/api/R/read.parquet.html b/site/docs/2.0.2/api/R/read.parquet.html new file mode 100644 index 000..0d42bcd --- /dev/null +++ b/site/docs/2.0.2/api/R/read.parquet.html @@ -0,0 +1,56 @@ + +R: Create a SparkDataFrame from a Parquet file. + + + + +read.parquet {SparkR}R Documentation + +Create a SparkDataFrame from a Parquet file. + +Description + +Loads a Parquet file, returning the result as a SparkDataFrame. + + + +Usage + + +## Default S3 method: +read.parquet(path) + +## Default S3 method: +parquetFile(...) + + + +Arguments + + +path + +path of file to read. A vector of multiple paths is allowed. + +... + +argument(s) passed to the method. + + + + +Value + +SparkDataFrame + + + +Note + +read.parquet since 1.6.0 + +parquetFile since 1.4.0 + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/read.text.html -- diff --git a/site/docs/2.0.2/api/R/read.text.html b/site/docs/2.0.2/api/R/read.text.html new file mode 100644 index 000..2b6d8ca --- /dev/null +++ b/site/docs/2.0.2/api/R/read.text.html @@ -0,0 +1,71 @@ + +R: Create a SparkDataFrame from a text file. + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +read.text {SparkR}R Documentation + +Create a SparkDataFrame from a text file. + +Description + +Loads text files and returns a SparkDataFrame whose schema starts with +a string column named value, and followed by partitioned columns if +there are any. + + + +Usage + + +## Default S3 method: +read.text(path) + + + +Arguments + + +path + +Path of file to read. A vector of multiple paths is allowed. + + + + +Details + +Each line in the text file is a new row in the resulting SparkDataFrame. 
+ + + +Value + +SparkDataFrame + + + +Note + +read.text since 1.6.1 + + + +Examples + +## Not run: +##D sparkR.session() +##D path - path/to/file.txt +##D df - read.text(path) +## End(Not run) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/regexp_extract.html -- diff --git a/site/docs/2.0.2/api/R/regexp_extract.html b/site/docs/2.0.2/api/R/regexp_extract.html new file mode 100644 index 000..375ceb0 --- /dev/null +++ b/site/docs/2.0.2/api/R/regexp_extract.html @@ -0,0 +1,122 @@ + +R: regexp_extract + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +regexp_extract {SparkR}R Documentation + +regexp_extract + +Description + +Extract a specific idx group identified by a Java regex, from the specified string column. +If the regex did not match, or the specified group did not match, an empty string is returned. + + + +Usage + + +## S4 method for signature 'Column,character,numeric' +regexp_extract(x, pattern, idx) + +regexp_extract(x, pattern, idx) + + + +Arguments + + +x + +a string Column. + +pattern + +a regular expression. + +idx + +a group index. + + + + +Note + +regexp_extract since 1.5.0 + + + +See Also + +Other string_funcs: ascii, +ascii, ascii,Column-method; +base64, base64, +base64,Column-method; +concat_ws, concat_ws, +concat_ws,character,Column-method; +concat, concat, +concat,Column-method; decode, +decode, +decode,Column,character-method; +encode, encode, +encode,Column,character-method; +format_number, format_number, +format_number,Column,numeric-method; +format_string, format_string, +format_string,character,Column-method;
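The pages quoted above cover the ORC/Parquet/text readers and regexp_extract. A sketch combining read.text with regexp_extract to pull a field out of plain-text lines; the file path and the `id=` pattern are hypothetical:

```r
library(SparkR)
sparkR.session()

# Hypothetical input path; read.parquet()/read.orc() work the same way for their formats
logs <- read.text("/tmp/app.log")   # one row per line, in a column named 'value'

# Pull the first capture group of a regex out of each line
ids <- select(logs, alias(regexp_extract(logs$value, "id=([0-9]+)", 1), "id"))
head(ids)
```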
[17/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/index.html -- diff --git a/site/docs/2.0.2/api/java/index.html b/site/docs/2.0.2/api/java/index.html new file mode 100644 index 000..f0b9c05 --- /dev/null +++ b/site/docs/2.0.2/api/java/index.html @@ -0,0 +1,74 @@ +http://www.w3.org/TR/html4/frameset.dtd;> + + + + +Spark 2.0.2 JavaDoc + +targetPage = "" + window.location.search; +if (targetPage != "" && targetPage != "undefined") +targetPage = targetPage.substring(1); +if (targetPage.indexOf(":") != -1 || (targetPage != "" && !validURL(targetPage))) +targetPage = "undefined"; +function validURL(url) { +try { +url = decodeURIComponent(url); +} +catch (error) { +return false; +} +var pos = url.indexOf(".html"); +if (pos == -1 || pos != url.length - 5) +return false; +var allowNumber = false; +var allowSep = false; +var seenDot = false; +for (var i = 0; i < url.length - 5; i++) { +var ch = url.charAt(i); +if ('a' <= ch && ch <= 'z' || +'A' <= ch && ch <= 'Z' || +ch == '$' || +ch == '_' || +ch.charCodeAt(0) > 127) { +allowNumber = true; +allowSep = true; +} else if ('0' <= ch && ch <= '9' +|| ch == '-') { +if (!allowNumber) + return false; +} else if (ch == '/' || ch == '.') { +if (!allowSep) +return false; +allowNumber = false; +allowSep = false; +if (ch == '.') + seenDot = true; +if (ch == '/' && seenDot) + return false; +} else { +return false; +} +} +return true; +} +function loadFrames() { +if (targetPage != "" && targetPage != "undefined") + top.classFrame.location = top.targetPage; +} + + + + + + + + + + +JavaScript is disabled on your browser. + +Frame Alert +This document is designed to be viewed using the frames feature. If you see this message, you are using a non-frame-capable web client. Link to Non-frame version. + + + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/lib/api-javadocs.js -- diff --git a/site/docs/2.0.2/api/java/lib/api-javadocs.js b/site/docs/2.0.2/api/java/lib/api-javadocs.js new file mode 100644 index 000..ead13d6 --- /dev/null +++ b/site/docs/2.0.2/api/java/lib/api-javadocs.js @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +/* Dynamically injected post-processing code for the API docs */ + +$(document).ready(function() { + addBadges(":: AlphaComponent ::", 'Alpha Component'); + addBadges(":: DeveloperApi ::", 'Developer API'); + addBadges(":: Experimental ::", 'Experimental'); +}); + +function addBadges(tag, html) { + var tags = $(".block:contains(" + tag + ")") + + // Remove identifier tags + tags.each(function(index) { +var oldHTML = $(this).html(); +var newHTML = oldHTML.replace(tag, ""); +$(this).html(newHTML); + }); + + // Add html badge tags + tags.each(function(index) { +if ($(this).parent().is('td.colLast')) { + $(this).parent().prepend(html); +} else if ($(this).parent('li.blockList') + .parent('ul.blockList') + .parent('div.description') + .parent().is('div.contentContainer')) { + var contentContainer = $(this).parent('li.blockList') +.parent('ul.blockList') +.parent('div.description') +.parent('div.contentContainer') + var header = contentContainer.prev('div.header'); + if (header.length > 0) { +header.prepend(html); + } else { +
[23/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/write.jdbc.html -- diff --git a/site/docs/2.0.2/api/R/write.jdbc.html b/site/docs/2.0.2/api/R/write.jdbc.html new file mode 100644 index 000..d357087 --- /dev/null +++ b/site/docs/2.0.2/api/R/write.jdbc.html @@ -0,0 +1,299 @@ + +R: Save the content of SparkDataFrame to an external database... + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +write.jdbc {SparkR}R Documentation + +Save the content of SparkDataFrame to an external database table via JDBC. + +Description + +Save the content of the SparkDataFrame to an external database table via JDBC. Additional JDBC +database connection properties can be set (...) + + + +Usage + + +## S4 method for signature 'SparkDataFrame,character,character' +write.jdbc(x, url, tableName, + mode = "error", ...) + +write.jdbc(x, url, tableName, mode = "error", ...) + + + +Arguments + + +x + +a SparkDataFrame. + +url + +JDBC database url of the form jdbc:subprotocol:subname. + +tableName + +the name of the table in the external database. + +mode + +one of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default). + +... + +additional JDBC database connection properties. + + + + +Details + +Also, mode is used to specify the behavior of the save operation when +data already exists in the data source. There are four modes: + + + + append: Contents of this SparkDataFrame are expected to be appended to existing data. + + + overwrite: Existing data is expected to be overwritten by the contents of this +SparkDataFrame. + + + error: An exception is expected to be thrown. + + + ignore: The save operation is expected to not save the contents of the SparkDataFrame +and to not change the existing data.
+ + + + + +Note + +write.jdbc since 2.0.0 + + + +See Also + +Other SparkDataFrame functions: $, +$,SparkDataFrame-method, $-, +$-,SparkDataFrame-method, +select, select, +select,SparkDataFrame,Column-method, +select,SparkDataFrame,character-method, +select,SparkDataFrame,list-method; +SparkDataFrame-class; [, +[,SparkDataFrame-method, [[, +[[,SparkDataFrame,numericOrcharacter-method, +subset, subset, +subset,SparkDataFrame-method; +agg, agg, agg, +agg,GroupedData-method, +agg,SparkDataFrame-method, +summarize, summarize, +summarize, +summarize,GroupedData-method, +summarize,SparkDataFrame-method; +arrange, arrange, +arrange, +arrange,SparkDataFrame,Column-method, +arrange,SparkDataFrame,character-method, +orderBy,SparkDataFrame,characterOrColumn-method; +as.data.frame, +as.data.frame,SparkDataFrame-method; +attach, +attach,SparkDataFrame-method; +cache, cache, +cache,SparkDataFrame-method; +collect, collect, +collect,SparkDataFrame-method; +colnames, colnames, +colnames,SparkDataFrame-method, +colnames-, colnames-, +colnames-,SparkDataFrame-method, +columns, columns, +columns,SparkDataFrame-method, +names, +names,SparkDataFrame-method, +names-, +names-,SparkDataFrame-method; +coltypes, coltypes, +coltypes,SparkDataFrame-method, +coltypes-, coltypes-, +coltypes-,SparkDataFrame,character-method; +count,SparkDataFrame-method, +nrow, nrow, +nrow,SparkDataFrame-method; +createOrReplaceTempView, +createOrReplaceTempView, +createOrReplaceTempView,SparkDataFrame,character-method; +dapplyCollect, dapplyCollect, +dapplyCollect,SparkDataFrame,function-method; +dapply, dapply, +dapply,SparkDataFrame,function,structType-method; +describe, describe, +describe, +describe,SparkDataFrame,ANY-method, +describe,SparkDataFrame,character-method, +describe,SparkDataFrame-method, +summary, summary, +summary,SparkDataFrame-method; +dim, +dim,SparkDataFrame-method; +distinct, distinct, +distinct,SparkDataFrame-method, +unique, +unique,SparkDataFrame-method; +dropDuplicates, +dropDuplicates, +dropDuplicates,SparkDataFrame-method; +dropna, dropna, +dropna,SparkDataFrame-method, +fillna, fillna, +fillna,SparkDataFrame-method, +na.omit, na.omit, +na.omit,SparkDataFrame-method; +drop, drop, +drop, drop,ANY-method, +drop,SparkDataFrame-method; +dtypes, dtypes, +dtypes,SparkDataFrame-method; +except, except, +except,SparkDataFrame,SparkDataFrame-method; +explain, explain, +explain,SparkDataFrame-method; +filter, filter, +filter,SparkDataFrame,characterOrColumn-method, +where, where, +where,SparkDataFrame,characterOrColumn-method; +first, first, +first, +first,SparkDataFrame-method, +first,characterOrColumn-method; +gapplyCollect, gapplyCollect, +gapplyCollect, +gapplyCollect,GroupedData-method, +gapplyCollect,SparkDataFrame-method; +gapply, gapply, +gapply, +gapply,GroupedData-method, +gapply,SparkDataFrame-method; +groupBy, groupBy, +groupBy,SparkDataFrame-method, +group_by, group_by,
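The write.jdbc page quoted above takes a JDBC URL, a table name, a save mode and extra connection properties. A sketch with placeholder connection details; it also assumes the matching JDBC driver jar is already on Spark's classpath:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)   # example data to persist

# Placeholder connection details, not a real endpoint
jdbc_url <- "jdbc:postgresql://dbhost:5432/analytics"

# 'append' adds rows to an existing table; the default 'error' mode fails if it exists
write.jdbc(df, jdbc_url, "faithful_copy", mode = "append",
           user = "spark_user", password = "secret")
```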
[35/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/mutate.html
--
diff --git a/site/docs/2.0.2/api/R/mutate.html b/site/docs/2.0.2/api/R/mutate.html
new file mode 100644
index 000..76e5ba6
--- /dev/null
+++ b/site/docs/2.0.2/api/R/mutate.html
@@ -0,0 +1,285 @@

mutate {SparkR}    R Documentation

Mutate

Description

Return a new SparkDataFrame with the specified columns added or replaced.

Usage

## S4 method for signature 'SparkDataFrame'
mutate(.data, ...)

## S4 method for signature 'SparkDataFrame'
transform(`_data`, ...)

mutate(.data, ...)

transform(`_data`, ...)

Arguments

.data    a SparkDataFrame.
...      additional column argument(s), each in the form name = col.
_data    a SparkDataFrame.

Value

A new SparkDataFrame with the new columns added or replaced.

Note

mutate since 1.4.0

transform since 1.5.0

See Also

rename, withColumn, and the other SparkDataFrame functions (select, subset, agg, arrange, collect, filter, groupBy, head, join, merge, withColumnRenamed, registerTempTable, and so on).
[08/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/RangePartitioner.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/RangePartitioner.html b/site/docs/2.0.2/api/java/org/apache/spark/RangePartitioner.html
new file mode 100644
index 000..21e8fd1
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/RangePartitioner.html
@@ -0,0 +1,390 @@

RangePartitioner (Spark 2.0.2 JavaDoc)

org.apache.spark
Class RangePartitioner<K,V>

Object
  org.apache.spark.Partitioner
    org.apache.spark.RangePartitioner<K,V>

All Implemented Interfaces: java.io.Serializable

public class RangePartitioner<K,V> extends Partitioner

A Partitioner that partitions sortable records by range into roughly equal ranges. The ranges
are determined by sampling the content of the RDD passed in.

Note that the actual number of partitions created by the RangePartitioner might not be the same
as the partitions parameter, in the case where the number of sampled records is less than
the value of partitions.

See Also: Serialized Form

Constructor Summary

RangePartitioner(int partitions,
                 RDD<? extends scala.Product2<K,V>> rdd,
                 boolean ascending,
                 scala.math.Ordering<K> evidence$1,
                 scala.reflect.ClassTag<K> evidence$2)

Method Summary

static <K> Object  determineBounds(scala.collection.mutable.ArrayBuffer<scala.Tuple2<K,Object>> candidates,
                                   int partitions,
                                   scala.math.Ordering<K> evidence$4,
                                   scala.reflect.ClassTag<K> evidence$5)
                   Determines the bounds for range partitioning from candidates with weights
                   indicating how many items each represents.

boolean            equals(Object other)

int                getPartition(Object key)

int                hashCode()

int                numPartitions()

static <K> scala.Tuple2<Object,scala.Tuple3<Object,Object,Object>[]>
                   sketch(RDD<K> rdd,
                          int sampleSizePerPartition,
                          scala.reflect.ClassTag<K> evidence$3)
                   Sketches the input RDD via reservoir sampling on each partition.

Methods inherited from class org.apache.spark.Partitioner: defaultPartitioner
Methods inherited from class Object: getClass, notify, notifyAll, toString, wait, wait, wait

Method Detail

sketch
  Sketches the input RDD via reservoir sampling on each partition.
  Parameters: rdd - the input RDD to sketch; sampleSizePerPartition - max sample size per
  partition; evidence$3 - (undocumented)
  Returns: (total number of items, an array of (partitionId, number of items, sample))

determineBounds
  Determines the bounds for range partitioning from candidates with weights indicating how many
  items each represents. Usually this is 1 over the probability used to sample this candidate.
  Parameters: candidates - unordered candidates with weights; partitions - number of partitions;
  evidence$4 - (undocumented); evidence$5 - (undocumented)
  Returns: selected bounds

numPartitions
  public int numPartitions()
  Specified by: numPartitions in class Partitioner

getPartition
  public int getPartition(Object key)
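To make the constructor and the getPartition/numPartitions entries above concrete, here is a
minimal Scala sketch; the object name, local master URL, and sample data are illustrative
assumptions, not part of the generated page:

import org.apache.spark.{RangePartitioner, SparkConf, SparkContext}

object RangePartitionerExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("range-partitioner-demo"))

    // A pair RDD keyed by an orderable type (Int); RangePartitioner samples it
    // to pick boundaries that give roughly equal-sized key ranges.
    val pairs = sc.parallelize((1 to 1000).map(i => (i, s"value-$i")), numSlices = 8)

    val partitioner = new RangePartitioner(4, pairs)
    val ranged = pairs.partitionBy(partitioner)

    // As documented, the actual number of partitions can be smaller than requested
    // when fewer records are sampled than the requested partition count.
    println(s"numPartitions = ${partitioner.numPartitions}")
    println(s"key 42 maps to partition ${partitioner.getPartition(42)}")
    println(s"ranged RDD has ${ranged.getNumPartitions} partitions")

    sc.stop()
  }
}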
[05/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/SparkEnv.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/SparkEnv.html b/site/docs/2.0.2/api/java/org/apache/spark/SparkEnv.html
new file mode 100644
index 000..9bcf96c
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/SparkEnv.html
@@ -0,0 +1,474 @@

SparkEnv (Spark 2.0.2 JavaDoc)

org.apache.spark
Class SparkEnv

Object
  org.apache.spark.SparkEnv

public class SparkEnv extends Object

:: DeveloperApi ::
Holds all the runtime environment objects for a running Spark instance (either master or
worker), including the serializer, RpcEnv, block manager, map output tracker, etc. Currently
Spark code finds the SparkEnv through a global variable, so all the threads can access the same
SparkEnv. It can be accessed by SparkEnv.get (e.g. after creating a SparkContext).

NOTE: This is not intended for external use. This is exposed for Shark and may be made private
in a future release.

Constructor Summary

SparkEnv(String executorId,
         org.apache.spark.rpc.RpcEnv rpcEnv,
         Serializer serializer,
         Serializer closureSerializer,
         org.apache.spark.serializer.SerializerManager serializerManager,
         org.apache.spark.MapOutputTracker mapOutputTracker,
         org.apache.spark.shuffle.ShuffleManager shuffleManager,
         org.apache.spark.broadcast.BroadcastManager broadcastManager,
         org.apache.spark.storage.BlockManager blockManager,
         org.apache.spark.SecurityManager securityManager,
         org.apache.spark.metrics.MetricsSystem metricsSystem,
         org.apache.spark.memory.MemoryManager memoryManager,
         org.apache.spark.scheduler.OutputCommitCoordinator outputCommitCoordinator,
         SparkConf conf)

Method Summary

org.apache.spark.storage.BlockManager               blockManager()
org.apache.spark.broadcast.BroadcastManager         broadcastManager()
Serializer                                          closureSerializer()
SparkConf                                           conf()
String                                              executorId()
static SparkEnv                                     get()            Returns the SparkEnv.
org.apache.spark.MapOutputTracker                   mapOutputTracker()
org.apache.spark.memory.MemoryManager               memoryManager()
org.apache.spark.metrics.MetricsSystem              metricsSystem()
org.apache.spark.scheduler.OutputCommitCoordinator  outputCommitCoordinator()
org.apache.spark.SecurityManager                    securityManager()
Serializer                                          serializer()
org.apache.spark.serializer.SerializerManager       serializerManager()
static void                                         set(SparkEnv e)
org.apache.spark.shuffle.ShuffleManager             shuffleManager()

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
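Although the page stresses that SparkEnv is not intended for external use, a small hedged Scala
sketch makes the accessor pattern above concrete (the object name and local master URL are
illustrative assumptions only):

import org.apache.spark.{SparkConf, SparkContext, SparkEnv}

object SparkEnvExample {
  def main(args: Array[String]): Unit = {
    // Creating a SparkContext initializes the driver-side SparkEnv.
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("spark-env-demo"))

    // SparkEnv.get returns the environment for the current JVM
    // (the driver here; an executor when called from task code).
    val env = SparkEnv.get
    println(s"executorId         = ${env.executorId}")
    println(s"serializer         = ${env.serializer.getClass.getName}")
    println(s"closure serializer = ${env.closureSerializer.getClass.getName}")
    println(s"spark.app.name     = ${env.conf.get("spark.app.name")}")

    sc.stop()
  }
}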
[43/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/dim.html
--
diff --git a/site/docs/2.0.2/api/R/dim.html b/site/docs/2.0.2/api/R/dim.html
new file mode 100644
index 000..1227bc6
--- /dev/null
+++ b/site/docs/2.0.2/api/R/dim.html
@@ -0,0 +1,256 @@

dim {SparkR}    R Documentation

Returns the dimensions of SparkDataFrame

Description

Returns the dimensions (number of rows and columns) of a SparkDataFrame.

Usage

## S4 method for signature 'SparkDataFrame'
dim(x)

Arguments

x    a SparkDataFrame

Note

dim since 1.5.0

See Also

Other SparkDataFrame functions: select, subset, agg, arrange, collect, filter, groupBy, head, join, merge, mutate, transform, ncol, nrow, persist, printSchema, randomSplit, rbind, registerTempTable, rename, repartition, and so on.
[31/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/round.html
--
diff --git a/site/docs/2.0.2/api/R/round.html b/site/docs/2.0.2/api/R/round.html
new file mode 100644
index 000..fd2fb17
--- /dev/null
+++ b/site/docs/2.0.2/api/R/round.html
@@ -0,0 +1,120 @@

round {SparkR}    R Documentation

round

Description

Returns the value of the column e rounded to 0 decimal places using HALF_UP rounding mode.

Usage

## S4 method for signature 'Column'
round(x)

Arguments

x    Column to compute on.

Note

round since 1.5.0

See Also

Other math_funcs: acos, asin, atan, atan2, bin, bround, cbrt, ceil, ceiling, conv, corr, cos, cosh, cov, covar_pop, covar_samp, exp, expm1, factorial, floor, hex, hypot, log, log10, log1p, log2, pmod, rint, shiftLeft, shiftRight, shiftRightUnsigned, sign, signum, sin, sinh, sqrt, tan, tanh, toDegrees, toRadians, unhex

Examples

## Not run: round(df$c)

[Package SparkR version 2.0.2 Index]

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/row_number.html
--
diff --git a/site/docs/2.0.2/api/R/row_number.html b/site/docs/2.0.2/api/R/row_number.html
new file mode 100644
index 000..d3fd4fb
--- /dev/null
+++ b/site/docs/2.0.2/api/R/row_number.html
@@ -0,0 +1,86 @@

row_number {SparkR}    R Documentation

row_number

Description

Window function: returns a sequential number starting at 1 within a window partition.

Usage

## S4 method for signature 'missing'
row_number()

row_number(x = "missing")

Arguments

x    empty. Should be used with no argument.

Details

This is equivalent to the ROW_NUMBER function in SQL.

Note

row_number since 1.6.0

See Also

Other window_funcs: cume_dist, dense_rank, lag, lead, ntile, percent_rank, rank

Examples

## Not run:
##D df <- createDataFrame(mtcars)
##D ws <- orderBy(windowPartitionBy("am"), "hp")
##D out <- select(df, over(row_number(), ws), df$hp, df$am)
## End(Not run)

[Package SparkR version 2.0.2 Index]

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/rowsBetween.html
--
diff --git a/site/docs/2.0.2/api/R/rowsBetween.html b/site/docs/2.0.2/api/R/rowsBetween.html
new file mode 100644
index 000..571df1a
--- /dev/null
+++ b/site/docs/2.0.2/api/R/rowsBetween.html
@@ -0,0 +1,94 @@
[25/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/unhex.html
--
diff --git a/site/docs/2.0.2/api/R/unhex.html b/site/docs/2.0.2/api/R/unhex.html
new file mode 100644
index 000..208ff4c
--- /dev/null
+++ b/site/docs/2.0.2/api/R/unhex.html
@@ -0,0 +1,122 @@

unhex {SparkR}    R Documentation

unhex

Description

Inverse of hex. Interprets each pair of characters as a hexadecimal number
and converts to the byte representation of number.

Usage

## S4 method for signature 'Column'
unhex(x)

unhex(x)

Arguments

x    Column to compute on.

Note

unhex since 1.5.0

See Also

Other math_funcs: acos, asin, atan, atan2, bin, bround, cbrt, ceil, ceiling, conv, corr, cos, cosh, cov, covar_pop, covar_samp, exp, expm1, factorial, floor, hex, hypot, log, log10, log1p, log2, pmod, rint, round, shiftLeft, shiftRight, shiftRightUnsigned, sign, signum, sin, sinh, sqrt, tan, tanh, toDegrees, toRadians

Examples

## Not run: unhex(df$c)

[Package SparkR version 2.0.2 Index]

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/union.html
--
diff --git a/site/docs/2.0.2/api/R/union.html b/site/docs/2.0.2/api/R/union.html
new file mode 100644
index 000..8e92124
--- /dev/null
+++ b/site/docs/2.0.2/api/R/union.html
@@ -0,0 +1,280 @@

union {SparkR}    R Documentation

Return a new SparkDataFrame containing the union of rows

Description

Return a new SparkDataFrame containing the union of rows in this SparkDataFrame
and another SparkDataFrame. This is equivalent to UNION ALL in SQL.
Note that this does not remove duplicate rows across the two SparkDataFrames.

unionAll is deprecated - use union instead.

Usage

## S4 method for signature 'SparkDataFrame,SparkDataFrame'
union(x, y)

## S4 method for signature 'SparkDataFrame,SparkDataFrame'
unionAll(x, y)

union(x, y)

unionAll(x, y)

Arguments

x    A SparkDataFrame
y    A SparkDataFrame

Value

A SparkDataFrame containing the result of the union.

Note

union since 2.0.0

unionAll since 1.4.0

See Also

rbind, and the other SparkDataFrame functions (select, subset, agg, arrange, collect, filter, groupBy, join, merge, and so on).
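The same union-all semantics described above also apply in the Scala Dataset API; since this
digest mixes SparkR and JavaDoc pages, the following is a hedged Scala sketch (column name,
object name, and local master URL are illustrative assumptions) showing that union keeps
duplicates and that de-duplication must be requested explicitly:

import org.apache.spark.sql.SparkSession

object UnionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("union-demo").getOrCreate()
    import spark.implicits._

    val left  = Seq(1, 2, 3).toDF("id")
    val right = Seq(3, 4, 5).toDF("id")

    // union keeps duplicates (UNION ALL semantics), as the R doc notes.
    val all = left.union(right)
    println(all.count())            // 6 rows, including the duplicate id = 3

    // Apply distinct explicitly when UNION (de-duplicated) semantics are wanted.
    println(all.distinct().count()) // 5 rows

    spark.stop()
  }
}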
[06/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/SparkContext.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/SparkContext.html b/site/docs/2.0.2/api/java/org/apache/spark/SparkContext.html
new file mode 100644
index 000..09f31f9
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/SparkContext.html
@@ -0,0 +1,2467 @@

SparkContext (Spark 2.0.2 JavaDoc)

org.apache.spark
Class SparkContext

Object
  org.apache.spark.SparkContext

public class SparkContext extends Object

Main entry point for Spark functionality. A SparkContext represents the connection to a Spark
cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.

Only one SparkContext may be active per JVM. You must stop() the active SparkContext before
creating a new one. This limitation may eventually be removed; see SPARK-2243 for more details.

param: config a Spark Config object describing the application configuration. Any settings in
this config overrides the default configs as well as system properties.

Constructor Summary

SparkContext()
    Create a SparkContext that loads settings from system properties (for instance, when
    launching with ./bin/spark-submit).

SparkContext(SparkConf config)

SparkContext(String master, String appName, SparkConf conf)
    Alternative constructor that allows setting common Spark properties directly.

SparkContext(String master, String appName, String sparkHome,
             scala.collection.Seq<String> jars,
             scala.collection.Map<String,String> environment)
    Alternative constructor that allows setting common Spark properties directly.

Method Summary

<R,T> Accumulable<R,T>  accumulable(R initialValue, AccumulableParam<R,T> param)
                        Deprecated. Use AccumulatorV2. Since 2.0.0.

<R,T> Accumulable<R,T>  accumulable(R initialValue, String name, AccumulableParam<R,T> param)
                        Deprecated. Use AccumulatorV2. Since 2.0.0.

<R,T> Accumulable<R,T>  accumulableCollection(R initialValue,
                            scala.Function1<R,scala.collection.generic.Growable<T>> evidence$9,
                            scala.reflect.ClassTag<R> evidence$10)
                        Deprecated. Use AccumulatorV2. Since 2.0.0.

<T> Accumulator<T>      accumulator(T initialValue, AccumulatorParam<T> param)
                        Deprecated. Use AccumulatorV2. Since 2.0.0.

<T> Accumulator<T>      accumulator(T initialValue, String name, AccumulatorParam<T> param)
                        Deprecated. Use AccumulatorV2. Since 2.0.0.

void                    addFile(String path)
                        Add a file to be downloaded with this Spark job on every node.

void                    addFile(String path, boolean recursive)
                        Add a file to be downloaded with this Spark job on every node.

void                    addJar(String path)
                        Adds a JAR dependency for all tasks to be executed on this SparkContext
                        in the future.

void                    addSparkListener(org.apache.spark.scheduler.SparkListenerInterface listener)
                        :: DeveloperApi :: Register a listener to receive up-calls from events
                        that happen during execution.

scala.Option<String>    applicationAttemptId()

String                  applicationId()
                        A unique identifier for the Spark application.

String                  appName()

RDD<scala.Tuple2<String,PortableDataStream>>
                        binaryFiles(String path, int minPartitions)
                        Get an RDD for a Hadoop-readable dataset as PortableDataStream for each
                        file (useful for binary data).

RDD<byte[]>             binaryRecords(String path, int recordLength,
                                      org.apache.hadoop.conf.Configuration conf)
                        Load data from a flat binary file, assuming the length of each record
                        is constant.

<T> Broadcast<T>        broadcast(T value, scala.reflect.ClassTag<T> evidence$11)
                        Broadcast a read-only variable to the cluster, returning a Broadcast
                        object for reading it in distributed functions.

void                    cancelAllJobs()
                        Cancel all jobs that have been scheduled or are running.
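A minimal Scala sketch tying the entries above together (constructing a context, broadcasting a
read-only value, and stopping before another context could be created); the object name, local
master URL, and lookup data are illustrative assumptions, not part of the documented API:

import org.apache.spark.{SparkConf, SparkContext}

object SparkContextExample {
  def main(args: Array[String]): Unit = {
    // Only one SparkContext may be active per JVM, as the class comment says.
    val conf = new SparkConf().setMaster("local[2]").setAppName("spark-context-demo")
    val sc   = new SparkContext(conf)

    // Broadcast a small read-only lookup table to every executor.
    val lookup = sc.broadcast(Map(1 -> "one", 2 -> "two", 3 -> "three"))

    val labelled = sc.parallelize(Seq(1, 2, 3, 4))
      .map(i => lookup.value.getOrElse(i, "unknown"))
      .collect()

    println(labelled.mkString(", "))
    println(s"applicationId = ${sc.applicationId}")

    sc.stop()   // stop() before creating any new SparkContext in this JVM
  }
}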
[45/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/corr.html
--
diff --git a/site/docs/2.0.2/api/R/corr.html b/site/docs/2.0.2/api/R/corr.html
new file mode 100644
index 000..9f058a4
--- /dev/null
+++ b/site/docs/2.0.2/api/R/corr.html
@@ -0,0 +1,177 @@

corr {SparkR}    R Documentation

corr

Description

Computes the Pearson Correlation Coefficient for two Columns.

Calculates the correlation of two columns of a SparkDataFrame.
Currently only supports the Pearson Correlation Coefficient.
For Spearman Correlation, consider using RDD methods found in MLlib's Statistics.

Usage

## S4 method for signature 'Column'
corr(x, col2)

corr(x, ...)

## S4 method for signature 'SparkDataFrame'
corr(x, colName1, colName2, method = "pearson")

Arguments

x          a Column or a SparkDataFrame.
col2       a (second) Column.
...        additional argument(s). If x is a Column, a Column should be provided.
           If x is a SparkDataFrame, two column names should be provided.
colName1   the name of the first column.
colName2   the name of the second column.
method     Optional. A character specifying the method for calculating the correlation.
           Only "pearson" is allowed now.

Value

The Pearson Correlation Coefficient as a Double.

Note

corr since 1.6.0

See Also

Other math_funcs: acos, asin, atan, atan2, bin, bround, cbrt, ceil, ceiling, conv, cos, cosh, cov, covar_pop, covar_samp, exp, expm1, factorial, floor, hex, hypot, log, log10, log1p, log2, pmod, rint, round, shiftLeft, shiftRight, shiftRightUnsigned, sign, signum, sin, sinh, sqrt, tan, tanh, toDegrees, toRadians, unhex

Other stat functions: approxQuantile, cov, covar_samp, crosstab, freqItems, sampleBy

Examples

## Not run: corr(df$c, df$d)
## Not run:
##D df <- read.json("/path/to/file.json")
##D corr <- corr(df, "title", "gender")
##D corr <- corr(df, "title", "gender", method = "pearson")
## End(Not run)

[Package SparkR version 2.0.2 Index]

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/cos.html
--
diff --git a/site/docs/2.0.2/api/R/cos.html b/site/docs/2.0.2/api/R/cos.html
new file mode 100644
index 000..64090a4
--- /dev/null
+++ b/site/docs/2.0.2/api/R/cos.html
@@ -0,0 +1,120 @@

cos {SparkR}    R Documentation

cos

Description

Computes the cosine of the given value.

Usage

## S4 method for signature 'Column'
cos(x)

Arguments

x    Column to compute on.

Note

cos since 1.5.0

See Also

Other math_funcs: acos,
[07/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/SparkConf.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/SparkConf.html b/site/docs/2.0.2/api/java/org/apache/spark/SparkConf.html
new file mode 100644
index 000..8fb676c
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/SparkConf.html
@@ -0,0 +1,1124 @@

SparkConf (Spark 2.0.2 JavaDoc)

org.apache.spark
Class SparkConf

Object
  org.apache.spark.SparkConf

All Implemented Interfaces: Cloneable

public class SparkConf extends Object implements scala.Cloneable

Configuration for a Spark application. Used to set various Spark parameters as key-value pairs.

Most of the time, you would create a SparkConf object with new SparkConf(), which will load
values from any spark.* Java system properties set in your application as well. In this case,
parameters you set directly on the SparkConf object take priority over system properties.

For unit tests, you can also call new SparkConf(false) to skip loading external settings and
get the same configuration no matter what the system properties are.

All setter methods in this class support chaining. For example, you can write
new SparkConf().setMaster("local").setAppName("My app").

Note that once a SparkConf object is passed to Spark, it is cloned and can no longer be
modified by the user. Spark does not support modifying the configuration at runtime.

param: loadDefaults whether to also load values from Java system properties

Constructor Summary

SparkConf()
    Create a SparkConf that loads defaults from system properties and the classpath.

SparkConf(boolean loadDefaults)

Method Summary

SparkConf                        clone()
                                 Copy this object.

boolean                          contains(String key)
                                 Does the configuration contain a given parameter?

String                           get(String key)
                                 Get a parameter; throws a NoSuchElementException if it's not set.

String                           get(String key, String defaultValue)
                                 Get a parameter, falling back to a default if not set.

scala.Tuple2<String,String>[]    getAll()
                                 Get all parameters as a list of pairs.

String                           getAppId()
                                 Returns the Spark application id, valid in the Driver after
                                 TaskScheduler registration and from the start in the Executor.

scala.collection.immutable.Map<Object,String>
                                 getAvroSchema()
                                 Gets all the avro schemas in the configuration used in the
                                 generic Avro record serializer.

boolean                          getBoolean(String key, boolean defaultValue)
                                 Get a parameter as a boolean, falling back to a default if not set.

static scala.Option<String>      getDeprecatedConfig(String key, SparkConf conf)
                                 Looks for available deprecated keys for the given config option,
                                 and return the first value available.

double                           getDouble(String key, double defaultValue)
                                 Get a parameter as a double, falling back to a default if not set.

scala.collection.Seq<scala.Tuple2<String,String>>
                                 getExecutorEnv()
                                 Get all executor environment variables set on this SparkConf.

int                              getInt(String key, int defaultValue)
                                 Get a parameter as an integer, falling back to a default if not set.

long                             getLong(String key, long defaultValue)
                                 Get a parameter as a long, falling back to a default if not set.

scala.Option<String>             getOption(String key)
                                 Get a parameter as an Option.

long                             getSizeAsBytes(String key)
                                 Get a size parameter as bytes; throws a NoSuchElementException
                                 if it's not set.

long                             getSizeAsBytes(String key, long defaultValue)
                                 Get a size parameter as bytes, falling back to a default if not set.

long                             getSizeAsBytes(String key, String defaultValue)
                                 Get a size parameter as bytes, falling back to a default if not set.

long                             getSizeAsGb(String key)
                                 Get a size parameter as Gibibytes; throws a NoSuchElementException
                                 if it's not set.

long                             getSizeAsGb(String key, String defaultValue)
                                 Get a size parameter as Gibibytes, falling back to a default if not set.
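The chaining and size-parsing behaviour described above can be exercised without a cluster;
here is a small Scala sketch (object name and the chosen keys other than standard spark.* keys
are illustrative assumptions):

import org.apache.spark.SparkConf

object SparkConfExample {
  def main(args: Array[String]): Unit = {
    // Setters chain, exactly as the class description states.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("spark-conf-demo")
      .set("spark.executor.memory", "1g")

    println(conf.get("spark.app.name"))                   // "spark-conf-demo"
    println(conf.get("spark.missing.key", "fallback"))    // default used when unset
    println(conf.contains("spark.executor.memory"))       // true
    println(conf.getSizeAsBytes("spark.executor.memory")) // 1073741824

    // new SparkConf(false) skips loading spark.* system properties (useful in tests).
    val isolated = new SparkConf(false)
    println(isolated.getAll.length)
  }
}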
[14/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.IntAccumulatorParam$.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.IntAccumulatorParam$.html b/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.IntAccumulatorParam$.html
new file mode 100644
index 000..c77f232
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.IntAccumulatorParam$.html
@@ -0,0 +1,361 @@

AccumulatorParam.IntAccumulatorParam$ (Spark 2.0.2 JavaDoc)

org.apache.spark
Class AccumulatorParam.IntAccumulatorParam$

Object
  org.apache.spark.AccumulatorParam.IntAccumulatorParam$

All Implemented Interfaces: java.io.Serializable, AccumulableParam<Object,Object>, AccumulatorParam<Object>
Enclosing interface: AccumulatorParam<T>

Deprecated. Use AccumulatorV2. Since 2.0.0.

public static class AccumulatorParam.IntAccumulatorParam$ extends Object implements AccumulatorParam<Object>

See Also: Serialized Form

Nested classes/interfaces inherited from interface org.apache.spark.AccumulatorParam:
AccumulatorParam.DoubleAccumulatorParam$, AccumulatorParam.FloatAccumulatorParam$,
AccumulatorParam.IntAccumulatorParam$, AccumulatorParam.LongAccumulatorParam$,
AccumulatorParam.StringAccumulatorParam$

Field Summary

static AccumulatorParam.IntAccumulatorParam$   MODULE$
    Deprecated. Static reference to the singleton instance of this Scala object.

Constructor Summary

AccumulatorParam.IntAccumulatorParam$()
    Deprecated.

Method Summary

int   addInPlace(int t1, int t2)   Deprecated.
int   zero(int initialValue)       Deprecated.

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.AccumulatorParam: addAccumulator
Methods inherited from interface org.apache.spark.AccumulableParam: addInPlace, zero
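Since this class is deprecated in favour of AccumulatorV2, a hedged Scala sketch contrasting
the old and new style may help (object name, local master URL, and accumulator name are
illustrative assumptions):

import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("accumulator-demo"))
    val data = sc.parallelize(1 to 100)

    // Deprecated path: sc.accumulator(0) resolves the implicit IntAccumulatorParam,
    // whose zero() and addInPlace() are the methods documented above.
    val oldStyle = sc.accumulator(0)
    data.foreach(x => oldStyle += 1)
    println(s"deprecated accumulator = ${oldStyle.value}")

    // Recommended replacement since 2.0.0: the AccumulatorV2 API.
    val counter = sc.longAccumulator("record-count")
    data.foreach(x => counter.add(1))
    println(s"LongAccumulator        = ${counter.value}")

    sc.stop()
  }
}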
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.LongAccumulatorParam$.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.LongAccumulatorParam$.html b/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.LongAccumulatorParam$.html
new file mode 100644
index 000..d39926f
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.LongAccumulatorParam$.html
@@ -0,0 +1,361 @@

AccumulatorParam.LongAccumulatorParam$ (Spark 2.0.2 JavaDoc)

org.apache.spark
Class ExecutorRemoved

Object
  org.apache.spark.ExecutorRemoved

All Implemented Interfaces: java.io.Serializable, scala.Equals, scala.Product

public class ExecutorRemoved extends Object implements scala.Product, scala.Serializable

See Also: Serialized Form

Constructor Summary

ExecutorRemoved(String executorId)

Method Summary

abstract static boolean                    canEqual(Object that)
abstract static boolean                    equals(Object that)
String                                     executorId()
abstract static int                        productArity()
abstract static Object                     productElement(int n)
static scala.collection.Iterator<Object>   productIterator()
static String                              productPrefix()

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface scala.Product: productArity, productElement, productIterator, productPrefix
Methods inherited from interface scala.Equals: canEqual, equals

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/ExpireDeadHosts.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/ExpireDeadHosts.html
b/site/docs/2.0.2/api/java/org/apache/spark/ExpireDeadHosts.html
new file mode 100644
index 000..492fd65
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/ExpireDeadHosts.html
@@ -0,0 +1,319 @@

ExpireDeadHosts (Spark 2.0.2 JavaDoc)

org.apache.spark
Class ExpireDeadHosts

Object
  org.apache.spark.ExpireDeadHosts

public class ExpireDeadHosts extends Object

Constructor
[01/51] [partial] spark-website git commit: Add docs for 2.0.2.
Repository: spark-website
Updated Branches:
  refs/heads/asf-site b9aa4c3ee -> 0bd363165

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/UnknownReason.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/UnknownReason.html b/site/docs/2.0.2/api/java/org/apache/spark/UnknownReason.html
new file mode 100644
index 000..5197a6b
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/UnknownReason.html
@@ -0,0 +1,348 @@

UnknownReason (Spark 2.0.2 JavaDoc)

org.apache.spark
Class UnknownReason

Object
  org.apache.spark.UnknownReason

public class UnknownReason extends Object

:: DeveloperApi ::
We don't know why the task ended -- for example, because of a ClassNotFound exception when
deserializing the task result.

Constructor Summary

UnknownReason()

Method Summary

abstract static boolean                    canEqual(Object that)
static boolean                             countTowardsTaskFailures()
abstract static boolean                    equals(Object that)
abstract static int                        productArity()
abstract static Object                     productElement(int n)
static scala.collection.Iterator<Object>   productIterator()
static String                              productPrefix()
static String                              toErrorString()

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[02/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/TaskKilled.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/TaskKilled.html b/site/docs/2.0.2/api/java/org/apache/spark/TaskKilled.html
new file mode 100644
index 000..976143b
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/TaskKilled.html
@@ -0,0 +1,347 @@

TaskKilled (Spark 2.0.2 JavaDoc)

org.apache.spark
Class TaskKilled

Object
  org.apache.spark.TaskKilled

public class TaskKilled extends Object

:: DeveloperApi ::
Task was killed intentionally and needs to be rescheduled.

Constructor Summary

TaskKilled()

Method Summary

abstract static boolean                    canEqual(Object that)
static boolean                             countTowardsTaskFailures()
abstract static boolean                    equals(Object that)
abstract static int                        productArity()
abstract static Object                     productElement(int n)
static scala.collection.Iterator<Object>   productIterator()
static String                              productPrefix()
static String                              toErrorString()

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/TaskKilledException.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/TaskKilledException.html b/site/docs/2.0.2/api/java/org/apache/spark/TaskKilledException.html
new file mode 100644
index 000..6f9a0b7
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/TaskKilledException.html
@@ -0,0 +1,255 @@

TaskKilledException (Spark 2.0.2 JavaDoc)

org.apache.spark
Class TaskKilledException

Object
  Throwable
    Exception
      RuntimeException
        org.apache.spark.TaskKilledException

All Implemented Interfaces: java.io.Serializable

public class TaskKilledException extends RuntimeException

:: DeveloperApi ::
Exception thrown when a task is explicitly killed
[04/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfo.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfo.html b/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfo.html
new file mode 100644
index 000..bc6e486
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfo.html
@@ -0,0 +1,243 @@

SparkJobInfo (Spark 2.0.2 JavaDoc)

org.apache.spark
Interface SparkJobInfo

All Superinterfaces: java.io.Serializable
All Known Implementing Classes: SparkJobInfoImpl

public interface SparkJobInfo extends java.io.Serializable

Exposes information about Spark Jobs.

This interface is not designed to be implemented outside of Spark. We may add additional methods
which may break binary compatibility with outside implementations.

Method Summary

int                  jobId()
int[]                stageIds()
JobExecutionStatus   status()

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfoImpl.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfoImpl.html b/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfoImpl.html
new file mode 100644
index 000..d95f90b
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfoImpl.html
@@ -0,0 +1,302 @@

SparkJobInfoImpl (Spark 2.0.2 JavaDoc)

org.apache.spark
Class SparkJobInfoImpl

Object
  org.apache.spark.SparkJobInfoImpl

All Implemented Interfaces: java.io.Serializable, SparkJobInfo

public class SparkJobInfoImpl extends Object implements SparkJobInfo

See Also: Serialized Form

Constructor Summary

SparkJobInfoImpl(int jobId, int[] stageIds, JobExecutionStatus status)

Method Summary

int                  jobId()       Specified by: jobId in interface SparkJobInfo
int[]                stageIds()    Specified by: stageIds in interface SparkJobInfo
JobExecutionStatus   status()      Specified by: status in interface SparkJobInfo

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
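SparkJobInfo instances are normally obtained through the status tracker rather than constructed
directly; a hedged Scala sketch of that pattern follows (object name, local master URL, and the
use of a null job group to mean "jobs not in any group" are assumptions made for illustration):

import org.apache.spark.{SparkConf, SparkContext}

object JobInfoExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("job-info-demo"))

    sc.parallelize(1 to 1000, 4).count()   // run one job so there is something to inspect

    val tracker = sc.statusTracker
    for (jobId <- tracker.getJobIdsForGroup(null)) {
      tracker.getJobInfo(jobId).foreach { info =>
        // jobId(), status() and stageIds() are the three methods of SparkJobInfo.
        println(s"job ${info.jobId()} is ${info.status()} with stages ${info.stageIds().mkString(",")}")
      }
    }

    sc.stop()
  }
}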
[24/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/window.html -- diff --git a/site/docs/2.0.2/api/R/window.html b/site/docs/2.0.2/api/R/window.html new file mode 100644 index 000..01536c1 --- /dev/null +++ b/site/docs/2.0.2/api/R/window.html @@ -0,0 +1,163 @@ + +R: window + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +window {SparkR}R Documentation + +window + +Description + +Bucketize rows into one or more time windows given a timestamp specifying column. Window +starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window +[12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in +the order of months are not supported. + + + +Usage + + +## S4 method for signature 'Column' +window(x, windowDuration, slideDuration = NULL, + startTime = NULL) + +window(x, ...) + + + +Arguments + + +x + +a time Column. Must be of TimestampType. + +windowDuration + +a string specifying the width of the window, e.g. '1 second', +'1 day 12 hours', '2 minutes'. Valid interval strings are 'week', +'day', 'hour', 'minute', 'second', 'millisecond', 'microsecond'. Note that +the duration is a fixed length of time, and does not vary over time +according to a calendar. For example, '1 day' always means 86,400,000 +milliseconds, not a calendar day. + +slideDuration + +a string specifying the sliding interval of the window. Same format as +windowDuration. A new window will be generated every +slideDuration. Must be less than or equal to +the windowDuration. This duration is likewise absolute, and does not +vary according to a calendar. + +startTime + +the offset with respect to 1970-01-01 00:00:00 UTC with which to start +window intervals. For example, in order to have hourly tumbling windows +that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide +startTime as "15 minutes". + +... + +further arguments to be passed to or from other methods. + + + + +Value + +An output column of struct called 'window' by default with the nested columns 'start' +and 'end'. 
+ + + +Note + +window since 2.0.0 + + + +See Also + +Other datetime_funcs: add_months, +add_months, +add_months,Column,numeric-method; +date_add, date_add, +date_add,Column,numeric-method; +date_format, date_format, +date_format,Column,character-method; +date_sub, date_sub, +date_sub,Column,numeric-method; +datediff, datediff, +datediff,Column-method; +dayofmonth, dayofmonth, +dayofmonth,Column-method; +dayofyear, dayofyear, +dayofyear,Column-method; +from_unixtime, from_unixtime, +from_unixtime,Column-method; +from_utc_timestamp, +from_utc_timestamp, +from_utc_timestamp,Column,character-method; +hour, hour, +hour,Column-method; last_day, +last_day, +last_day,Column-method; +minute, minute, +minute,Column-method; +months_between, +months_between, +months_between,Column-method; +month, month, +month,Column-method; +next_day, next_day, +next_day,Column,character-method; +quarter, quarter, +quarter,Column-method; +second, second, +second,Column-method; +to_date, to_date, +to_date,Column-method; +to_utc_timestamp, +to_utc_timestamp, +to_utc_timestamp,Column,character-method; +unix_timestamp, +unix_timestamp, +unix_timestamp, +unix_timestamp, +unix_timestamp,Column,character-method, +unix_timestamp,Column,missing-method, +unix_timestamp,missing,missing-method; +weekofyear, weekofyear, +weekofyear,Column-method; +year, year, +year,Column-method + + + +Examples + +## Not run: +##D # One minute windows every 15 seconds 10 seconds after the minute, e.g. 09:00:10-09:01:10, +##D # 09:00:25-09:01:25, 09:00:40-09:01:40, ... +##D window(df$time, 1 minute, 15 seconds, 10 seconds) +##D +##D # One minute tumbling windows 15 seconds after the minute, e.g. 09:00:15-09:01:15, +##D# 09:01:15-09:02:15... +##D window(df$time, 1 minute, startTime = 15 seconds) +##D +##D # Thirty-second windows every 10 seconds, e.g. 09:00:00-09:00:30, 09:00:10-09:00:40, ... +##D window(df$time, 30 seconds, 10 seconds) +## End(Not run) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/windowOrderBy.html -- diff --git a/site/docs/2.0.2/api/R/windowOrderBy.html b/site/docs/2.0.2/api/R/windowOrderBy.html new file mode 100644 index 000..b0cb39e --- /dev/null +++ b/site/docs/2.0.2/api/R/windowOrderBy.html @@ -0,0 +1,72 @@ + +R: windowOrderBy + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;>
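The SparkR help page above mirrors org.apache.spark.sql.functions.window in the Scala API. A minimal self-contained sketch with toy data (column names and values are invented for illustration):

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{window, avg, col}

val spark = SparkSession.builder().master("local[*]").appName("window-sketch").getOrCreate()
import spark.implicits._

// Toy data: (timestamp string, value)
val df = Seq(("2016-11-20 09:00:12", 1.0), ("2016-11-20 09:00:27", 3.0))
  .toDF("time_str", "value")
  .withColumn("time", col("time_str").cast("timestamp"))

// One-minute windows sliding every 15 seconds, starting 10 seconds past the minute,
// mirroring the first SparkR example above.
val windowed = df
  .groupBy(window(col("time"), "1 minute", "15 seconds", "10 seconds"))
  .agg(avg(col("value")).as("avg_value"))

windowed.show(truncate = false)
```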
[38/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/install.spark.html -- diff --git a/site/docs/2.0.2/api/R/install.spark.html b/site/docs/2.0.2/api/R/install.spark.html new file mode 100644 index 000..5657727 --- /dev/null +++ b/site/docs/2.0.2/api/R/install.spark.html @@ -0,0 +1,119 @@ + +R: Download and Install Apache Spark to a Local Directory + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +install.spark {SparkR}R Documentation + +Download and Install Apache Spark to a Local Directory + +Description + +install.spark downloads and installs Spark to a local directory if +it is not found. The Spark version we use is the same as the SparkR version. +Users can specify a desired Hadoop version, the remote mirror site, and +the directory where the package is installed locally. + + + +Usage + + +install.spark(hadoopVersion = "2.7", mirrorUrl = NULL, localDir = NULL, + overwrite = FALSE) + + + +Arguments + + +hadoopVersion + +Version of Hadoop to install. Default is "2.7". It can take other +version number in the format of x.y where x and y are integer. +If hadoopVersion = "without", Hadoop free build is installed. +See +http://spark.apache.org/docs/latest/hadoop-provided.html;> +Hadoop Free Build for more information. +Other patched version names can also be used, e.g. "cdh4" + +mirrorUrl + +base URL of the repositories to use. The directory layout should follow +http://www.apache.org/dyn/closer.lua/spark/;>Apache mirrors. + +localDir + +a local directory where Spark is installed. The directory contains +version-specific folders of Spark packages. Default is path to +the cache directory: + + + + Mac OS X: ~/Library/Caches/spark + + + Unix: $XDG_CACHE_HOME if defined, otherwise ~/.cache/spark + + + Windows: %LOCALAPPDATA%\spark\spark\Cache. + + + +overwrite + +If TRUE, download and overwrite the existing tar file in localDir +and force re-install Spark (in case the local directory or file is corrupted) + + + + +Details + +The full url of remote file is inferred from mirrorUrl and hadoopVersion. +mirrorUrl specifies the remote path to a Spark folder. It is followed by a subfolder +named after the Spark version (that corresponds to SparkR), and then the tar filename. +The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz. +For example, the full path for a Spark 2.0.0 package for Hadoop 2.7 from +http://apache.osuosl.org has path: +http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz. +For hadoopVersion = "without", [Hadoop version] in the filename is then +without-hadoop. 
+ + + +Value + +install.spark returns the local directory where Spark is found or installed + + + +Note + +install.spark since 2.1.0 + + + +See Also + +See available Hadoop versions: +http://spark.apache.org/downloads.html;>Apache Spark + + + +Examples + +## Not run: +##D install.spark() +## End(Not run) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/instr.html -- diff --git a/site/docs/2.0.2/api/R/instr.html b/site/docs/2.0.2/api/R/instr.html new file mode 100644 index 000..c0483ad --- /dev/null +++ b/site/docs/2.0.2/api/R/instr.html @@ -0,0 +1,126 @@ + +R: instr + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +instr {SparkR}R Documentation + +instr + +Description + +Locate the position of the first occurrence of substr column in the given string. +Returns null if either of the arguments are null. + + + +Usage + + +## S4 method for signature 'Column,character' +instr(y, x) + +instr(y, x) + + + +Arguments + + +y + +column to check + +x + +substring to check + + + + +Details + +NOTE: The position is not zero based, but 1 based index, returns 0 if substr +could not be found in str. + + + +Note + +instr since 1.5.0 + + + +See Also + +Other string_funcs: ascii, +ascii, ascii,Column-method; +base64, base64, +base64,Column-method; +concat_ws, concat_ws, +concat_ws,character,Column-method; +concat, concat, +concat,Column-method; decode, +decode, +decode,Column,character-method; +encode, encode, +encode,Column,character-method; +format_number, format_number, +format_number,Column,numeric-method; +format_string, format_string, +format_string,character,Column-method; +initcap, initcap, +initcap,Column-method; +length,
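The Details section above spells out how the download URL is assembled from mirrorUrl, the Spark version, and the Hadoop version. A small sketch of that naming scheme (the helper name and example values are illustrative, not part of SparkR):

```
// Illustrative only: reproduce the documented layout
// <mirrorUrl>/spark-<sparkVersion>/spark-<sparkVersion>-bin-<hadoopSuffix>.tgz
def packageUrl(mirrorUrl: String, sparkVersion: String, hadoopVersion: String): String = {
  val suffix = if (hadoopVersion == "without") "without-hadoop" else s"hadoop$hadoopVersion"
  s"$mirrorUrl/spark-$sparkVersion/spark-$sparkVersion-bin-$suffix.tgz"
}

// e.g. packageUrl("http://apache.osuosl.org/spark", "2.0.0", "2.7")
//   == "http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz"
```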
[09/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/JobExecutionStatus.html -- diff --git a/site/docs/2.0.2/api/java/org/apache/spark/JobExecutionStatus.html b/site/docs/2.0.2/api/java/org/apache/spark/JobExecutionStatus.html new file mode 100644 index 000..cb8b512 --- /dev/null +++ b/site/docs/2.0.2/api/java/org/apache/spark/JobExecutionStatus.html @@ -0,0 +1,354 @@ +JobExecutionStatus (Spark 2.0.2 JavaDoc)

org.apache.spark - Enum JobExecutionStatus

All Implemented Interfaces: java.io.Serializable, Comparable<JobExecutionStatus>

public enum JobExecutionStatus extends Enum<JobExecutionStatus>

Enum Constants: RUNNING, SUCCEEDED, FAILED, UNKNOWN

Method Summary
  static JobExecutionStatus fromString(String str)
  static JobExecutionStatus valueOf(String name)
    Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type (extraneous whitespace characters are not permitted). Throws IllegalArgumentException if this enum type has no constant with the specified name, and NullPointerException if the argument is null.
  static JobExecutionStatus[] values()
    Returns an array containing the constants of this enum type, in the order they are declared. This method may be used to iterate over the constants as follows:

    for (JobExecutionStatus c : JobExecutionStatus.values())
        System.out.println(c);

Methods inherited from class Enum: compareTo, equals, getDeclaringClass, hashCode, name, ordinal, toString, valueOf
Methods inherited from class Object: getClass, notify, notifyAll, wait

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/JobSubmitter.html -- diff --git a/site/docs/2.0.2/api/java/org/apache/spark/JobSubmitter.html b/site/docs/2.0.2/api/java/org/apache/spark/JobSubmitter.html new file mode 100644 index 000..5681e58 --- /dev/null +++ b/site/docs/2.0.2/api/java/org/apache/spark/JobSubmitter.html @@ -0,0 +1,221 @@ +JobSubmitter (Spark 2.0.2 JavaDoc)
[19/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/constant-values.html -- diff --git a/site/docs/2.0.2/api/java/constant-values.html b/site/docs/2.0.2/api/java/constant-values.html new file mode 100644 index 000..ee81714 --- /dev/null +++ b/site/docs/2.0.2/api/java/constant-values.html @@ -0,0 +1,233 @@ +Constant Field Values (Spark 2.0.2 JavaDoc)

Constant Field Values - Contents: org.apache.*

org.apache.spark.launcher.SparkLauncher (all fields are public static final String):
  CHILD_CONNECTION_TIMEOUT    = "spark.launcher.childConectionTimeout"
  CHILD_PROCESS_LOGGER_NAME   = "spark.launcher.childProcLoggerName"
  DEPLOY_MODE                 = "spark.submit.deployMode"
  DRIVER_EXTRA_CLASSPATH      = "spark.driver.extraClassPath"
  DRIVER_EXTRA_JAVA_OPTIONS   = "spark.driver.extraJavaOptions"
  DRIVER_EXTRA_LIBRARY_PATH   = "spark.driver.extraLibraryPath"
  DRIVER_MEMORY               = "spark.driver.memory"
  EXECUTOR_CORES              = "spark.executor.cores"
  EXECUTOR_EXTRA_CLASSPATH    = "spark.executor.extraClassPath"
  EXECUTOR_EXTRA_JAVA_OPTIONS = "spark.executor.extraJavaOptions"
  EXECUTOR_EXTRA_LIBRARY_PATH = "spark.executor.extraLibraryPath"
  EXECUTOR_MEMORY             = "spark.executor.memory"
  NO_RESOURCE                 = "spark-internal"
  SPARK_MASTER                = "spark.master"

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/deprecated-list.html -- diff --git a/site/docs/2.0.2/api/java/deprecated-list.html b/site/docs/2.0.2/api/java/deprecated-list.html new file mode 100644 index 000..3f4dacc --- /dev/null +++ b/site/docs/2.0.2/api/java/deprecated-list.html @@ -0,0 +1,577 @@ +Deprecated List (Spark 2.0.2 JavaDoc)

Deprecated API - Contents: Deprecated Interfaces, Deprecated Classes, Deprecated Methods, Deprecated Constructors

Deprecated Interfaces
  org.apache.spark.AccumulableParam - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.AccumulatorParam - use AccumulatorV2. Since 2.0.0.

Deprecated Classes
  org.apache.spark.Accumulable - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.Accumulator - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.AccumulatorParam.DoubleAccumulatorParam$ - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.AccumulatorParam.FloatAccumulatorParam$ - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.AccumulatorParam.IntAccumulatorParam$ - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.AccumulatorParam.LongAccumulatorParam$ - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.AccumulatorParam.StringAccumulatorParam$ - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.sql.hive.HiveContext - Use
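The SparkLauncher constants listed above are the configuration keys that the launcher API accepts. A minimal sketch of supplying them programmatically (resource path, main class, and master are placeholder values):

```
import org.apache.spark.launcher.SparkLauncher

// Illustrative only: app resource, main class and master are placeholders.
val handle = new SparkLauncher()
  .setAppResource("/path/to/app.jar")          // placeholder path
  .setMainClass("com.example.Main")            // placeholder class
  .setMaster("local[*]")
  .setConf(SparkLauncher.DRIVER_MEMORY, "2g")  // "spark.driver.memory"
  .setConf(SparkLauncher.EXECUTOR_CORES, "2")  // "spark.executor.cores"
  .startApplication()                          // returns a SparkAppHandle for monitoring
```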
[12/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/ComplexFutureAction.html -- diff --git a/site/docs/2.0.2/api/java/org/apache/spark/ComplexFutureAction.html b/site/docs/2.0.2/api/java/org/apache/spark/ComplexFutureAction.html new file mode 100644 index 000..598387d --- /dev/null +++ b/site/docs/2.0.2/api/java/org/apache/spark/ComplexFutureAction.html @@ -0,0 +1,485 @@ +ComplexFutureAction (Spark 2.0.2 JavaDoc)

org.apache.spark - Class ComplexFutureAction<T>

All Implemented Interfaces: FutureAction<T>, scala.concurrent.Awaitable<T>, scala.concurrent.Future<T>

public class ComplexFutureAction<T> extends Object implements FutureAction<T>
A FutureAction for actions that could trigger multiple Spark jobs. Examples include take, takeSample. Cancellation works by setting the cancelled flag to true and cancelling any pending jobs.

Nested classes/interfaces inherited from interface scala.concurrent.Future: scala.concurrent.Future.InternalCallbackExecutor$

Constructor Summary
  ComplexFutureAction(scala.Function1<JobSubmitter, scala.concurrent.Future<T>> run)

Method Summary
  void cancel() - Cancels the execution of this action. (Specified by cancel in interface FutureAction<T>.)
  boolean isCancelled() - Returns whether the action has been cancelled.
  boolean isCompleted() - Returns whether the action has already been completed with a value or an exception.
  scala.collection.Seq<Object> jobIds() - Returns the job IDs run by the underlying async operation.
  <U> void onComplete(scala.Function1<scala.util.Try<T>, U> func, scala.concurrent.ExecutionContext executor) - When this action is completed, either through an exception or a value, applies the provided function.
  ComplexFutureAction<T> ready(scala.concurrent.duration.Duration atMost, scala.concurrent.CanAwait permit) - Blocks until this action completes; atMost may be negative (no waiting is done), Duration.Inf for unbounded waiting, or a finite positive duration; returns this FutureAction; throws InterruptedException and java.util.concurrent.TimeoutException.
  T result(scala.concurrent.duration.Duration atMost, scala.concurrent.CanAwait permit) - Awaits and returns the result (of type T) of this action.
  scala.Option<scala.util.Try<T>> value() - The value of this Future.

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait
Methods inherited from interface org.apache.spark.FutureAction: get
Methods inherited from interface scala.concurrent.Future: andThen, collect, failed, fallbackTo, filter, flatMap, foreach, map, mapTo, onFailure, onSuccess, recover, recoverWith, transform, withFilter, zip
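FutureAction, which ComplexFutureAction implements, is what the asynchronous RDD actions return. A minimal sketch of consuming one (illustrative; the local master and toy RDD are assumptions for the example):

```
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.{Success, Failure}
import scala.concurrent.ExecutionContext.Implicits.global

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("future-action-sketch"))

// countAsync returns a FutureAction, the interface ComplexFutureAction implements
// for multi-job actions such as take and takeSample.
val future = sc.parallelize(1 to 1000).countAsync()

future.onComplete {
  case Success(n) => println(s"counted $n records, jobs: ${future.jobIds.mkString(",")}")
  case Failure(e) => println(s"count failed: $e")
}

// A FutureAction can also be cancelled, which cancels the underlying Spark jobs:
// future.cancel()
```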
[30/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/sd.html -- diff --git a/site/docs/2.0.2/api/R/sd.html b/site/docs/2.0.2/api/R/sd.html new file mode 100644 index 000..38f67ec --- /dev/null +++ b/site/docs/2.0.2/api/R/sd.html @@ -0,0 +1,121 @@ + +R: sd + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +sd {SparkR}R Documentation + +sd + +Description + +Aggregate function: alias for stddev_samp + + + +Usage + + +## S4 method for signature 'Column' +sd(x) + +## S4 method for signature 'Column' +stddev(x) + +sd(x, na.rm = FALSE) + +stddev(x) + + + +Arguments + + +x + +Column to compute on. + +na.rm + +currently not used. + + + + +Note + +sd since 1.6.0 + +stddev since 1.6.0 + + + +See Also + +stddev_pop, stddev_samp + +Other agg_funcs: agg, agg, +agg, agg,GroupedData-method, +agg,SparkDataFrame-method, +summarize, summarize, +summarize, +summarize,GroupedData-method, +summarize,SparkDataFrame-method; +avg, avg, +avg,Column-method; +countDistinct, countDistinct, +countDistinct,Column-method, +n_distinct, n_distinct, +n_distinct,Column-method; +count, count, +count,Column-method, +count,GroupedData-method, n, +n, n,Column-method; +first, first, +first, +first,SparkDataFrame-method, +first,characterOrColumn-method; +kurtosis, kurtosis, +kurtosis,Column-method; last, +last, +last,characterOrColumn-method; +max, max,Column-method; +mean, mean,Column-method; +min, min,Column-method; +skewness, skewness, +skewness,Column-method; +stddev_pop, stddev_pop, +stddev_pop,Column-method; +stddev_samp, stddev_samp, +stddev_samp,Column-method; +sumDistinct, sumDistinct, +sumDistinct,Column-method; +sum, sum,Column-method; +var_pop, var_pop, +var_pop,Column-method; +var_samp, var_samp, +var_samp,Column-method; var, +var, var,Column-method, +variance, variance, +variance,Column-method + + + +Examples + +## Not run: +##D stddev(df$c) +##D select(df, stddev(df$age)) +##D agg(df, sd(df$age)) +## End(Not run) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/second.html -- diff --git a/site/docs/2.0.2/api/R/second.html b/site/docs/2.0.2/api/R/second.html new file mode 100644 index 000..92dc854 --- /dev/null +++ b/site/docs/2.0.2/api/R/second.html @@ -0,0 +1,112 @@ + +R: second + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +second {SparkR}R Documentation + +second + +Description + +Extracts the seconds as an integer from a given date/timestamp/string. + + + +Usage + + +## S4 method for signature 'Column' +second(x) + +second(x) + + + +Arguments + + +x + +Column to compute on. 
+ + + + +Note + +second since 1.5.0 + + + +See Also + +Other datetime_funcs: add_months, +add_months, +add_months,Column,numeric-method; +date_add, date_add, +date_add,Column,numeric-method; +date_format, date_format, +date_format,Column,character-method; +date_sub, date_sub, +date_sub,Column,numeric-method; +datediff, datediff, +datediff,Column-method; +dayofmonth, dayofmonth, +dayofmonth,Column-method; +dayofyear, dayofyear, +dayofyear,Column-method; +from_unixtime, from_unixtime, +from_unixtime,Column-method; +from_utc_timestamp, +from_utc_timestamp, +from_utc_timestamp,Column,character-method; +hour, hour, +hour,Column-method; last_day, +last_day, +last_day,Column-method; +minute, minute, +minute,Column-method; +months_between, +months_between, +months_between,Column-method; +month, month, +month,Column-method; +next_day, next_day, +next_day,Column,character-method; +quarter, quarter, +quarter,Column-method; +to_date, to_date, +to_date,Column-method; +to_utc_timestamp, +to_utc_timestamp, +to_utc_timestamp,Column,character-method; +unix_timestamp, +unix_timestamp, +unix_timestamp, +unix_timestamp, +unix_timestamp,Column,character-method, +unix_timestamp,Column,missing-method, +unix_timestamp,missing,missing-method; +weekofyear, weekofyear, +weekofyear,Column-method; +window, window, +window,Column-method; year, +year, year,Column-method + + + +Examples + +## Not run: second(df$c) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/select.html -- diff --git a/site/docs/2.0.2/api/R/select.html b/site/docs/2.0.2/api/R/select.html new file mode 100644 index
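Both helpers also exist in the Scala DataFrame functions API. A minimal sketch, assuming a DataFrame df with a numeric age column and a timestamp column ts (both assumptions made for illustration):

```
import org.apache.spark.sql.functions.{stddev, stddev_pop, second, col}

// `sd` / `stddev` in SparkR alias stddev_samp, as the help page notes;
// stddev_pop is the population variant.
val stats = df.agg(stddev(col("age")).as("sd_age"), stddev_pop(col("age")).as("sd_pop_age"))

// Extract the seconds component of a timestamp, mirroring second(df$c) in SparkR.
val withSeconds = df.select(second(col("ts")).as("ts_second"))
```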
spark git commit: [SPARK-18387][SQL] Add serialization to checkEvaluation.
Repository: spark Updated Branches: refs/heads/branch-2.1 465e4b40b -> 87820da78 [SPARK-18387][SQL] Add serialization to checkEvaluation. ## What changes were proposed in this pull request? This removes the serialization test from RegexpExpressionsSuite and replaces it by serializing all expressions in checkEvaluation. This also fixes math constant expressions by making LeafMathExpression Serializable and fixes NumberFormat values that are null or invalid after serialization. ## How was this patch tested? This patch is to tests. Author: Ryan BlueCloses #15847 from rdblue/SPARK-18387-fix-serializable-expressions. (cherry picked from commit 6e95325fc3726d260054bd6e7c0717b3c139917e) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/87820da7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/87820da7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/87820da7 Branch: refs/heads/branch-2.1 Commit: 87820da782fd2d08078227a2ce5c363c3e1cb0f0 Parents: 465e4b4 Author: Ryan Blue Authored: Fri Nov 11 13:52:10 2016 -0800 Committer: Reynold Xin Committed: Fri Nov 11 13:52:18 2016 -0800 -- .../catalyst/expressions/mathExpressions.scala | 2 +- .../expressions/stringExpressions.scala | 44 +++- .../expressions/ExpressionEvalHelper.scala | 15 --- .../expressions/RegexpExpressionsSuite.scala| 16 +-- 4 files changed, 36 insertions(+), 41 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/87820da7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index a60494a..65273a7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -36,7 +36,7 @@ import org.apache.spark.unsafe.types.UTF8String * @param name The short name of the function */ abstract class LeafMathExpression(c: Double, name: String) - extends LeafExpression with CodegenFallback { + extends LeafExpression with CodegenFallback with Serializable { override def dataType: DataType = DoubleType override def foldable: Boolean = true http://git-wip-us.apache.org/repos/asf/spark/blob/87820da7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala index 5f533fe..e74ef9a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala @@ -1431,18 +1431,20 @@ case class FormatNumber(x: Expression, d: Expression) // Associated with the pattern, for the last d value, and we will update the // pattern (DecimalFormat) once the new coming d value differ with the last one. + // This is an Option to distinguish between 0 (numberFormat is valid) and uninitialized after + // serialization (numberFormat has not been updated for dValue = 0). 
@transient - private var lastDValue: Int = -100 + private var lastDValue: Option[Int] = None // A cached DecimalFormat, for performance concern, we will change it // only if the d value changed. @transient - private val pattern: StringBuffer = new StringBuffer() + private lazy val pattern: StringBuffer = new StringBuffer() // SPARK-13515: US Locale configures the DecimalFormat object to use a dot ('.') // as a decimal separator. @transient - private val numberFormat = new DecimalFormat("", new DecimalFormatSymbols(Locale.US)) + private lazy val numberFormat = new DecimalFormat("", new DecimalFormatSymbols(Locale.US)) override protected def nullSafeEval(xObject: Any, dObject: Any): Any = { val dValue = dObject.asInstanceOf[Int] @@ -1450,24 +1452,28 @@ case class FormatNumber(x: Expression, d: Expression) return null } -if (dValue != lastDValue) { - // construct a new DecimalFormat only if a new dValue - pattern.delete(0, pattern.length) - pattern.append("#,###,###,###,###,###,##0") - - // decimal place - if (dValue > 0) { -
spark git commit: [SPARK-18387][SQL] Add serialization to checkEvaluation.
Repository: spark Updated Branches: refs/heads/branch-2.0 6e7310590 -> 99575e88f [SPARK-18387][SQL] Add serialization to checkEvaluation. ## What changes were proposed in this pull request? This removes the serialization test from RegexpExpressionsSuite and replaces it by serializing all expressions in checkEvaluation. This also fixes math constant expressions by making LeafMathExpression Serializable and fixes NumberFormat values that are null or invalid after serialization. ## How was this patch tested? This patch is to tests. Author: Ryan BlueCloses #15847 from rdblue/SPARK-18387-fix-serializable-expressions. (cherry picked from commit 6e95325fc3726d260054bd6e7c0717b3c139917e) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/99575e88 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/99575e88 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/99575e88 Branch: refs/heads/branch-2.0 Commit: 99575e88fd711c3fc25e8e6f00bbc8d1491feed6 Parents: 6e73105 Author: Ryan Blue Authored: Fri Nov 11 13:52:10 2016 -0800 Committer: Reynold Xin Committed: Fri Nov 11 13:52:28 2016 -0800 -- .../catalyst/expressions/mathExpressions.scala | 2 +- .../expressions/stringExpressions.scala | 44 +++- .../expressions/ExpressionEvalHelper.scala | 15 --- .../expressions/RegexpExpressionsSuite.scala| 16 +-- 4 files changed, 36 insertions(+), 41 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/99575e88/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index 5152265..591e1e5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -36,7 +36,7 @@ import org.apache.spark.unsafe.types.UTF8String * @param name The short name of the function */ abstract class LeafMathExpression(c: Double, name: String) - extends LeafExpression with CodegenFallback { + extends LeafExpression with CodegenFallback with Serializable { override def dataType: DataType = DoubleType override def foldable: Boolean = true http://git-wip-us.apache.org/repos/asf/spark/blob/99575e88/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala index 61549c9..004c74d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala @@ -1236,18 +1236,20 @@ case class FormatNumber(x: Expression, d: Expression) // Associated with the pattern, for the last d value, and we will update the // pattern (DecimalFormat) once the new coming d value differ with the last one. + // This is an Option to distinguish between 0 (numberFormat is valid) and uninitialized after + // serialization (numberFormat has not been updated for dValue = 0). 
@transient - private var lastDValue: Int = -100 + private var lastDValue: Option[Int] = None // A cached DecimalFormat, for performance concern, we will change it // only if the d value changed. @transient - private val pattern: StringBuffer = new StringBuffer() + private lazy val pattern: StringBuffer = new StringBuffer() // SPARK-13515: US Locale configures the DecimalFormat object to use a dot ('.') // as a decimal separator. @transient - private val numberFormat = new DecimalFormat("", new DecimalFormatSymbols(Locale.US)) + private lazy val numberFormat = new DecimalFormat("", new DecimalFormatSymbols(Locale.US)) override protected def nullSafeEval(xObject: Any, dObject: Any): Any = { val dValue = dObject.asInstanceOf[Int] @@ -1255,24 +1257,28 @@ case class FormatNumber(x: Expression, d: Expression) return null } -if (dValue != lastDValue) { - // construct a new DecimalFormat only if a new dValue - pattern.delete(0, pattern.length) - pattern.append("#,###,###,###,###,###,##0") - - // decimal place - if (dValue > 0) { -
spark git commit: [SPARK-18387][SQL] Add serialization to checkEvaluation.
Repository: spark Updated Branches: refs/heads/master d42bb7cc4 -> 6e95325fc [SPARK-18387][SQL] Add serialization to checkEvaluation. ## What changes were proposed in this pull request? This removes the serialization test from RegexpExpressionsSuite and replaces it by serializing all expressions in checkEvaluation. This also fixes math constant expressions by making LeafMathExpression Serializable and fixes NumberFormat values that are null or invalid after serialization. ## How was this patch tested? This patch is to tests. Author: Ryan BlueCloses #15847 from rdblue/SPARK-18387-fix-serializable-expressions. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6e95325f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6e95325f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6e95325f Branch: refs/heads/master Commit: 6e95325fc3726d260054bd6e7c0717b3c139917e Parents: d42bb7c Author: Ryan Blue Authored: Fri Nov 11 13:52:10 2016 -0800 Committer: Reynold Xin Committed: Fri Nov 11 13:52:10 2016 -0800 -- .../catalyst/expressions/mathExpressions.scala | 2 +- .../expressions/stringExpressions.scala | 44 +++- .../expressions/ExpressionEvalHelper.scala | 15 --- .../expressions/RegexpExpressionsSuite.scala| 16 +-- 4 files changed, 36 insertions(+), 41 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6e95325f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index a60494a..65273a7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -36,7 +36,7 @@ import org.apache.spark.unsafe.types.UTF8String * @param name The short name of the function */ abstract class LeafMathExpression(c: Double, name: String) - extends LeafExpression with CodegenFallback { + extends LeafExpression with CodegenFallback with Serializable { override def dataType: DataType = DoubleType override def foldable: Boolean = true http://git-wip-us.apache.org/repos/asf/spark/blob/6e95325f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala index 5f533fe..e74ef9a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala @@ -1431,18 +1431,20 @@ case class FormatNumber(x: Expression, d: Expression) // Associated with the pattern, for the last d value, and we will update the // pattern (DecimalFormat) once the new coming d value differ with the last one. + // This is an Option to distinguish between 0 (numberFormat is valid) and uninitialized after + // serialization (numberFormat has not been updated for dValue = 0). @transient - private var lastDValue: Int = -100 + private var lastDValue: Option[Int] = None // A cached DecimalFormat, for performance concern, we will change it // only if the d value changed. 
@transient - private val pattern: StringBuffer = new StringBuffer() + private lazy val pattern: StringBuffer = new StringBuffer() // SPARK-13515: US Locale configures the DecimalFormat object to use a dot ('.') // as a decimal separator. @transient - private val numberFormat = new DecimalFormat("", new DecimalFormatSymbols(Locale.US)) + private lazy val numberFormat = new DecimalFormat("", new DecimalFormatSymbols(Locale.US)) override protected def nullSafeEval(xObject: Any, dObject: Any): Any = { val dValue = dObject.asInstanceOf[Int] @@ -1450,24 +1452,28 @@ case class FormatNumber(x: Expression, d: Expression) return null } -if (dValue != lastDValue) { - // construct a new DecimalFormat only if a new dValue - pattern.delete(0, pattern.length) - pattern.append("#,###,###,###,###,###,##0") - - // decimal place - if (dValue > 0) { -pattern.append(".") - -var i = 0 -while (i < dValue) { - i += 1 - pattern.append("0") +
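The heart of the fix above is a reusable pattern: @transient state is rebuilt lazily after Java serialization, and the cached d value becomes an Option so that 0 cannot be mistaken for "never initialized". A simplified, self-contained sketch of that pattern (invented class and method names, not the actual FormatNumber code):

```
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.util.Locale

// Simplified illustration of the fix: after deserialization, @transient fields come
// back null, so the formatter is a lazy val that is rebuilt on first use, and the
// cached precision is an Option so 0 is not confused with "not initialized yet".
class CachedFormatter extends Serializable {
  @transient private var lastDValue: Option[Int] = None
  @transient private lazy val numberFormat =
    new DecimalFormat("", new DecimalFormatSymbols(Locale.US))

  def format(value: Double, dValue: Int): String = {
    lastDValue match {
      case Some(last) if last == dValue =>
        // The formatter is already configured for this precision.
      case _ =>
        // Also reached right after deserialization, when lastDValue is null.
        val p = new StringBuilder("#,###,###,###,###,###,##0")
        if (dValue > 0) p.append(".").append("0" * dValue)
        numberFormat.applyLocalizedPattern(p.toString)
        lastDValue = Some(dValue)
    }
    numberFormat.format(value)
  }
}
```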
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.0.2-rc2 [deleted] a6abe1ee2
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.0.2-rc3 [deleted] 584354eaa
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.0.2 [created] 584354eaa
spark git commit: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE for Datasource tables
Repository: spark Updated Branches: refs/heads/branch-2.1 c602894f2 -> 064d4315f [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE for Datasource tables ## What changes were proposed in this pull request? As of current 2.1, INSERT OVERWRITE with dynamic partitions against a Datasource table will overwrite the entire table instead of only the partitions matching the static keys, as in Hive. It also doesn't respect custom partition locations. This PR adds support for all these operations to Datasource tables managed by the Hive metastore. It is implemented as follows - During planning time, the full set of partitions affected by an INSERT or OVERWRITE command is read from the Hive metastore. - The planner identifies any partitions with custom locations and includes this in the write task metadata. - FileFormatWriter tasks refer to this custom locations map when determining where to write for dynamic partition output. - When the write job finishes, the set of written partitions is compared against the initial set of matched partitions, and the Hive metastore is updated to reflect the newly added / removed partitions. It was necessary to introduce a method for staging files with absolute output paths to `FileCommitProtocol`. These files are not handled by the Hadoop output committer but are moved to their final locations when the job commits. The overwrite behavior of legacy Datasource tables is also changed: no longer will the entire table be overwritten if a partial partition spec is present. cc cloud-fan yhuai ## How was this patch tested? Unit tests, existing tests. Author: Eric LiangAuthor: Wenchen Fan Closes #15814 from ericl/sc-5027. (cherry picked from commit a3356343cbf58b930326f45721fb4ecade6f8029) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/064d4315 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/064d4315 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/064d4315 Branch: refs/heads/branch-2.1 Commit: 064d4315f246450043a52882fcf59e95d79701e8 Parents: c602894 Author: Eric Liang Authored: Thu Nov 10 17:00:43 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 10 17:01:08 2016 -0800 -- .../spark/internal/io/FileCommitProtocol.scala | 15 ++ .../io/HadoopMapReduceCommitProtocol.scala | 63 +++- .../spark/sql/catalyst/parser/AstBuilder.scala | 12 +- .../plans/logical/basicLogicalOperators.scala | 10 +- .../sql/catalyst/parser/PlanParserSuite.scala | 4 +- .../sql/execution/datasources/DataSource.scala | 20 +-- .../datasources/DataSourceStrategy.scala| 94 +++ .../datasources/FileFormatWriter.scala | 26 ++- .../InsertIntoHadoopFsRelationCommand.scala | 61 ++- .../datasources/PartitioningUtils.scala | 10 ++ .../execution/streaming/FileStreamSink.scala| 2 +- .../streaming/ManifestFileCommitProtocol.scala | 6 + .../PartitionProviderCompatibilitySuite.scala | 161 ++- 13 files changed, 411 insertions(+), 73 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/064d4315/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala b/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala index fb80205..afd2250 100644 --- a/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala +++ b/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala @@ -82,10 +82,25 @@ abstract class FileCommitProtocol { * * The "dir" 
parameter specifies 2, and "ext" parameter specifies both 4 and 5, and the rest * are left to the commit protocol implementation to decide. + * + * Important: it is the caller's responsibility to add uniquely identifying content to "ext" + * if a task is going to write out multiple files to the same dir. The file commit protocol only + * guarantees that files written by different tasks will not conflict. */ def newTaskTempFile(taskContext: TaskAttemptContext, dir: Option[String], ext: String): String /** + * Similar to newTaskTempFile(), but allows files to committed to an absolute output location. + * Depending on the implementation, there may be weaker guarantees around adding files this way. + * + * Important: it is the caller's responsibility to add uniquely identifying content to "ext" + * if a task is going to write out multiple files to the same dir. The file commit protocol only + * guarantees that files written by
spark git commit: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE for Datasource tables
Repository: spark Updated Branches: refs/heads/master e0deee1f7 -> a3356343c [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE for Datasource tables ## What changes were proposed in this pull request? As of current 2.1, INSERT OVERWRITE with dynamic partitions against a Datasource table will overwrite the entire table instead of only the partitions matching the static keys, as in Hive. It also doesn't respect custom partition locations. This PR adds support for all these operations to Datasource tables managed by the Hive metastore. It is implemented as follows - During planning time, the full set of partitions affected by an INSERT or OVERWRITE command is read from the Hive metastore. - The planner identifies any partitions with custom locations and includes this in the write task metadata. - FileFormatWriter tasks refer to this custom locations map when determining where to write for dynamic partition output. - When the write job finishes, the set of written partitions is compared against the initial set of matched partitions, and the Hive metastore is updated to reflect the newly added / removed partitions. It was necessary to introduce a method for staging files with absolute output paths to `FileCommitProtocol`. These files are not handled by the Hadoop output committer but are moved to their final locations when the job commits. The overwrite behavior of legacy Datasource tables is also changed: no longer will the entire table be overwritten if a partial partition spec is present. cc cloud-fan yhuai ## How was this patch tested? Unit tests, existing tests. Author: Eric LiangAuthor: Wenchen Fan Closes #15814 from ericl/sc-5027. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a3356343 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a3356343 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a3356343 Branch: refs/heads/master Commit: a3356343cbf58b930326f45721fb4ecade6f8029 Parents: e0deee1 Author: Eric Liang Authored: Thu Nov 10 17:00:43 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 10 17:00:43 2016 -0800 -- .../spark/internal/io/FileCommitProtocol.scala | 15 ++ .../io/HadoopMapReduceCommitProtocol.scala | 63 +++- .../spark/sql/catalyst/parser/AstBuilder.scala | 12 +- .../plans/logical/basicLogicalOperators.scala | 10 +- .../sql/catalyst/parser/PlanParserSuite.scala | 4 +- .../sql/execution/datasources/DataSource.scala | 20 +-- .../datasources/DataSourceStrategy.scala| 94 +++ .../datasources/FileFormatWriter.scala | 26 ++- .../InsertIntoHadoopFsRelationCommand.scala | 61 ++- .../datasources/PartitioningUtils.scala | 10 ++ .../execution/streaming/FileStreamSink.scala| 2 +- .../streaming/ManifestFileCommitProtocol.scala | 6 + .../PartitionProviderCompatibilitySuite.scala | 161 ++- 13 files changed, 411 insertions(+), 73 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a3356343/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala b/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala index fb80205..afd2250 100644 --- a/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala +++ b/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala @@ -82,10 +82,25 @@ abstract class FileCommitProtocol { * * The "dir" parameter specifies 2, and "ext" parameter specifies both 4 and 5, and the rest * are left to the commit 
protocol implementation to decide. + * + * Important: it is the caller's responsibility to add uniquely identifying content to "ext" + * if a task is going to write out multiple files to the same dir. The file commit protocol only + * guarantees that files written by different tasks will not conflict. */ def newTaskTempFile(taskContext: TaskAttemptContext, dir: Option[String], ext: String): String /** + * Similar to newTaskTempFile(), but allows files to committed to an absolute output location. + * Depending on the implementation, there may be weaker guarantees around adding files this way. + * + * Important: it is the caller's responsibility to add uniquely identifying content to "ext" + * if a task is going to write out multiple files to the same dir. The file commit protocol only + * guarantees that files written by different tasks will not conflict. + */ + def newTaskTempFileAbsPath( + taskContext: TaskAttemptContext, absoluteDir:
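The newTaskTempFile / newTaskTempFileAbsPath contract above puts the burden of unique file names on the caller. A sketch of a caller honoring that contract (illustrative; the absolute-path variant's exact signature is truncated in the excerpt above and is assumed here to take the absolute directory and extension as strings):

```
import java.util.UUID
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.spark.internal.io.FileCommitProtocol

// Illustrative only: shows the caller-side contract described above.
// `committer` and `taskContext` are assumed to be provided by the write job.
def newOutputFile(
    committer: FileCommitProtocol,
    taskContext: TaskAttemptContext,
    partitionDir: Option[String],
    customLocation: Option[String]): String = {
  // The caller must make "ext" unique when writing several files to the same dir.
  val ext = s"-${UUID.randomUUID()}.parquet"

  customLocation match {
    // Partitions with custom locations go through the absolute-path variant.
    case Some(absDir) => committer.newTaskTempFileAbsPath(taskContext, absDir, ext)
    case None         => committer.newTaskTempFile(taskContext, partitionDir, ext)
  }
}
```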
spark git commit: [SPARK-18403][SQL] Temporarily disable flaky ObjectHashAggregateSuite
Repository: spark Updated Branches: refs/heads/master 2f7461f31 -> e0deee1f7 [SPARK-18403][SQL] Temporarily disable flaky ObjectHashAggregateSuite ## What changes were proposed in this pull request? Randomized tests in `ObjectHashAggregateSuite` is being flaky and breaks PR builds. This PR disables them temporarily to bring back the PR build. ## How was this patch tested? N/A Author: Cheng LianCloses #15845 from liancheng/ignore-flaky-object-hash-agg-suite. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e0deee1f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e0deee1f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e0deee1f Branch: refs/heads/master Commit: e0deee1f7df31177cfc14bbb296f0baa372f473d Parents: 2f7461f Author: Cheng Lian Authored: Thu Nov 10 13:44:54 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 10 13:44:54 2016 -0800 -- .../spark/sql/hive/execution/ObjectHashAggregateSuite.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e0deee1f/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala -- diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala index 93fc5e8..b7f91d8 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala @@ -326,7 +326,8 @@ class ObjectHashAggregateSuite // Currently Spark SQL doesn't support evaluating distinct aggregate function together // with aggregate functions without partial aggregation support. if (!(aggs.contains(withoutPartial) && aggs.contains(withDistinct))) { - test( + // TODO Re-enables them after fixing SPARK-18403 + ignore( s"randomized aggregation test - " + s"${names.mkString("[", ", ", "]")} - " + s"${if (withGroupingKeys) "with" else "without"} grouping keys - " + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
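The change swaps ScalaTest's test(...) registration for ignore(...), which keeps the body compiling but skips execution. A minimal standalone illustration (suite and test names are invented):

```
import org.scalatest.FunSuite

// Illustrative only: ignore() registers the test so it still compiles,
// but the runner reports it as ignored instead of executing it.
class ExampleSuite extends FunSuite {
  test("stable test") {
    assert(1 + 1 === 2)
  }

  // TODO re-enable once the flakiness is fixed (mirrors the SPARK-18403 workaround).
  ignore("flaky randomized test") {
    assert(scala.util.Random.nextInt(2) >= 0)
  }
}
```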
spark git commit: [SPARK-17990][SPARK-18302][SQL] correct several partition related behaviours of ExternalCatalog
Repository: spark Updated Branches: refs/heads/branch-2.1 be3933ddf -> c602894f2 [SPARK-17990][SPARK-18302][SQL] correct several partition related behaviours of ExternalCatalog ## What changes were proposed in this pull request? This PR corrects several partition related behaviors of `ExternalCatalog`: 1. default partition location should not always lower case the partition column names in path string(fix `HiveExternalCatalog`) 2. rename partition should not always lower case the partition column names in updated partition path string(fix `HiveExternalCatalog`) 3. rename partition should update the partition location only for managed table(fix `InMemoryCatalog`) 4. create partition with existing directory should be fine(fix `InMemoryCatalog`) 5. create partition with non-existing directory should create that directory(fix `InMemoryCatalog`) 6. drop partition from external table should not delete the directory(fix `InMemoryCatalog`) ## How was this patch tested? new tests in `ExternalCatalogSuite` Author: Wenchen FanCloses #15797 from cloud-fan/partition. (cherry picked from commit 2f7461f31331cfc37f6cfa3586b7bbefb3af5547) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c602894f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c602894f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c602894f Branch: refs/heads/branch-2.1 Commit: c602894f25bf9e61b759815674008471858cc71e Parents: be3933d Author: Wenchen Fan Authored: Thu Nov 10 13:42:48 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 10 13:42:54 2016 -0800 -- .../catalyst/catalog/ExternalCatalogUtils.scala | 121 +++ .../sql/catalyst/catalog/InMemoryCatalog.scala | 92 ++-- .../spark/sql/catalyst/catalog/interface.scala | 11 ++ .../catalyst/catalog/ExternalCatalogSuite.scala | 150 +++ .../catalyst/catalog/SessionCatalogSuite.scala | 24 ++- .../spark/sql/execution/command/ddl.scala | 8 +- .../spark/sql/execution/command/tables.scala| 3 +- .../datasources/CatalogFileIndex.scala | 2 +- .../datasources/DataSourceStrategy.scala| 2 +- .../datasources/FileFormatWriter.scala | 6 +- .../PartitioningAwareFileIndex.scala| 2 - .../datasources/PartitioningUtils.scala | 94 +--- .../spark/sql/execution/command/DDLSuite.scala | 8 +- .../ParquetPartitionDiscoverySuite.scala| 21 +-- .../spark/sql/hive/HiveExternalCatalog.scala| 51 ++- .../spark/sql/hive/HiveSparkSubmitSuite.scala | 4 +- .../spark/sql/hive/MultiDatabaseSuite.scala | 2 +- .../spark/sql/hive/execution/HiveDDLSuite.scala | 2 +- .../sql/hive/execution/SQLQuerySuite.scala | 2 +- 19 files changed, 397 insertions(+), 208 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c602894f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala new file mode 100644 index 000..b1442ee --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.catalog + +import org.apache.hadoop.fs.Path +import org.apache.hadoop.util.Shell + +import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec + +object ExternalCatalogUtils { + // This duplicates default value of Hive `ConfVars.DEFAULTPARTITIONNAME`, since catalyst doesn't + // depend on Hive. + val DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__" + +
spark git commit: [SPARK-17990][SPARK-18302][SQL] correct several partition related behaviours of ExternalCatalog
Repository: spark Updated Branches: refs/heads/master b533fa2b2 -> 2f7461f31 [SPARK-17990][SPARK-18302][SQL] correct several partition related behaviours of ExternalCatalog ## What changes were proposed in this pull request? This PR corrects several partition related behaviors of `ExternalCatalog`: 1. default partition location should not always lower case the partition column names in path string(fix `HiveExternalCatalog`) 2. rename partition should not always lower case the partition column names in updated partition path string(fix `HiveExternalCatalog`) 3. rename partition should update the partition location only for managed table(fix `InMemoryCatalog`) 4. create partition with existing directory should be fine(fix `InMemoryCatalog`) 5. create partition with non-existing directory should create that directory(fix `InMemoryCatalog`) 6. drop partition from external table should not delete the directory(fix `InMemoryCatalog`) ## How was this patch tested? new tests in `ExternalCatalogSuite` Author: Wenchen FanCloses #15797 from cloud-fan/partition. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2f7461f3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2f7461f3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2f7461f3 Branch: refs/heads/master Commit: 2f7461f31331cfc37f6cfa3586b7bbefb3af5547 Parents: b533fa2 Author: Wenchen Fan Authored: Thu Nov 10 13:42:48 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 10 13:42:48 2016 -0800 -- .../catalyst/catalog/ExternalCatalogUtils.scala | 121 +++ .../sql/catalyst/catalog/InMemoryCatalog.scala | 92 ++-- .../spark/sql/catalyst/catalog/interface.scala | 11 ++ .../catalyst/catalog/ExternalCatalogSuite.scala | 150 +++ .../catalyst/catalog/SessionCatalogSuite.scala | 24 ++- .../spark/sql/execution/command/ddl.scala | 8 +- .../spark/sql/execution/command/tables.scala| 3 +- .../datasources/CatalogFileIndex.scala | 2 +- .../datasources/DataSourceStrategy.scala| 2 +- .../datasources/FileFormatWriter.scala | 6 +- .../PartitioningAwareFileIndex.scala| 2 - .../datasources/PartitioningUtils.scala | 94 +--- .../spark/sql/execution/command/DDLSuite.scala | 8 +- .../ParquetPartitionDiscoverySuite.scala| 21 +-- .../spark/sql/hive/HiveExternalCatalog.scala| 51 ++- .../spark/sql/hive/HiveSparkSubmitSuite.scala | 4 +- .../spark/sql/hive/MultiDatabaseSuite.scala | 2 +- .../spark/sql/hive/execution/HiveDDLSuite.scala | 2 +- .../sql/hive/execution/SQLQuerySuite.scala | 2 +- 19 files changed, 397 insertions(+), 208 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2f7461f3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala new file mode 100644 index 000..b1442ee --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.catalog + +import org.apache.hadoop.fs.Path +import org.apache.hadoop.util.Shell + +import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec + +object ExternalCatalogUtils { + // This duplicates default value of Hive `ConfVars.DEFAULTPARTITIONNAME`, since catalyst doesn't + // depend on Hive. + val DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__" + + // + // The following string escaping code is mainly copied
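To make behaviour points 1 and 2 of this PR concrete, here is a minimal, self-contained sketch of Hive-style partition path building. It is not the actual `ExternalCatalogUtils` code, and the escaped character set is only an illustrative subset: the point is that the partition column name keeps the case the user gave it, while unsafe characters in values are percent-escaped.

```
// A simplified stand-in for Hive-style partition path escaping; names and the
// escape set are illustrative, not copied from ExternalCatalogUtils.
object PartitionPathSketch {
  private val unsafe: Set[Char] = Set(' ', '"', '#', '%', '*', '/', ':', '=', '?', '\\')

  def escapePathName(s: String): String =
    s.map(c => if (c < ' ' || unsafe(c)) f"%%${c.toInt}%02X" else c.toString).mkString

  // Builds "col1=v1/col2=v2", preserving the case of the column names.
  def partitionPath(spec: Seq[(String, String)]): String =
    spec.map { case (col, value) => s"${escapePathName(col)}=${escapePathName(value)}" }
      .mkString("/")

  def main(args: Array[String]): Unit = {
    // "partCol" stays "partCol" (not "partcol"), and "/" in the value is escaped.
    println(partitionPath(Seq("partCol" -> "2016-11-10", "region" -> "US/east")))
    // prints: partCol=2016-11-10/region=US%2Feast
  }
}
```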
spark git commit: [SPARK-17993][SQL] Fix Parquet log output redirection
Repository: spark Updated Branches: refs/heads/branch-2.1 62236b9eb -> be3933ddf [SPARK-17993][SQL] Fix Parquet log output redirection (Link to Jira issue: https://issues.apache.org/jira/browse/SPARK-17993) ## What changes were proposed in this pull request? PR #14690 broke parquet log output redirection for converted partitioned Hive tables. For example, when querying parquet files written by Parquet-mr 1.6.0 Spark prints a torrent of (harmless) warning messages from the Parquet reader: ``` Oct 18, 2016 7:42:18 PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr version 1.6.0 org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr version 1.6.0 using format: (.+) version ((.*) )?\(build ?(.*)\) at org.apache.parquet.VersionParser.parse(VersionParser.java:112) at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60) at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263) at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:583) at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:513) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:225) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137) at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:102) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:162) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:102) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:372) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) ``` This only happens during execution, not planning, and 
it doesn't matter what log level the `SparkContext` is set to. That's because Parquet (versions < 1.9) doesn't use slf4j for logging. Note, you can tell that log redirection is not working here because the log message format does not conform to the default Spark log message format. This is a regression I noted as something we needed to fix as a follow up. It appears that the problem arose because we removed the call to `inferSchema` during Hive table conversion. That call is what triggered the output redirection. ## How was this patch tested? I tested this manually in four ways: 1. Executing `spark.sqlContext.range(10).selectExpr("id as a").write.mode("overwrite").parquet("test")`. 2. Executing `spark.read.format("parquet").load(legacyParquetFile).show` for a Parquet file `legacyParquetFile` written using Parquet-mr 1.6.0. 3. Executing `select * from legacy_parquet_table limit 1` for some unpartitioned Parquet-based Hive table written using Parquet-mr 1.6.0. 4. Executing `select * from legacy_partitioned_parquet_table where partcol=x limit 1` for some partitioned Parquet-based Hive table written using Parquet-mr 1.6.0. I ran each test with a new instance of `spark-shell` or `spark-sql`. Incidentally, I found that test case 3 was not a
spark git commit: [SPARK-17993][SQL] Fix Parquet log output redirection
Repository: spark Updated Branches: refs/heads/master 16eaad9da -> b533fa2b2 [SPARK-17993][SQL] Fix Parquet log output redirection (Link to Jira issue: https://issues.apache.org/jira/browse/SPARK-17993) ## What changes were proposed in this pull request? PR #14690 broke parquet log output redirection for converted partitioned Hive tables. For example, when querying parquet files written by Parquet-mr 1.6.0 Spark prints a torrent of (harmless) warning messages from the Parquet reader: ``` Oct 18, 2016 7:42:18 PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr version 1.6.0 org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr version 1.6.0 using format: (.+) version ((.*) )?\(build ?(.*)\) at org.apache.parquet.VersionParser.parse(VersionParser.java:112) at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60) at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263) at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:583) at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:513) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:225) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137) at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:102) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:162) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:102) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:372) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) ``` This only happens during execution, not planning, and it 
doesn't matter what log level the `SparkContext` is set to. That's because Parquet (versions < 1.9) doesn't use slf4j for logging. Note, you can tell that log redirection is not working here because the log message format does not conform to the default Spark log message format. This is a regression I noted as something we needed to fix as a follow up. It appears that the problem arose because we removed the call to `inferSchema` during Hive table conversion. That call is what triggered the output redirection. ## How was this patch tested? I tested this manually in four ways: 1. Executing `spark.sqlContext.range(10).selectExpr("id as a").write.mode("overwrite").parquet("test")`. 2. Executing `spark.read.format("parquet").load(legacyParquetFile).show` for a Parquet file `legacyParquetFile` written using Parquet-mr 1.6.0. 3. Executing `select * from legacy_parquet_table limit 1` for some unpartitioned Parquet-based Hive table written using Parquet-mr 1.6.0. 4. Executing `select * from legacy_partitioned_parquet_table where partcol=x limit 1` for some partitioned Parquet-based Hive table written using Parquet-mr 1.6.0. I ran each test with a new instance of `spark-shell` or `spark-sql`. Incidentally, I found that test case 3 was not a
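For context, the general technique behind redirecting Parquet's `java.util.logging` output is to install a handler that forwards JUL records to slf4j. The sketch below uses the standard jul-to-slf4j bridge rather than Spark's own redirection code, so treat it as an illustration of the idea, not the patch itself.

```
// Illustrative only: route java.util.logging (used by Parquet < 1.9) through
// slf4j so messages obey the log4j configuration. This is not the code path
// Spark uses internally; it shows the underlying technique.
import org.slf4j.bridge.SLF4JBridgeHandler

object JulRedirectSketch {
  def install(): Unit = {
    SLF4JBridgeHandler.removeHandlersForRootLogger() // drop JUL's default console handler
    SLF4JBridgeHandler.install()                     // forward JUL records to slf4j
  }
}
```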
spark git commit: [SPARK-18262][BUILD][SQL] JSON.org license is now CatX
Repository: spark Updated Branches: refs/heads/branch-2.1 b54d71b6f -> 62236b9eb [SPARK-18262][BUILD][SQL] JSON.org license is now CatX ## What changes were proposed in this pull request? Try excluding org.json:json from hive-exec dep as it's Cat X now. It may be the case that it's not used by the part of Hive Spark uses anyway. ## How was this patch tested? Existing tests Author: Sean OwenCloses #15798 from srowen/SPARK-18262. (cherry picked from commit 16eaad9daed0b633e6a714b5704509aa7107d6e5) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/62236b9e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/62236b9e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/62236b9e Branch: refs/heads/branch-2.1 Commit: 62236b9eb951f171d96e9d7f5f12d641a2da9a26 Parents: b54d71b Author: Sean Owen Authored: Thu Nov 10 10:20:03 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 10 10:20:11 2016 -0800 -- NOTICE | 3 --- dev/deps/spark-deps-hadoop-2.2 | 1 - dev/deps/spark-deps-hadoop-2.3 | 1 - dev/deps/spark-deps-hadoop-2.4 | 1 - dev/deps/spark-deps-hadoop-2.6 | 1 - dev/deps/spark-deps-hadoop-2.7 | 1 - pom.xml| 5 + 7 files changed, 5 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/62236b9e/NOTICE -- diff --git a/NOTICE b/NOTICE index 69b513e..f4b64b5 100644 --- a/NOTICE +++ b/NOTICE @@ -421,9 +421,6 @@ Copyright (c) 2011, Terrence Parr. This product includes/uses ASM (http://asm.ow2.org/), Copyright (c) 2000-2007 INRIA, France Telecom. -This product includes/uses org.json (http://www.json.org/java/index.html), -Copyright (c) 2002 JSON.org - This product includes/uses JLine (http://jline.sourceforge.net/), Copyright (c) 2002-2006, Marc Prud'hommeaux . 
http://git-wip-us.apache.org/repos/asf/spark/blob/62236b9e/dev/deps/spark-deps-hadoop-2.2 -- diff --git a/dev/deps/spark-deps-hadoop-2.2 b/dev/deps/spark-deps-hadoop-2.2 index 99279a4..6e749ac 100644 --- a/dev/deps/spark-deps-hadoop-2.2 +++ b/dev/deps/spark-deps-hadoop-2.2 @@ -103,7 +103,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/62236b9e/dev/deps/spark-deps-hadoop-2.3 -- diff --git a/dev/deps/spark-deps-hadoop-2.3 b/dev/deps/spark-deps-hadoop-2.3 index f094b4a..515995a 100644 --- a/dev/deps/spark-deps-hadoop-2.3 +++ b/dev/deps/spark-deps-hadoop-2.3 @@ -108,7 +108,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/62236b9e/dev/deps/spark-deps-hadoop-2.4 -- diff --git a/dev/deps/spark-deps-hadoop-2.4 b/dev/deps/spark-deps-hadoop-2.4 index 7f0ef98..d2139fd 100644 --- a/dev/deps/spark-deps-hadoop-2.4 +++ b/dev/deps/spark-deps-hadoop-2.4 @@ -108,7 +108,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/62236b9e/dev/deps/spark-deps-hadoop-2.6 -- diff --git a/dev/deps/spark-deps-hadoop-2.6 b/dev/deps/spark-deps-hadoop-2.6 index 4a27bf3..b5cecf7 100644 --- a/dev/deps/spark-deps-hadoop-2.6 +++ b/dev/deps/spark-deps-hadoop-2.6 @@ -116,7 +116,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/62236b9e/dev/deps/spark-deps-hadoop-2.7 -- diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7 index 151670a..a5e03a7 100644 --- a/dev/deps/spark-deps-hadoop-2.7 +++ b/dev/deps/spark-deps-hadoop-2.7 @@ -116,7 +116,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/62236b9e/pom.xml
spark git commit: [SPARK-18262][BUILD][SQL] JSON.org license is now CatX
Repository: spark Updated Branches: refs/heads/master 22a9d064e -> 16eaad9da [SPARK-18262][BUILD][SQL] JSON.org license is now CatX ## What changes were proposed in this pull request? Try excluding org.json:json from hive-exec dep as it's Cat X now. It may be the case that it's not used by the part of Hive Spark uses anyway. ## How was this patch tested? Existing tests Author: Sean OwenCloses #15798 from srowen/SPARK-18262. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/16eaad9d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/16eaad9d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/16eaad9d Branch: refs/heads/master Commit: 16eaad9daed0b633e6a714b5704509aa7107d6e5 Parents: 22a9d06 Author: Sean Owen Authored: Thu Nov 10 10:20:03 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 10 10:20:03 2016 -0800 -- NOTICE | 3 --- dev/deps/spark-deps-hadoop-2.2 | 1 - dev/deps/spark-deps-hadoop-2.3 | 1 - dev/deps/spark-deps-hadoop-2.4 | 1 - dev/deps/spark-deps-hadoop-2.6 | 1 - dev/deps/spark-deps-hadoop-2.7 | 1 - pom.xml| 5 + 7 files changed, 5 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/16eaad9d/NOTICE -- diff --git a/NOTICE b/NOTICE index 69b513e..f4b64b5 100644 --- a/NOTICE +++ b/NOTICE @@ -421,9 +421,6 @@ Copyright (c) 2011, Terrence Parr. This product includes/uses ASM (http://asm.ow2.org/), Copyright (c) 2000-2007 INRIA, France Telecom. -This product includes/uses org.json (http://www.json.org/java/index.html), -Copyright (c) 2002 JSON.org - This product includes/uses JLine (http://jline.sourceforge.net/), Copyright (c) 2002-2006, Marc Prud'hommeaux . http://git-wip-us.apache.org/repos/asf/spark/blob/16eaad9d/dev/deps/spark-deps-hadoop-2.2 -- diff --git a/dev/deps/spark-deps-hadoop-2.2 b/dev/deps/spark-deps-hadoop-2.2 index 99279a4..6e749ac 100644 --- a/dev/deps/spark-deps-hadoop-2.2 +++ b/dev/deps/spark-deps-hadoop-2.2 @@ -103,7 +103,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/16eaad9d/dev/deps/spark-deps-hadoop-2.3 -- diff --git a/dev/deps/spark-deps-hadoop-2.3 b/dev/deps/spark-deps-hadoop-2.3 index f094b4a..515995a 100644 --- a/dev/deps/spark-deps-hadoop-2.3 +++ b/dev/deps/spark-deps-hadoop-2.3 @@ -108,7 +108,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/16eaad9d/dev/deps/spark-deps-hadoop-2.4 -- diff --git a/dev/deps/spark-deps-hadoop-2.4 b/dev/deps/spark-deps-hadoop-2.4 index 7f0ef98..d2139fd 100644 --- a/dev/deps/spark-deps-hadoop-2.4 +++ b/dev/deps/spark-deps-hadoop-2.4 @@ -108,7 +108,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/16eaad9d/dev/deps/spark-deps-hadoop-2.6 -- diff --git a/dev/deps/spark-deps-hadoop-2.6 b/dev/deps/spark-deps-hadoop-2.6 index 4a27bf3..b5cecf7 100644 --- a/dev/deps/spark-deps-hadoop-2.6 +++ b/dev/deps/spark-deps-hadoop-2.6 @@ -116,7 +116,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar 
json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/16eaad9d/dev/deps/spark-deps-hadoop-2.7 -- diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7 index 151670a..a5e03a7 100644 --- a/dev/deps/spark-deps-hadoop-2.7 +++ b/dev/deps/spark-deps-hadoop-2.7 @@ -116,7 +116,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/16eaad9d/pom.xml -- diff --git a/pom.xml b/pom.xml index 04d2eaa..8aa0a6c 100644 ---
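For downstream builds that pull in hive-exec directly, the same idea can be expressed in an sbt build definition. This is a hedged sketch, not Spark's own change (which edits the Maven pom.xml shown above), and the hive-exec version string is illustrative.

```
// build.sbt sketch: keep the Category-X org.json:json artifact off the
// classpath by excluding it from the hive-exec dependency.
libraryDependencies += "org.apache.hive" % "hive-exec" % "1.2.1" exclude("org.json", "json")
```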
spark git commit: [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputFormat` is `Configurable`.
Repository: spark Updated Branches: refs/heads/master d8b81f778 -> 64fbdf1aa [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputFormat` is `Configurable`. ## What changes were proposed in this pull request? We should call `setConf` if `OutputFormat` is `Configurable`, this should be done before we create `OutputCommitter` and `RecordWriter`. This is follow up of #15769, see discussion [here](https://github.com/apache/spark/pull/15769/files#r87064229) ## How was this patch tested? Add test of this case in `PairRDDFunctionsSuite`. Author: jiangxingboCloses #15823 from jiangxb1987/config-format. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/64fbdf1a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/64fbdf1a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/64fbdf1a Branch: refs/heads/master Commit: 64fbdf1aa90b66269daec29f62dc9431c1173bab Parents: d8b81f7 Author: jiangxingbo Authored: Wed Nov 9 13:14:26 2016 -0800 Committer: Reynold Xin Committed: Wed Nov 9 13:14:26 2016 -0800 -- .../internal/io/HadoopMapReduceCommitProtocol.scala | 9 - .../internal/io/SparkHadoopMapReduceWriter.scala | 9 +++-- .../org/apache/spark/rdd/PairRDDFunctionsSuite.scala | 15 +++ 3 files changed, 30 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/64fbdf1a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala b/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala index d643a32..6b0bcb8 100644 --- a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala +++ b/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala @@ -19,6 +19,7 @@ package org.apache.spark.internal.io import java.util.Date +import org.apache.hadoop.conf.Configurable import org.apache.hadoop.fs.Path import org.apache.hadoop.mapreduce._ import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter @@ -42,7 +43,13 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String) @transient private var committer: OutputCommitter = _ protected def setupCommitter(context: TaskAttemptContext): OutputCommitter = { -context.getOutputFormatClass.newInstance().getOutputCommitter(context) +val format = context.getOutputFormatClass.newInstance() +// If OutputFormat is Configurable, we should set conf to it. 
+format match { + case c: Configurable => c.setConf(context.getConfiguration) + case _ => () +} +format.getOutputCommitter(context) } override def newTaskTempFile( http://git-wip-us.apache.org/repos/asf/spark/blob/64fbdf1a/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala b/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala index a405c44..7964392 100644 --- a/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala +++ b/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala @@ -23,7 +23,7 @@ import java.util.{Date, Locale} import scala.reflect.ClassTag import scala.util.DynamicVariable -import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.conf.{Configurable, Configuration} import org.apache.hadoop.fs.Path import org.apache.hadoop.mapred.{JobConf, JobID} import org.apache.hadoop.mapreduce._ @@ -140,7 +140,12 @@ object SparkHadoopMapReduceWriter extends Logging { SparkHadoopWriterUtils.initHadoopOutputMetrics(context) // Initiate the writer. -val taskFormat = outputFormat.newInstance +val taskFormat = outputFormat.newInstance() +// If OutputFormat is Configurable, we should set conf to it. +taskFormat match { + case c: Configurable => c.setConf(hadoopConf) + case _ => () +} val writer = taskFormat.getRecordWriter(taskContext) .asInstanceOf[RecordWriter[K, V]] require(writer != null, "Unable to obtain RecordWriter") http://git-wip-us.apache.org/repos/asf/spark/blob/64fbdf1a/core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala b/core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala index
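The scenario this follow-up guards against can be illustrated with a toy `OutputFormat` that also implements `Configurable`: unless `setConf` is called before `getOutputCommitter`/`getRecordWriter`, its configuration is still null. This is only a sketch in the spirit of the test added to `PairRDDFunctionsSuite`, not that test itself; the class name is hypothetical.

```
// Hypothetical OutputFormat used only to illustrate why setConf must be called
// before the committer/writer is created.
import org.apache.hadoop.conf.{Configurable, Configuration}
import org.apache.hadoop.mapreduce.{RecordWriter, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

class ConfigurableTextOutputFormat[K, V] extends TextOutputFormat[K, V] with Configurable {
  private var conf: Configuration = _

  override def setConf(c: Configuration): Unit = { conf = c }
  override def getConf(): Configuration = conf

  override def getRecordWriter(context: TaskAttemptContext): RecordWriter[K, V] = {
    // Before this patch, the commit-protocol path instantiated the format but
    // never called setConf, so `conf` would still be null at this point.
    require(conf != null, "setConf must be called before getRecordWriter")
    super.getRecordWriter(context)
  }
}
```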
spark git commit: [SPARK-18370][SQL] Add table information to InsertIntoHadoopFsRelationCommand
Repository: spark Updated Branches: refs/heads/branch-2.1 80f58510a -> 4424c901e [SPARK-18370][SQL] Add table information to InsertIntoHadoopFsRelationCommand ## What changes were proposed in this pull request? `InsertIntoHadoopFsRelationCommand` does not keep track if it inserts into a table and what table it inserts to. This can make debugging these statements problematic. This PR adds table information the `InsertIntoHadoopFsRelationCommand`. Explaining this SQL command `insert into prq select * from range(0, 10)` now yields the following executed plan: ``` == Physical Plan == ExecutedCommand +- InsertIntoHadoopFsRelationCommand file:/dev/assembly/spark-warehouse/prq, ParquetFormat, , Map(serialization.format -> 1, path -> file:/dev/assembly/spark-warehouse/prq), Append, CatalogTable( Table: `default`.`prq` Owner: hvanhovell Created: Wed Nov 09 17:42:30 CET 2016 Last Access: Thu Jan 01 01:00:00 CET 1970 Type: MANAGED Schema: [StructField(id,LongType,true)] Provider: parquet Properties: [transient_lastDdlTime=1478709750] Storage(Location: file:/dev/assembly/spark-warehouse/prq, InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, Serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Properties: [serialization.format=1])) +- Project [id#7L] +- Range (0, 10, step=1, splits=None) ``` ## How was this patch tested? Added extra checks to the `ParquetMetastoreSuite` Author: Herman van HovellCloses #15832 from hvanhovell/SPARK-18370. (cherry picked from commit d8b81f778af8c3d7112ad37f691c49215b392836) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4424c901 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4424c901 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4424c901 Branch: refs/heads/branch-2.1 Commit: 4424c901e82ed4992d5568cbc5a5f524b88dc5eb Parents: 80f5851 Author: Herman van Hovell Authored: Wed Nov 9 12:26:09 2016 -0800 Committer: Reynold Xin Committed: Wed Nov 9 12:26:17 2016 -0800 -- .../apache/spark/sql/execution/datasources/DataSource.scala| 3 ++- .../spark/sql/execution/datasources/DataSourceStrategy.scala | 5 +++-- .../datasources/InsertIntoHadoopFsRelationCommand.scala| 5 +++-- .../test/scala/org/apache/spark/sql/hive/parquetSuites.scala | 6 -- 4 files changed, 12 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4424c901/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala index 5266611..5d66394 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala @@ -424,7 +424,8 @@ case class DataSource( _ => Unit, // No existing table needs to be refreshed. options, data.logicalPlan, -mode) +mode, +catalogTable) sparkSession.sessionState.executePlan(plan).toRdd // Replace the schema with that of the DataFrame we just wrote out to avoid re-inferring it. 
copy(userSpecifiedSchema = Some(data.schema.asNullable)).resolveRelation() http://git-wip-us.apache.org/repos/asf/spark/blob/4424c901/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala index a548e88..2d43a6a 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala @@ -162,7 +162,7 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] { case i @ logical.InsertIntoTable( - l @ LogicalRelation(t: HadoopFsRelation, _, _), part, query, overwrite, false) + l @ LogicalRelation(t: HadoopFsRelation, _, table), part, query, overwrite, false) if query.resolved && t.schema.asNullable ==
spark git commit: [SPARK-18370][SQL] Add table information to InsertIntoHadoopFsRelationCommand
Repository: spark Updated Branches: refs/heads/master d4028de97 -> d8b81f778 [SPARK-18370][SQL] Add table information to InsertIntoHadoopFsRelationCommand ## What changes were proposed in this pull request? `InsertIntoHadoopFsRelationCommand` does not keep track if it inserts into a table and what table it inserts to. This can make debugging these statements problematic. This PR adds table information the `InsertIntoHadoopFsRelationCommand`. Explaining this SQL command `insert into prq select * from range(0, 10)` now yields the following executed plan: ``` == Physical Plan == ExecutedCommand +- InsertIntoHadoopFsRelationCommand file:/dev/assembly/spark-warehouse/prq, ParquetFormat, , Map(serialization.format -> 1, path -> file:/dev/assembly/spark-warehouse/prq), Append, CatalogTable( Table: `default`.`prq` Owner: hvanhovell Created: Wed Nov 09 17:42:30 CET 2016 Last Access: Thu Jan 01 01:00:00 CET 1970 Type: MANAGED Schema: [StructField(id,LongType,true)] Provider: parquet Properties: [transient_lastDdlTime=1478709750] Storage(Location: file:/dev/assembly/spark-warehouse/prq, InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, Serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Properties: [serialization.format=1])) +- Project [id#7L] +- Range (0, 10, step=1, splits=None) ``` ## How was this patch tested? Added extra checks to the `ParquetMetastoreSuite` Author: Herman van HovellCloses #15832 from hvanhovell/SPARK-18370. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d8b81f77 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d8b81f77 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d8b81f77 Branch: refs/heads/master Commit: d8b81f778af8c3d7112ad37f691c49215b392836 Parents: d4028de Author: Herman van Hovell Authored: Wed Nov 9 12:26:09 2016 -0800 Committer: Reynold Xin Committed: Wed Nov 9 12:26:09 2016 -0800 -- .../apache/spark/sql/execution/datasources/DataSource.scala| 3 ++- .../spark/sql/execution/datasources/DataSourceStrategy.scala | 5 +++-- .../datasources/InsertIntoHadoopFsRelationCommand.scala| 5 +++-- .../test/scala/org/apache/spark/sql/hive/parquetSuites.scala | 6 -- 4 files changed, 12 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d8b81f77/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala index 5266611..5d66394 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala @@ -424,7 +424,8 @@ case class DataSource( _ => Unit, // No existing table needs to be refreshed. options, data.logicalPlan, -mode) +mode, +catalogTable) sparkSession.sessionState.executePlan(plan).toRdd // Replace the schema with that of the DataFrame we just wrote out to avoid re-inferring it. 
copy(userSpecifiedSchema = Some(data.schema.asNullable)).resolveRelation() http://git-wip-us.apache.org/repos/asf/spark/blob/d8b81f77/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala index a548e88..2d43a6a 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala @@ -162,7 +162,7 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] { case i @ logical.InsertIntoTable( - l @ LogicalRelation(t: HadoopFsRelation, _, _), part, query, overwrite, false) + l @ LogicalRelation(t: HadoopFsRelation, _, table), part, query, overwrite, false) if query.resolved && t.schema.asNullable == query.schema.asNullable => // Sanity checks @@ -222,7 +222,8 @@ case class DataSourceAnalysis(conf: CatalystConf) extends
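A quick way to see the effect from a spark-shell (the table name is illustrative): write a Parquet datasource table, then explain an INSERT into it. The `InsertIntoHadoopFsRelationCommand` node now carries the target's `CatalogTable`.

```
// spark-shell sketch; "prq" is an illustrative table name.
spark.range(0, 10).selectExpr("id").write.format("parquet").saveAsTable("prq")

// The executed plan of the INSERT now includes the CatalogTable of the target,
// which is the extra debugging information added by this patch.
spark.sql("INSERT INTO prq SELECT * FROM range(0, 10)").explain()
```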
spark git commit: [SPARK-18368] Fix regexp_replace with task serialization.
Repository: spark Updated Branches: refs/heads/branch-2.0 0cceb1bfe -> bdddc661b [SPARK-18368] Fix regexp_replace with task serialization. ## What changes were proposed in this pull request? This makes the result value both transient and lazy, so that if the RegExpReplace object is initialized then serialized, `result: StringBuffer` will be correctly initialized. ## How was this patch tested? * Verified that this patch fixed the query that found the bug. * Added a test case that fails without the fix. Author: Ryan BlueCloses #15816 from rdblue/SPARK-18368-fix-regexp-replace. (cherry picked from commit b9192bb3ffc319ebee7dbd15c24656795e454749) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bdddc661 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bdddc661 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bdddc661 Branch: refs/heads/branch-2.0 Commit: bdddc661b71725dce35c6b2edd9ccb22e774e997 Parents: 0cceb1b Author: Ryan Blue Authored: Tue Nov 8 23:47:48 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 8 23:48:06 2016 -0800 -- .../sql/catalyst/expressions/regexpExpressions.scala | 2 +- .../catalyst/expressions/ExpressionEvalHelper.scala | 15 +-- 2 files changed, 10 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bdddc661/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala index d25da3f..f6a55cf 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala @@ -220,7 +220,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio @transient private var lastReplacement: String = _ @transient private var lastReplacementInUTF8: UTF8String = _ // result buffer write by Matcher - @transient private val result: StringBuffer = new StringBuffer + @transient private lazy val result: StringBuffer = new StringBuffer override def nullSafeEval(s: Any, p: Any, r: Any): Any = { if (!p.equals(lastRegex)) { http://git-wip-us.apache.org/repos/asf/spark/blob/bdddc661/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala index 668543a..186079f 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala @@ -21,7 +21,8 @@ import org.scalacheck.Gen import org.scalactic.TripleEqualsSupport.Spread import org.scalatest.prop.GeneratorDrivenPropertyChecks -import org.apache.spark.SparkFunSuite +import org.apache.spark.{SparkConf, SparkFunSuite} +import org.apache.spark.serializer.JavaSerializer import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow} import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.optimizer.SimpleTestOptimizer @@ -42,13 +43,15 @@ trait ExpressionEvalHelper extends 
GeneratorDrivenPropertyChecks { protected def checkEvaluation( expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow): Unit = { +val serializer = new JavaSerializer(new SparkConf()).newInstance +val expr: Expression = serializer.deserialize(serializer.serialize(expression)) val catalystValue = CatalystTypeConverters.convertToCatalyst(expected) -checkEvaluationWithoutCodegen(expression, catalystValue, inputRow) -checkEvaluationWithGeneratedMutableProjection(expression, catalystValue, inputRow) -if (GenerateUnsafeProjection.canSupport(expression.dataType)) { - checkEvalutionWithUnsafeProjection(expression, catalystValue, inputRow) +checkEvaluationWithoutCodegen(expr, catalystValue, inputRow) +checkEvaluationWithGeneratedMutableProjection(expr, catalystValue, inputRow) +if (GenerateUnsafeProjection.canSupport(expr.dataType)) { + checkEvalutionWithUnsafeProjection(expr, catalystValue, inputRow) } -
spark git commit: [SPARK-18368] Fix regexp_replace with task serialization.
Repository: spark Updated Branches: refs/heads/branch-2.1 0dc14f129 -> f67208369 [SPARK-18368] Fix regexp_replace with task serialization. ## What changes were proposed in this pull request? This makes the result value both transient and lazy, so that if the RegExpReplace object is initialized then serialized, `result: StringBuffer` will be correctly initialized. ## How was this patch tested? * Verified that this patch fixed the query that found the bug. * Added a test case that fails without the fix. Author: Ryan BlueCloses #15816 from rdblue/SPARK-18368-fix-regexp-replace. (cherry picked from commit b9192bb3ffc319ebee7dbd15c24656795e454749) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f6720836 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f6720836 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f6720836 Branch: refs/heads/branch-2.1 Commit: f672083693c2c4dfea6dc43c024993d4561b1e79 Parents: 0dc14f1 Author: Ryan Blue Authored: Tue Nov 8 23:47:48 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 8 23:47:56 2016 -0800 -- .../sql/catalyst/expressions/regexpExpressions.scala | 2 +- .../catalyst/expressions/ExpressionEvalHelper.scala | 15 +-- 2 files changed, 10 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f6720836/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala index 5648ad6..4896a62 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala @@ -230,7 +230,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio @transient private var lastReplacement: String = _ @transient private var lastReplacementInUTF8: UTF8String = _ // result buffer write by Matcher - @transient private val result: StringBuffer = new StringBuffer + @transient private lazy val result: StringBuffer = new StringBuffer override def nullSafeEval(s: Any, p: Any, r: Any): Any = { if (!p.equals(lastRegex)) { http://git-wip-us.apache.org/repos/asf/spark/blob/f6720836/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala index 9ceb709..f836504 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala @@ -22,7 +22,8 @@ import org.scalactic.TripleEqualsSupport.Spread import org.scalatest.exceptions.TestFailedException import org.scalatest.prop.GeneratorDrivenPropertyChecks -import org.apache.spark.SparkFunSuite +import org.apache.spark.{SparkConf, SparkFunSuite} +import org.apache.spark.serializer.JavaSerializer import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow} import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.optimizer.SimpleTestOptimizer @@ -43,13 +44,15 @@ trait 
ExpressionEvalHelper extends GeneratorDrivenPropertyChecks { protected def checkEvaluation( expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow): Unit = { +val serializer = new JavaSerializer(new SparkConf()).newInstance +val expr: Expression = serializer.deserialize(serializer.serialize(expression)) val catalystValue = CatalystTypeConverters.convertToCatalyst(expected) -checkEvaluationWithoutCodegen(expression, catalystValue, inputRow) -checkEvaluationWithGeneratedMutableProjection(expression, catalystValue, inputRow) -if (GenerateUnsafeProjection.canSupport(expression.dataType)) { - checkEvalutionWithUnsafeProjection(expression, catalystValue, inputRow) +checkEvaluationWithoutCodegen(expr, catalystValue, inputRow) +checkEvaluationWithGeneratedMutableProjection(expr, catalystValue, inputRow) +if (GenerateUnsafeProjection.canSupport(expr.dataType)) { + checkEvalutionWithUnsafeProjection(expr, catalystValue, inputRow) } -
spark git commit: [SPARK-18368] Fix regexp_replace with task serialization.
Repository: spark Updated Branches: refs/heads/master 4afa39e22 -> b9192bb3f [SPARK-18368] Fix regexp_replace with task serialization. ## What changes were proposed in this pull request? This makes the result value both transient and lazy, so that if the RegExpReplace object is initialized then serialized, `result: StringBuffer` will be correctly initialized. ## How was this patch tested? * Verified that this patch fixed the query that found the bug. * Added a test case that fails without the fix. Author: Ryan BlueCloses #15816 from rdblue/SPARK-18368-fix-regexp-replace. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b9192bb3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b9192bb3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b9192bb3 Branch: refs/heads/master Commit: b9192bb3ffc319ebee7dbd15c24656795e454749 Parents: 4afa39e Author: Ryan Blue Authored: Tue Nov 8 23:47:48 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 8 23:47:48 2016 -0800 -- .../sql/catalyst/expressions/regexpExpressions.scala | 2 +- .../catalyst/expressions/ExpressionEvalHelper.scala | 15 +-- 2 files changed, 10 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b9192bb3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala index 5648ad6..4896a62 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala @@ -230,7 +230,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio @transient private var lastReplacement: String = _ @transient private var lastReplacementInUTF8: UTF8String = _ // result buffer write by Matcher - @transient private val result: StringBuffer = new StringBuffer + @transient private lazy val result: StringBuffer = new StringBuffer override def nullSafeEval(s: Any, p: Any, r: Any): Any = { if (!p.equals(lastRegex)) { http://git-wip-us.apache.org/repos/asf/spark/blob/b9192bb3/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala index 9ceb709..f836504 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala @@ -22,7 +22,8 @@ import org.scalactic.TripleEqualsSupport.Spread import org.scalatest.exceptions.TestFailedException import org.scalatest.prop.GeneratorDrivenPropertyChecks -import org.apache.spark.SparkFunSuite +import org.apache.spark.{SparkConf, SparkFunSuite} +import org.apache.spark.serializer.JavaSerializer import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow} import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.optimizer.SimpleTestOptimizer @@ -43,13 +44,15 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks { protected def checkEvaluation( expression: => Expression, 
expected: Any, inputRow: InternalRow = EmptyRow): Unit = { +val serializer = new JavaSerializer(new SparkConf()).newInstance +val expr: Expression = serializer.deserialize(serializer.serialize(expression)) val catalystValue = CatalystTypeConverters.convertToCatalyst(expected) -checkEvaluationWithoutCodegen(expression, catalystValue, inputRow) -checkEvaluationWithGeneratedMutableProjection(expression, catalystValue, inputRow) -if (GenerateUnsafeProjection.canSupport(expression.dataType)) { - checkEvalutionWithUnsafeProjection(expression, catalystValue, inputRow) +checkEvaluationWithoutCodegen(expr, catalystValue, inputRow) +checkEvaluationWithGeneratedMutableProjection(expr, catalystValue, inputRow) +if (GenerateUnsafeProjection.canSupport(expr.dataType)) { + checkEvalutionWithUnsafeProjection(expr, catalystValue, inputRow) } -checkEvaluationWithOptimization(expression, catalystValue, inputRow) +checkEvaluationWithOptimization(expr, catalystValue,
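The essence of the fix can be reproduced outside Spark: after a Java serialization round trip, a plain `@transient val` comes back as null, while a `@transient lazy val` is re-created on first use. The classes below are purely illustrative, not Spark code.

```
import java.io._

// Illustrative classes: the only difference is val vs lazy val.
case class WithVal(pattern: String) {
  @transient private val buffer: StringBuffer = new StringBuffer
  def bufferLength: Int = buffer.length  // NPE after deserialization: buffer is null
}

case class WithLazyVal(pattern: String) {
  @transient private lazy val buffer: StringBuffer = new StringBuffer
  def bufferLength: Int = buffer.length  // safe: buffer is re-initialized lazily
}

object TransientLazyValDemo {
  private def roundTrip[T](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    new ObjectOutputStream(bytes).writeObject(value)
    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
      .readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    println(roundTrip(WithLazyVal("a*b")).bufferLength)   // prints 0
    try roundTrip(WithVal("a*b")).bufferLength
    catch { case _: NullPointerException => println("transient val was lost") }
  }
}
```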
spark git commit: [SPARK-18191][CORE] Port RDD API to use commit protocol
Repository: spark Updated Branches: refs/heads/master 73feaa30e -> 9c419698f [SPARK-18191][CORE] Port RDD API to use commit protocol ## What changes were proposed in this pull request? This PR port RDD API to use commit protocol, the changes made here: 1. Add new internal helper class that saves an RDD using a Hadoop OutputFormat named `SparkNewHadoopWriter`, it's similar with `SparkHadoopWriter` but uses commit protocol. This class supports the newer `mapreduce` API, instead of the old `mapred` API which is supported by `SparkHadoopWriter`; 2. Rewrite `PairRDDFunctions.saveAsNewAPIHadoopDataset` function, so it uses commit protocol now. ## How was this patch tested? Exsiting test cases. Author: jiangxingboCloses #15769 from jiangxb1987/rdd-commit. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9c419698 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9c419698 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9c419698 Branch: refs/heads/master Commit: 9c419698fe110a805570031cac3387a51957d9d1 Parents: 73feaa3 Author: jiangxingbo Authored: Tue Nov 8 09:41:01 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 8 09:41:01 2016 -0800 -- .../org/apache/spark/SparkHadoopWriter.scala| 25 +- .../io/HadoopMapReduceCommitProtocol.scala | 6 +- .../io/SparkHadoopMapReduceWriter.scala | 249 +++ .../org/apache/spark/rdd/PairRDDFunctions.scala | 139 +-- .../spark/rdd/PairRDDFunctionsSuite.scala | 20 +- .../datasources/FileFormatWriter.scala | 4 +- .../spark/sql/hive/hiveWriterContainers.scala | 3 +- .../spark/streaming/dstream/DStream.scala | 5 +- .../streaming/scheduler/JobScheduler.scala | 5 +- 9 files changed, 280 insertions(+), 176 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9c419698/core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala -- diff --git a/core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala b/core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala index 7f75a39..46e22b2 100644 --- a/core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala +++ b/core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala @@ -23,11 +23,11 @@ import java.text.SimpleDateFormat import java.util.{Date, Locale} import org.apache.hadoop.fs.FileSystem -import org.apache.hadoop.fs.Path import org.apache.hadoop.mapred._ import org.apache.hadoop.mapreduce.TaskType import org.apache.spark.internal.Logging +import org.apache.spark.internal.io.SparkHadoopWriterUtils import org.apache.spark.mapred.SparkHadoopMapRedUtil import org.apache.spark.rdd.HadoopRDD import org.apache.spark.util.SerializableJobConf @@ -153,29 +153,8 @@ class SparkHadoopWriter(jobConf: JobConf) extends Logging with Serializable { splitID = splitid attemptID = attemptid -jID = new SerializableWritable[JobID](SparkHadoopWriter.createJobID(now, jobid)) +jID = new SerializableWritable[JobID](SparkHadoopWriterUtils.createJobID(now, jobid)) taID = new SerializableWritable[TaskAttemptID]( new TaskAttemptID(new TaskID(jID.value, TaskType.MAP, splitID), attemptID)) } } - -private[spark] -object SparkHadoopWriter { - def createJobID(time: Date, id: Int): JobID = { -val formatter = new SimpleDateFormat("MMddHHmmss", Locale.US) -val jobtrackerID = formatter.format(time) -new JobID(jobtrackerID, id) - } - - def createPathFromString(path: String, conf: JobConf): Path = { -if (path == null) { - throw new IllegalArgumentException("Output path is null") -} -val outputPath = new Path(path) -val fs = outputPath.getFileSystem(conf) -if 
(fs == null) { - throw new IllegalArgumentException("Incorrectly formatted output path") -} -outputPath.makeQualified(fs.getUri, fs.getWorkingDirectory) - } -} http://git-wip-us.apache.org/repos/asf/spark/blob/9c419698/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala b/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala index 66ccb6d..d643a32 100644 --- a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala +++ b/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala @@ -24,7 +24,6 @@ import org.apache.hadoop.mapreduce._ import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter import
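For orientation, the user-facing entry point whose implementation this PR moves onto the commit protocol is the newer `mapreduce`-API save path. A hedged spark-shell sketch (the output path is illustrative):

```
// spark-shell sketch: saveAsNewAPIHadoopFile delegates to
// saveAsNewAPIHadoopDataset, which after this PR runs through the commit
// protocol instead of the old writer code.
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

val counts = sc.parallelize(Seq(("a", 1), ("b", 2)))
  .map { case (k, v) => (new Text(k), new IntWritable(v)) }

counts.saveAsNewAPIHadoopFile[TextOutputFormat[Text, IntWritable]]("/tmp/counts-out")
```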
[1/3] spark-website git commit: Add 1.6.3 release.
Repository: spark-website Updated Branches: refs/heads/asf-site 24d32b75d -> b9aa4c3ee http://git-wip-us.apache.org/repos/asf/spark-website/blob/b9aa4c3e/site/releases/spark-release-1-2-1.html -- diff --git a/site/releases/spark-release-1-2-1.html b/site/releases/spark-release-1-2-1.html index 5581c54..c9efc6a 100644 --- a/site/releases/spark-release-1-2-1.html +++ b/site/releases/spark-release-1-2-1.html @@ -150,6 +150,9 @@ Latest News + Spark 1.6.3 released + (Nov 07, 2016) + Spark 2.0.1 released (Oct 03, 2016) @@ -159,9 +162,6 @@ Spark 1.6.2 released (Jun 25, 2016) - Call for Presentations for Spark Summit EU is Open - (Jun 16, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/b9aa4c3e/site/releases/spark-release-1-2-2.html -- diff --git a/site/releases/spark-release-1-2-2.html b/site/releases/spark-release-1-2-2.html index c8a859a..d76c619 100644 --- a/site/releases/spark-release-1-2-2.html +++ b/site/releases/spark-release-1-2-2.html @@ -150,6 +150,9 @@ Latest News + Spark 1.6.3 released + (Nov 07, 2016) + Spark 2.0.1 released (Oct 03, 2016) @@ -159,9 +162,6 @@ Spark 1.6.2 released (Jun 25, 2016) - Call for Presentations for Spark Summit EU is Open - (Jun 16, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/b9aa4c3e/site/releases/spark-release-1-3-0.html -- diff --git a/site/releases/spark-release-1-3-0.html b/site/releases/spark-release-1-3-0.html index 382ef4d..435ed19 100644 --- a/site/releases/spark-release-1-3-0.html +++ b/site/releases/spark-release-1-3-0.html @@ -150,6 +150,9 @@ Latest News + Spark 1.6.3 released + (Nov 07, 2016) + Spark 2.0.1 released (Oct 03, 2016) @@ -159,9 +162,6 @@ Spark 1.6.2 released (Jun 25, 2016) - Call for Presentations for Spark Summit EU is Open - (Jun 16, 2016) - Archive @@ -191,7 +191,7 @@ To download Spark 1.3 visit the downloads page. Spark Core -Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports https://issues.apache.org/jira/browse/SPARK-5430;>multi level aggregation trees to help speed up expensive reduce operations. https://issues.apache.org/jira/browse/SPARK-5063;>Improved error reporting has been added for certain gotcha operations. Sparks Jetty dependency is https://issues.apache.org/jira/browse/SPARK-3996;>now shaded to help avoid conflicts with user programs. Spark now supports https://issues.apache.org/jira/browse/SPARK-3883;>SSL encryption for some communication endpoints. Finaly, realtime https://issues.apache.org/jira/browse/SPARK-3428;>GC metrics and https://issues.apache.org/jira/browse/SPARK-4874;>record counts have been added to the UI. +Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports https://issues.apache.org/jira/browse/SPARK-5430;>multi level aggregation trees to help speed up expensive reduce operations. https://issues.apache.org/jira/browse/SPARK-5063;>Improved error reporting has been added for certain gotcha operations. Sparks Jetty dependency is https://issues.apache.org/jira/browse/SPARK-3996;>now shaded to help avoid conflicts with user programs. Spark now supports https://issues.apache.org/jira/browse/SPARK-3883;>SSL encryption for some communication endpoints. Finaly, realtime https://issues.apache.org/jira/browse/SPARK-3428;>GC metrics and https://issues.apache.org/jira/browse/SPARK-4874;>record counts have been added to the UI. DataFrame API Spark 1.3 adds a new DataFrames API that provides powerful and convenient operators when working with structured datasets. 
The DataFrame is an evolution of the base RDD API that includes named fields along with schema information. It's easy to construct a DataFrame from sources such as Hive tables, JSON data, a JDBC database, or any implementation of Spark's new data source API. Data frames will become a common interchange format between Spark components and when importing and exporting data to other systems. Data frames are supported in Python, Scala, and Java. @@ -203,7 +203,7 @@ In this release Spark MLlib introduces several new algorithms: latent Dirichlet allocation (LDA) for https://issues.apache.org/jira/browse/SPARK-1405;>topic modeling, https://issues.apache.org/jira/browse/SPARK-2309;>multinomial logistic