spark git commit: [SPARK-18458][CORE] Fix signed integer overflow problem at an expression in RadixSort.java
Repository: spark Updated Branches: refs/heads/master 856e00420 -> d93b65524

[SPARK-18458][CORE] Fix signed integer overflow problem at an expression in RadixSort.java

## What changes were proposed in this pull request?

This PR avoids cases where the result of an expression becomes negative due to signed integer overflow (e.g. 0x10?? * 8 < 0). It casts the operands to `long` before executing the calculation, so the expression is evaluated in `long` arithmetic and the result stays positive.

## How was this patch tested?

Manually executed query82 of TPC-DS at 100 TB scale.

Author: Kazuaki Ishizaki

Closes #15907 from kiszk/SPARK-18458.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d93b6552 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d93b6552 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d93b6552 Branch: refs/heads/master Commit: d93b6552473468df297a08c0bef9ea0bf0f5c13a Parents: 856e004 Author: Kazuaki Ishizaki Authored: Sat Nov 19 21:50:20 2016 -0800 Committer: Reynold Xin Committed: Sat Nov 19 21:50:20 2016 -0800

-- .../util/collection/unsafe/sort/RadixSort.java | 48 ++-- .../unsafe/sort/UnsafeInMemorySorter.java | 2 +- .../collection/unsafe/sort/RadixSortSuite.scala | 28 ++-- 3 files changed, 40 insertions(+), 38 deletions(-) --

http://git-wip-us.apache.org/repos/asf/spark/blob/d93b6552/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java --

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java index 4043617..3dd3184 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java @@ -17,6 +17,8 @@ package org.apache.spark.util.collection.unsafe.sort; +import com.google.common.primitives.Ints; + import org.apache.spark.unsafe.Platform; import org.apache.spark.unsafe.array.LongArray; @@ -40,14 +42,14 @@ public class RadixSort { * of always copying the data back to position zero for efficiency. */ public static int sort( - LongArray array, int numRecords, int startByteIndex, int endByteIndex, + LongArray array, long numRecords, int startByteIndex, int endByteIndex, boolean desc, boolean signed) { assert startByteIndex >= 0 : "startByteIndex (" + startByteIndex + ") should >= 0"; assert endByteIndex <= 7 : "endByteIndex (" + endByteIndex + ") should <= 7"; assert endByteIndex > startByteIndex; assert numRecords * 2 <= array.size(); -int inIndex = 0; -int outIndex = numRecords; +long inIndex = 0; +long outIndex = numRecords; if (numRecords > 0) { long[][] counts = getCounts(array, numRecords, startByteIndex, endByteIndex); for (int i = startByteIndex; i <= endByteIndex; i++) { @@ -55,13 +57,13 @@ public class RadixSort { sortAtByte( array, numRecords, counts[i], i, inIndex, outIndex, desc, signed && i == endByteIndex); - int tmp = inIndex; + long tmp = inIndex; inIndex = outIndex; outIndex = tmp; } } } -return inIndex; +return Ints.checkedCast(inIndex); } /** @@ -78,14 +80,14 @@ public class RadixSort { * @param signed whether this is a signed (two's complement) sort (only applies to last byte).
*/ private static void sortAtByte( - LongArray array, int numRecords, long[] counts, int byteIdx, int inIndex, int outIndex, + LongArray array, long numRecords, long[] counts, int byteIdx, long inIndex, long outIndex, boolean desc, boolean signed) { assert counts.length == 256; long[] offsets = transformCountsToOffsets( - counts, numRecords, array.getBaseOffset() + outIndex * 8, 8, desc, signed); + counts, numRecords, array.getBaseOffset() + outIndex * 8L, 8, desc, signed); Object baseObject = array.getBaseObject(); -long baseOffset = array.getBaseOffset() + inIndex * 8; -long maxOffset = baseOffset + numRecords * 8; +long baseOffset = array.getBaseOffset() + inIndex * 8L; +long maxOffset = baseOffset + numRecords * 8L; for (long offset = baseOffset; offset < maxOffset; offset += 8) { long value = Platform.getLong(baseObject, offset); int bucket = (int)((value >>> (byteIdx * 8)) & 0xff); @@ -106,13 +108,13 @@ public class RadixSort { * significant byte. If the byte does not need sorting the array will be null. */ private static long[][]
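For readers unfamiliar with the overflow pattern being fixed above, here is a minimal, self-contained sketch (in Scala, not the patched Java code itself; the constant is only illustrative since the value in the description is elided) of why a product of two 32-bit ints can wrap to a negative number, and why widening one operand fixes it:

```
object OverflowSketch {
  def main(args: Array[String]): Unit = {
    val numRecords: Int = 0x10000000          // 268,435,456, comfortably within Int range
    val intProduct: Int = numRecords * 8      // 2^31 does not fit in an Int: wraps around
    val longProduct: Long = numRecords * 8L   // the Long literal promotes the whole expression
    println(s"Int arithmetic:  $intProduct")  // prints -2147483648
    println(s"Long arithmetic: $longProduct") // prints 2147483648
  }
}
```

The patch applies the same idea by using `8L` in the offset arithmetic and, per the diff, Guava's `Ints.checkedCast` when narrowing the final `long` index back to an `int`.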
spark git commit: [SPARK-18505][SQL] Simplify AnalyzeColumnCommand
Repository: spark Updated Branches: refs/heads/master e5f5c29e0 -> 6f7ff7509

[SPARK-18505][SQL] Simplify AnalyzeColumnCommand

## What changes were proposed in this pull request?

I'm spending more time at the design & code level on the cost-based optimizer now, and have found a number of issues related to maintainability and compatibility that I would like to address. This is a small pull request to clean up AnalyzeColumnCommand:

1. Removed warning on duplicated columns. Warnings in log messages are useless since most users that run SQL don't see them.
2. Removed the nested updateStats function by just inlining it.
3. Renamed a few functions to better reflect what they do.
4. Removed the factory apply method for ColumnStatStruct. It is a bad pattern to use an apply method that returns an instantiation of a class that is not of the same type (ColumnStatStruct.apply used to return CreateNamedStruct).
5. Renamed ColumnStatStruct to just AnalyzeColumnCommand.
6. Added more documentation explaining some of the non-obvious return types and code blocks.

In follow-up pull requests, I'd like to address the following:

1. Get rid of the Map[String, ColumnStat] map, since internally we should be using Attribute to reference columns, rather than strings.
2. Decouple the fields exposed by ColumnStat from the internals of Spark SQL's execution path. Currently the two are coupled because ColumnStat takes in an InternalRow.
3. Correctness: Remove the code path that stores statistics in the catalog using the base64 encoding of the UnsafeRow format, which is not stable across Spark versions.
4. Clearly document the data representation stored in the catalog for statistics.

## How was this patch tested?

Affected test cases have been updated.

Author: Reynold Xin <r...@databricks.com>

Closes #15933 from rxin/SPARK-18505.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6f7ff750 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6f7ff750 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6f7ff750 Branch: refs/heads/master Commit: 6f7ff75091154fed7649ea6d79e887aad9fbde6a Parents: e5f5c29 Author: Reynold Xin <r...@databricks.com> Authored: Fri Nov 18 16:34:11 2016 -0800 Committer: Reynold Xin <r...@databricks.com> Committed: Fri Nov 18 16:34:11 2016 -0800 -- .../command/AnalyzeColumnCommand.scala | 115 +++ .../spark/sql/StatisticsColumnSuite.scala | 2 +- .../org/apache/spark/sql/StatisticsTest.scala | 7 +- .../spark/sql/hive/HiveExternalCatalog.scala| 4 +- .../spark/sql/hive/client/HiveClientImpl.scala | 2 +- 5 files changed, 74 insertions(+), 56 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6f7ff750/sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala index 6141fab..7fc57d0 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala @@ -17,8 +17,7 @@ package org.apache.spark.sql.execution.command -import scala.collection.mutable - +import org.apache.spark.internal.Logging import org.apache.spark.sql._ import org.apache.spark.sql.catalyst.TableIdentifier import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases @@ -44,13 +43,16 @@ case class AnalyzeColumnCommand( val tableIdentWithDB = TableIdentifier(tableIdent.table, Some(db)) val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdentWithDB)) -relation match { +// Compute total size +val (catalogTable: CatalogTable, sizeInBytes: Long) = relation match { case catalogRel: CatalogRelation => -updateStats(catalogRel.catalogTable, +// This is a Hive serde format table +(catalogRel.catalogTable, AnalyzeTableCommand.calculateTotalSize(sessionState, catalogRel.catalogTable)) case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined => -updateStats(logicalRel.catalogTable.get, +// This is a data source format table +(logicalRel.catalogTable.get, AnalyzeTableCommand.calculateTotalSize(sessionState, logicalRel.catalogTable.get)) case otherRelation => @@ -58,45 +60,45 @@ case class AnalyzeColumnCommand( s"${otherRelation.nodeName}.") } -def updateStats(catalogTable: CatalogTable, newTotalSize: Long)
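To illustrate the factory-`apply` concern from point 4 of the description above, here is a hypothetical sketch (the names below are illustrative, not the actual Spark classes): an `apply` on a companion-style object that returns a value of a different type reads misleadingly at call sites, because `Foo(...)` normally means "construct a Foo", while an explicitly named factory makes the return type obvious.

```
final case class NamedStruct(fields: Seq[String])

object ColumnStatStructLike {
  // Anti-pattern: ColumnStatStructLike(...) does not build a ColumnStatStructLike at all.
  def apply(columns: Seq[String]): NamedStruct =
    NamedStruct(columns.map(c => s"stat_$c"))
}

object ColumnStatStructs {
  // Clearer: the method name and return type tell the reader what actually comes back.
  def namedStructFor(columns: Seq[String]): NamedStruct =
    NamedStruct(columns.map(c => s"stat_$c"))
}
```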
spark git commit: [SPARK-18457][SQL] ORC and other columnar formats using HiveShim read all columns when doing a simple count
Repository: spark Updated Branches: refs/heads/branch-2.1 5912c19e7 -> ec622eb7e

[SPARK-18457][SQL] ORC and other columnar formats using HiveShim read all columns when doing a simple count

## What changes were proposed in this pull request?

When reading zero columns (e.g., count(*)) from ORC or any other format that uses HiveShim, actually set the read column list to empty for Hive to use.

## How was this patch tested?

Query correctness is handled by existing unit tests. I'm happy to add more if anyone can point out some case that is not covered. Reduction in data read can be verified in the UI when built with a recent version of Hadoop say:

```
build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -Phive -DskipTests clean package
```

However the default Hadoop 2.2 that is used for unit tests does not report actual bytes read and instead just full file sizes (see FileScanRDD.scala line 80). Therefore I don't think there is a good way to add a unit test for this. I tested with the following setup using above build options

```
case class OrcData(intField: Long, stringField: String)
spark.range(1,100).map(i => OrcData(i, s"part-$i")).toDF().write.format("orc").save("orc_test")
sql(
  s"""CREATE EXTERNAL TABLE orc_test(
     |  intField LONG,
     |  stringField STRING
     |)
     |STORED AS ORC
     |LOCATION '${System.getProperty("user.dir") + "/orc_test"}'
   """.stripMargin)
```

## Results

query | Spark 2.0.2 | this PR
---|---|---
`sql("select count(*) from orc_test").collect`|4.4 MB|199.4 KB
`sql("select intField from orc_test").collect`|743.4 KB|743.4 KB
`sql("select * from orc_test").collect`|4.4 MB|4.4 MB

Author: Andrew Ray

Closes #15898 from aray/sql-orc-no-col.

(cherry picked from commit 795e9fc9213cb9941ae131aadcafddb94bde5f74) Signed-off-by: Reynold Xin

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ec622eb7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ec622eb7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ec622eb7 Branch: refs/heads/branch-2.1 Commit: ec622eb7e1ffd0775c9ca4683d1032ca8d41654a Parents: 5912c19 Author: Andrew Ray Authored: Fri Nov 18 11:19:49 2016 -0800 Committer: Reynold Xin Committed: Fri Nov 18 11:19:59 2016 -0800

-- .../org/apache/spark/sql/hive/HiveShim.scala| 6 ++--- .../spark/sql/hive/orc/OrcQuerySuite.scala | 25 +++- 2 files changed, 27 insertions(+), 4 deletions(-) --

http://git-wip-us.apache.org/repos/asf/spark/blob/ec622eb7/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala --

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala index 0d2a765..9e98948 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala @@ -69,13 +69,13 @@ private[hive] object HiveShim { } /* - * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null or empty + * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null */ def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) { -if (ids != null && ids.nonEmpty) { +if (ids != null) { ColumnProjectionUtils.appendReadColumns(conf, ids.asJava) } -if (names != null && names.nonEmpty) { +if (names != null) { appendReadColumnNames(conf, names) } }

http://git-wip-us.apache.org/repos/asf/spark/blob/ec622eb7/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala --

diff --git
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala index ecb5972..a628977 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala @@ -20,11 +20,13 @@ package org.apache.spark.sql.hive.orc import java.nio.charset.StandardCharsets import java.sql.Timestamp +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.hive.ql.io.orc.{OrcStruct, SparkOrcNewRecordReader} import org.scalatest.BeforeAndAfterAll import org.apache.spark.sql._ import org.apache.spark.sql.catalyst.TableIdentifier -import org.apache.spark.sql.execution.datasources.LogicalRelation +import org.apache.spark.sql.execution.datasources.{LogicalRelation, RecordReaderIterator} import
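The subtlety the fix relies on is that an *empty* read-column list is meaningful: it tells the reader to materialize zero columns, whereas an absent list means "read everything". A rough sketch of that distinction, using hypothetical names rather than the Hive `ColumnProjectionUtils` API:

```
object ProjectionSketch {
  final case class ScanRequest(readColumnIds: Option[Seq[Int]])

  // Treating Some(Nil) like None silently falls back to "read every column",
  // which is exactly what made count(*) over ORC scan the whole file.
  def columnsToRead(request: ScanRequest, allColumnIds: Seq[Int]): Seq[Int] =
    request.readColumnIds match {
      case None      => allColumnIds // projection never configured: read every column
      case Some(ids) => ids          // Some(Nil) means a zero-column scan, e.g. count(*)
    }

  // columnsToRead(ScanRequest(Some(Nil)), Seq(0, 1, 2)) == Seq()         -> only row count needed
  // columnsToRead(ScanRequest(None),      Seq(0, 1, 2)) == Seq(0, 1, 2)  -> full scan
}
```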
spark git commit: [SPARK-18457][SQL] ORC and other columnar formats using HiveShim read all columns when doing a simple count
Repository: spark Updated Branches: refs/heads/master 51baca221 -> 795e9fc92

[SPARK-18457][SQL] ORC and other columnar formats using HiveShim read all columns when doing a simple count

## What changes were proposed in this pull request?

When reading zero columns (e.g., count(*)) from ORC or any other format that uses HiveShim, actually set the read column list to empty for Hive to use.

## How was this patch tested?

Query correctness is handled by existing unit tests. I'm happy to add more if anyone can point out some case that is not covered. Reduction in data read can be verified in the UI when built with a recent version of Hadoop say:

```
build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -Phive -DskipTests clean package
```

However the default Hadoop 2.2 that is used for unit tests does not report actual bytes read and instead just full file sizes (see FileScanRDD.scala line 80). Therefore I don't think there is a good way to add a unit test for this. I tested with the following setup using above build options

```
case class OrcData(intField: Long, stringField: String)
spark.range(1,100).map(i => OrcData(i, s"part-$i")).toDF().write.format("orc").save("orc_test")
sql(
  s"""CREATE EXTERNAL TABLE orc_test(
     |  intField LONG,
     |  stringField STRING
     |)
     |STORED AS ORC
     |LOCATION '${System.getProperty("user.dir") + "/orc_test"}'
   """.stripMargin)
```

## Results

query | Spark 2.0.2 | this PR
---|---|---
`sql("select count(*) from orc_test").collect`|4.4 MB|199.4 KB
`sql("select intField from orc_test").collect`|743.4 KB|743.4 KB
`sql("select * from orc_test").collect`|4.4 MB|4.4 MB

Author: Andrew Ray

Closes #15898 from aray/sql-orc-no-col.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/795e9fc9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/795e9fc9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/795e9fc9 Branch: refs/heads/master Commit: 795e9fc9213cb9941ae131aadcafddb94bde5f74 Parents: 51baca2 Author: Andrew Ray Authored: Fri Nov 18 11:19:49 2016 -0800 Committer: Reynold Xin Committed: Fri Nov 18 11:19:49 2016 -0800

-- .../org/apache/spark/sql/hive/HiveShim.scala| 6 ++--- .../spark/sql/hive/orc/OrcQuerySuite.scala | 25 +++- 2 files changed, 27 insertions(+), 4 deletions(-) --

http://git-wip-us.apache.org/repos/asf/spark/blob/795e9fc9/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala --

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala index 0d2a765..9e98948 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala @@ -69,13 +69,13 @@ private[hive] object HiveShim { } /* - * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null or empty + * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null */ def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) { -if (ids != null && ids.nonEmpty) { +if (ids != null) { ColumnProjectionUtils.appendReadColumns(conf, ids.asJava) } -if (names != null && names.nonEmpty) { +if (names != null) { appendReadColumnNames(conf, names) } }

http://git-wip-us.apache.org/repos/asf/spark/blob/795e9fc9/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala --

diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala index ecb5972..a628977 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala @@ -20,11 +20,13 @@ package org.apache.spark.sql.hive.orc import java.nio.charset.StandardCharsets import java.sql.Timestamp +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.hive.ql.io.orc.{OrcStruct, SparkOrcNewRecordReader} import org.scalatest.BeforeAndAfterAll import org.apache.spark.sql._ import org.apache.spark.sql.catalyst.TableIdentifier -import org.apache.spark.sql.execution.datasources.LogicalRelation +import org.apache.spark.sql.execution.datasources.{LogicalRelation, RecordReaderIterator} import org.apache.spark.sql.hive.{HiveUtils, MetastoreRelation} import org.apache.spark.sql.hive.test.TestHive._ import
spark git commit: [SPARK-18462] Fix ClassCastException in SparkListenerDriverAccumUpdates event
Repository: spark Updated Branches: refs/heads/branch-2.1 fc466be4f -> e8b1955e2 [SPARK-18462] Fix ClassCastException in SparkListenerDriverAccumUpdates event ## What changes were proposed in this pull request? This patch fixes a `ClassCastException: java.lang.Integer cannot be cast to java.lang.Long` error which could occur in the HistoryServer while trying to process a deserialized `SparkListenerDriverAccumUpdates` event. The problem stems from how `jackson-module-scala` handles primitive type parameters (see https://github.com/FasterXML/jackson-module-scala/wiki/FAQ#deserializing-optionint-and-other-primitive-challenges for more details). This was causing a problem where our code expected a field to be deserialized as a `(Long, Long)` tuple but we got an `(Int, Int)` tuple instead. This patch hacks around this issue by registering a custom `Converter` with Jackson in order to deserialize the tuples as `(Object, Object)` and perform the appropriate casting. ## How was this patch tested? New regression tests in `SQLListenerSuite`. Author: Josh RosenCloses #15922 from JoshRosen/SPARK-18462. (cherry picked from commit d9dd979d170f44383a9a87f892f2486ddb3cca7d) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e8b1955e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e8b1955e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e8b1955e Branch: refs/heads/branch-2.1 Commit: e8b1955e20a966da9a95f75320680cbab1096540 Parents: fc466be Author: Josh Rosen Authored: Thu Nov 17 18:45:15 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 17 18:45:22 2016 -0800 -- .../spark/sql/execution/ui/SQLListener.scala| 39 - .../sql/execution/ui/SQLListenerSuite.scala | 44 +++- 2 files changed, 80 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e8b1955e/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala index 60f1343..5daf215 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala @@ -19,6 +19,11 @@ package org.apache.spark.sql.execution.ui import scala.collection.mutable +import com.fasterxml.jackson.databind.JavaType +import com.fasterxml.jackson.databind.`type`.TypeFactory +import com.fasterxml.jackson.databind.annotation.JsonDeserialize +import com.fasterxml.jackson.databind.util.Converter + import org.apache.spark.{JobExecutionStatus, SparkConf} import org.apache.spark.annotation.DeveloperApi import org.apache.spark.internal.Logging @@ -43,9 +48,41 @@ case class SparkListenerSQLExecutionEnd(executionId: Long, time: Long) extends SparkListenerEvent @DeveloperApi -case class SparkListenerDriverAccumUpdates(executionId: Long, accumUpdates: Seq[(Long, Long)]) +case class SparkListenerDriverAccumUpdates( +executionId: Long, +@JsonDeserialize(contentConverter = classOf[LongLongTupleConverter]) +accumUpdates: Seq[(Long, Long)]) extends SparkListenerEvent +/** + * Jackson [[Converter]] for converting an (Int, Int) tuple into a (Long, Long) tuple. 
+ * + * This is necessary due to limitations in how Jackson's scala module deserializes primitives; + * see the "Deserializing Option[Int] and other primitive challenges" section in + * https://github.com/FasterXML/jackson-module-scala/wiki/FAQ for a discussion of this issue and + * SPARK-18462 for the specific problem that motivated this conversion. + */ +private class LongLongTupleConverter extends Converter[(Object, Object), (Long, Long)] { + + override def convert(in: (Object, Object)): (Long, Long) = { +def toLong(a: Object): Long = a match { + case i: java.lang.Integer => i.intValue() + case l: java.lang.Long => l.longValue() +} +(toLong(in._1), toLong(in._2)) + } + + override def getInputType(typeFactory: TypeFactory): JavaType = { +val objectType = typeFactory.uncheckedSimpleType(classOf[Object]) +typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], Array(objectType, objectType)) + } + + override def getOutputType(typeFactory: TypeFactory): JavaType = { +val longType = typeFactory.uncheckedSimpleType(classOf[Long]) +typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], Array(longType, longType)) + } +} + class SQLHistoryListenerFactory extends
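For context on the failure mode, here is a self-contained sketch of the underlying jackson-module-scala behavior described above (assumes jackson-module-scala on the classpath; the exact point of failure can vary by Jackson version): because scala.Long erases to Object in a generic position, Jackson materializes small JSON numbers as java.lang.Integer boxes, and the first unboxing to Long throws the ClassCastException this patch works around.

```
import com.fasterxml.jackson.core.`type`.TypeReference
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

object TupleErasureSketch {
  def main(args: Array[String]): Unit = {
    val mapper = new ObjectMapper()
    mapper.registerModule(DefaultScalaModule)

    // Statically Seq[(Long, Long)], but erasure leaves Jackson with (Object, Object),
    // so the values 1 and 2 come back boxed as java.lang.Integer.
    val parsed = mapper.readValue("[[1, 2]]", new TypeReference[Seq[(Long, Long)]] {})

    // Forcing the first element to unbox as a Long blows up:
    // java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
    val first: Long = parsed.head._1
    println(first)
  }
}
```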
spark git commit: [SPARK-18462] Fix ClassCastException in SparkListenerDriverAccumUpdates event
Repository: spark Updated Branches: refs/heads/master ce13c2672 -> d9dd979d1 [SPARK-18462] Fix ClassCastException in SparkListenerDriverAccumUpdates event ## What changes were proposed in this pull request? This patch fixes a `ClassCastException: java.lang.Integer cannot be cast to java.lang.Long` error which could occur in the HistoryServer while trying to process a deserialized `SparkListenerDriverAccumUpdates` event. The problem stems from how `jackson-module-scala` handles primitive type parameters (see https://github.com/FasterXML/jackson-module-scala/wiki/FAQ#deserializing-optionint-and-other-primitive-challenges for more details). This was causing a problem where our code expected a field to be deserialized as a `(Long, Long)` tuple but we got an `(Int, Int)` tuple instead. This patch hacks around this issue by registering a custom `Converter` with Jackson in order to deserialize the tuples as `(Object, Object)` and perform the appropriate casting. ## How was this patch tested? New regression tests in `SQLListenerSuite`. Author: Josh RosenCloses #15922 from JoshRosen/SPARK-18462. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d9dd979d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d9dd979d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d9dd979d Branch: refs/heads/master Commit: d9dd979d170f44383a9a87f892f2486ddb3cca7d Parents: ce13c26 Author: Josh Rosen Authored: Thu Nov 17 18:45:15 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 17 18:45:15 2016 -0800 -- .../spark/sql/execution/ui/SQLListener.scala| 39 - .../sql/execution/ui/SQLListenerSuite.scala | 44 +++- 2 files changed, 80 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d9dd979d/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala index 60f1343..5daf215 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala @@ -19,6 +19,11 @@ package org.apache.spark.sql.execution.ui import scala.collection.mutable +import com.fasterxml.jackson.databind.JavaType +import com.fasterxml.jackson.databind.`type`.TypeFactory +import com.fasterxml.jackson.databind.annotation.JsonDeserialize +import com.fasterxml.jackson.databind.util.Converter + import org.apache.spark.{JobExecutionStatus, SparkConf} import org.apache.spark.annotation.DeveloperApi import org.apache.spark.internal.Logging @@ -43,9 +48,41 @@ case class SparkListenerSQLExecutionEnd(executionId: Long, time: Long) extends SparkListenerEvent @DeveloperApi -case class SparkListenerDriverAccumUpdates(executionId: Long, accumUpdates: Seq[(Long, Long)]) +case class SparkListenerDriverAccumUpdates( +executionId: Long, +@JsonDeserialize(contentConverter = classOf[LongLongTupleConverter]) +accumUpdates: Seq[(Long, Long)]) extends SparkListenerEvent +/** + * Jackson [[Converter]] for converting an (Int, Int) tuple into a (Long, Long) tuple. 
+ * + * This is necessary due to limitations in how Jackson's scala module deserializes primitives; + * see the "Deserializing Option[Int] and other primitive challenges" section in + * https://github.com/FasterXML/jackson-module-scala/wiki/FAQ for a discussion of this issue and + * SPARK-18462 for the specific problem that motivated this conversion. + */ +private class LongLongTupleConverter extends Converter[(Object, Object), (Long, Long)] { + + override def convert(in: (Object, Object)): (Long, Long) = { +def toLong(a: Object): Long = a match { + case i: java.lang.Integer => i.intValue() + case l: java.lang.Long => l.longValue() +} +(toLong(in._1), toLong(in._2)) + } + + override def getInputType(typeFactory: TypeFactory): JavaType = { +val objectType = typeFactory.uncheckedSimpleType(classOf[Object]) +typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], Array(objectType, objectType)) + } + + override def getOutputType(typeFactory: TypeFactory): JavaType = { +val longType = typeFactory.uncheckedSimpleType(classOf[Long]) +typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], Array(longType, longType)) + } +} + class SQLHistoryListenerFactory extends SparkHistoryListenerFactory { override def createListeners(conf: SparkConf, sparkUI: SparkUI): Seq[SparkListener] = {
spark git commit: [SPARK-18464][SQL] support old table which doesn't store schema in metastore
Repository: spark Updated Branches: refs/heads/branch-2.1 6a3cbbc03 -> 014fceee0

[SPARK-18464][SQL] support old table which doesn't store schema in metastore

## What changes were proposed in this pull request?

Before Spark 2.1, users could create an external data source table without a schema, and we would infer the table schema at runtime. In Spark 2.1, we decided to infer the schema when the table was created, so that we don't need to infer it again and again at runtime. This is a good improvement, but we should still respect and support old tables which don't store the table schema in the metastore.

## How was this patch tested?

Regression test.

Author: Wenchen Fan

Closes #15900 from cloud-fan/hive-catalog.

(cherry picked from commit 07b3f045cd6f79b92bc86b3b1b51d3d5e6bd37ce) Signed-off-by: Reynold Xin

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/014fceee Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/014fceee Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/014fceee Branch: refs/heads/branch-2.1 Commit: 014fceee04c69d7944c74b3794e821e4d1003dd0 Parents: 6a3cbbc Author: Wenchen Fan Authored: Thu Nov 17 00:00:38 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 17 00:00:47 2016 -0800

-- .../spark/sql/execution/command/tables.scala| 8 ++- .../spark/sql/hive/HiveExternalCatalog.scala| 5 + .../spark/sql/hive/HiveMetastoreCatalog.scala | 4 +++- .../sql/hive/MetastoreDataSourcesSuite.scala| 22 4 files changed, 37 insertions(+), 2 deletions(-) --

http://git-wip-us.apache.org/repos/asf/spark/blob/014fceee/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala index 119e732..7049e53 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala @@ -431,7 +431,13 @@ case class DescribeTableCommand( describeSchema(catalog.lookupRelation(table).schema, result) } else { val metadata = catalog.getTableMetadata(table) - describeSchema(metadata.schema, result) + if (metadata.schema.isEmpty) { +// In older version(prior to 2.1) of Spark, the table schema can be empty and should be +// inferred at runtime. We should still support it. +describeSchema(catalog.lookupRelation(metadata.identifier).schema, result) + } else { +describeSchema(metadata.schema, result) + } describePartitionInfo(metadata, result)

http://git-wip-us.apache.org/repos/asf/spark/blob/014fceee/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala index cbd00da..8433058 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala @@ -1023,6 +1023,11 @@ object HiveExternalCatalog { // After SPARK-6024, we removed this flag. // Although we are not using `spark.sql.sources.schema` any more, we need to still support.
DataType.fromJson(schema.get).asInstanceOf[StructType] +} else if (props.filterKeys(_.startsWith(DATASOURCE_SCHEMA_PREFIX)).isEmpty) { + // If there is no schema information in table properties, it means the schema of this table + // was empty when saving into metastore, which is possible in older version(prior to 2.1) of + // Spark. We should respect it. + new StructType() } else { val numSchemaParts = props.get(DATASOURCE_SCHEMA_NUMPARTS) if (numSchemaParts.isDefined) { http://git-wip-us.apache.org/repos/asf/spark/blob/014fceee/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala index 8e5fc88..edbde5d 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala @@ -64,7 +64,9 @@ private[hive] class
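A condensed sketch of the fallback described above, with hypothetical types standing in for CatalogTable/StructType: if none of the schema-related table properties are present, the stored schema is treated as empty and the real schema is inferred later, at lookup time.

```
object SchemaFallbackSketch {
  val DATASOURCE_SCHEMA_PREFIX = "spark.sql.sources.schema"

  final case class TableSchema(fields: Seq[(String, String)]) {
    def isEmpty: Boolean = fields.isEmpty
  }

  // Reading the schema back from catalog properties: a pre-2.1 table may have stored nothing.
  def schemaFromProperties(
      props: Map[String, String],
      parseStored: Map[String, String] => TableSchema): TableSchema =
    if (props.keys.exists(_.startsWith(DATASOURCE_SCHEMA_PREFIX))) parseStored(props)
    else TableSchema(Nil) // no schema was ever written; defer to runtime inference

  // DESCRIBE-style consumer: fall back to inference only when the stored schema is empty.
  def effectiveSchema(stored: TableSchema, inferAtRuntime: () => TableSchema): TableSchema =
    if (stored.isEmpty) inferAtRuntime() else stored
}
```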
spark git commit: [SPARK-18464][SQL] support old table which doesn't store schema in metastore
Repository: spark Updated Branches: refs/heads/master 170eeb345 -> 07b3f045c

[SPARK-18464][SQL] support old table which doesn't store schema in metastore

## What changes were proposed in this pull request?

Before Spark 2.1, users could create an external data source table without a schema, and we would infer the table schema at runtime. In Spark 2.1, we decided to infer the schema when the table was created, so that we don't need to infer it again and again at runtime. This is a good improvement, but we should still respect and support old tables which don't store the table schema in the metastore.

## How was this patch tested?

Regression test.

Author: Wenchen Fan

Closes #15900 from cloud-fan/hive-catalog.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/07b3f045 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/07b3f045 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/07b3f045 Branch: refs/heads/master Commit: 07b3f045cd6f79b92bc86b3b1b51d3d5e6bd37ce Parents: 170eeb3 Author: Wenchen Fan Authored: Thu Nov 17 00:00:38 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 17 00:00:38 2016 -0800

-- .../spark/sql/execution/command/tables.scala| 8 ++- .../spark/sql/hive/HiveExternalCatalog.scala| 5 + .../spark/sql/hive/HiveMetastoreCatalog.scala | 4 +++- .../sql/hive/MetastoreDataSourcesSuite.scala| 22 4 files changed, 37 insertions(+), 2 deletions(-) --

http://git-wip-us.apache.org/repos/asf/spark/blob/07b3f045/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala index 119e732..7049e53 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala @@ -431,7 +431,13 @@ case class DescribeTableCommand( describeSchema(catalog.lookupRelation(table).schema, result) } else { val metadata = catalog.getTableMetadata(table) - describeSchema(metadata.schema, result) + if (metadata.schema.isEmpty) { +// In older version(prior to 2.1) of Spark, the table schema can be empty and should be +// inferred at runtime. We should still support it. +describeSchema(catalog.lookupRelation(metadata.identifier).schema, result) + } else { +describeSchema(metadata.schema, result) + } describePartitionInfo(metadata, result)

http://git-wip-us.apache.org/repos/asf/spark/blob/07b3f045/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala index cbd00da..8433058 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala @@ -1023,6 +1023,11 @@ object HiveExternalCatalog { // After SPARK-6024, we removed this flag. // Although we are not using `spark.sql.sources.schema` any more, we need to still support. DataType.fromJson(schema.get).asInstanceOf[StructType] +} else if (props.filterKeys(_.startsWith(DATASOURCE_SCHEMA_PREFIX)).isEmpty) { + // If there is no schema information in table properties, it means the schema of this table + // was empty when saving into metastore, which is possible in older version(prior to 2.1) of + // Spark. We should respect it.
+ new StructType() } else { val numSchemaParts = props.get(DATASOURCE_SCHEMA_NUMPARTS) if (numSchemaParts.isDefined) { http://git-wip-us.apache.org/repos/asf/spark/blob/07b3f045/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala index 8e5fc88..edbde5d 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala @@ -64,7 +64,9 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log val dataSource = DataSource( sparkSession, -
spark git commit: [YARN][DOC] Increasing NodeManager's heap size with External Shuffle Service
Repository: spark Updated Branches: refs/heads/branch-2.1 3d4756d56 -> 523abfe19

[YARN][DOC] Increasing NodeManager's heap size with External Shuffle Service

## What changes were proposed in this pull request?

Suggest that users increase the `NodeManager` heap size if the External Shuffle Service is enabled, as the `NM` can spend a lot of time doing GC, making shuffle operations a bottleneck because `Shuffle Read blocked time` is bumped up. Because of GC, the `NodeManager` can also use an enormous amount of CPU, and cluster performance will suffer. I have seen a NodeManager using 5-13 GB of RAM and up to 2700% CPU with the `spark_shuffle` service on.

## How was this patch tested?

Added step 5: ![shuffle_service](https://cloud.githubusercontent.com/assets/15244468/20355499/2fec0fde-ac2a-11e6-8f8b-1c80daf71be1.png)

Author: Artur Sukhenko

Closes #15906 from Devian-ua/nmHeapSize.

(cherry picked from commit 55589987be89ff78dadf44498352fbbd811a206e) Signed-off-by: Reynold Xin

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/523abfe1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/523abfe1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/523abfe1 Branch: refs/heads/branch-2.1 Commit: 523abfe19caa11747133877b0c8319c68ac66e56 Parents: 3d4756d Author: Artur Sukhenko Authored: Wed Nov 16 15:08:01 2016 -0800 Committer: Reynold Xin Committed: Wed Nov 16 15:08:10 2016 -0800

-- docs/running-on-yarn.md | 2 ++ 1 file changed, 2 insertions(+) --

http://git-wip-us.apache.org/repos/asf/spark/blob/523abfe1/docs/running-on-yarn.md --

diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md index cd18808..fe0221c 100644 --- a/docs/running-on-yarn.md +++ b/docs/running-on-yarn.md @@ -559,6 +559,8 @@ pre-packaged distribution. 1. In the `yarn-site.xml` on each node, add `spark_shuffle` to `yarn.nodemanager.aux-services`, then set `yarn.nodemanager.aux-services.spark_shuffle.class` to `org.apache.spark.network.yarn.YarnShuffleService`. +1. Increase `NodeManager's` heap size by setting `YARN_HEAPSIZE` (1000 by default) in `etc/hadoop/yarn-env.sh` +to avoid garbage collection issues during shuffle. 1. Restart all `NodeManager`s in your cluster. The following extra configuration options are available when the shuffle service is running on YARN:
spark git commit: [YARN][DOC] Increasing NodeManager's heap size with External Shuffle Service
Repository: spark Updated Branches: refs/heads/master 2ca8ae9aa -> 55589987b

[YARN][DOC] Increasing NodeManager's heap size with External Shuffle Service

## What changes were proposed in this pull request?

Suggest that users increase the `NodeManager` heap size if the External Shuffle Service is enabled, as the `NM` can spend a lot of time doing GC, making shuffle operations a bottleneck because `Shuffle Read blocked time` is bumped up. Because of GC, the `NodeManager` can also use an enormous amount of CPU, and cluster performance will suffer. I have seen a NodeManager using 5-13 GB of RAM and up to 2700% CPU with the `spark_shuffle` service on.

## How was this patch tested?

Added step 5: ![shuffle_service](https://cloud.githubusercontent.com/assets/15244468/20355499/2fec0fde-ac2a-11e6-8f8b-1c80daf71be1.png)

Author: Artur Sukhenko

Closes #15906 from Devian-ua/nmHeapSize.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/55589987 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/55589987 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/55589987 Branch: refs/heads/master Commit: 55589987be89ff78dadf44498352fbbd811a206e Parents: 2ca8ae9 Author: Artur Sukhenko Authored: Wed Nov 16 15:08:01 2016 -0800 Committer: Reynold Xin Committed: Wed Nov 16 15:08:01 2016 -0800

-- docs/running-on-yarn.md | 2 ++ 1 file changed, 2 insertions(+) --

http://git-wip-us.apache.org/repos/asf/spark/blob/55589987/docs/running-on-yarn.md --

diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md index cd18808..fe0221c 100644 --- a/docs/running-on-yarn.md +++ b/docs/running-on-yarn.md @@ -559,6 +559,8 @@ pre-packaged distribution. 1. In the `yarn-site.xml` on each node, add `spark_shuffle` to `yarn.nodemanager.aux-services`, then set `yarn.nodemanager.aux-services.spark_shuffle.class` to `org.apache.spark.network.yarn.YarnShuffleService`. +1. Increase `NodeManager's` heap size by setting `YARN_HEAPSIZE` (1000 by default) in `etc/hadoop/yarn-env.sh` +to avoid garbage collection issues during shuffle. 1. Restart all `NodeManager`s in your cluster. The following extra configuration options are available when the shuffle service is running on YARN:
[3/3] spark-website git commit: Add CloudSort news entry.
Add CloudSort news entry. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/8781cd3c Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/8781cd3c Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/8781cd3c Branch: refs/heads/asf-site Commit: 8781cd3c4b6e58c131b62ee251be50dec6939106 Parents: c693f2a Author: Reynold XinAuthored: Tue Nov 15 22:32:03 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 15 22:32:03 2016 -0800 -- ...1-15-spark-wins-cloudsort-100tb-benchmark.md | 22 ++ site/community.html | 6 +- site/documentation.html | 11 +- site/downloads.html | 6 +- site/examples.html | 6 +- site/faq.html | 6 +- site/graphx/index.html | 6 +- site/index.html | 6 +- site/mailing-lists.html | 8 +- site/mllib/index.html | 6 +- site/news/amp-camp-2013-registration-ope.html | 6 +- .../news/announcing-the-first-spark-summit.html | 6 +- .../news/fourth-spark-screencast-published.html | 6 +- site/news/index.html| 26 ++- site/news/nsdi-paper.html | 6 +- site/news/one-month-to-spark-summit-2015.html | 6 +- .../proposals-open-for-spark-summit-east.html | 6 +- ...registration-open-for-spark-summit-east.html | 6 +- .../news/run-spark-and-shark-on-amazon-emr.html | 6 +- site/news/spark-0-6-1-and-0-5-2-released.html | 6 +- site/news/spark-0-6-2-released.html | 6 +- site/news/spark-0-7-0-released.html | 6 +- site/news/spark-0-7-2-released.html | 6 +- site/news/spark-0-7-3-released.html | 6 +- site/news/spark-0-8-0-released.html | 6 +- site/news/spark-0-8-1-released.html | 6 +- site/news/spark-0-9-0-released.html | 6 +- site/news/spark-0-9-1-released.html | 8 +- site/news/spark-0-9-2-released.html | 8 +- site/news/spark-1-0-0-released.html | 6 +- site/news/spark-1-0-1-released.html | 6 +- site/news/spark-1-0-2-released.html | 6 +- site/news/spark-1-1-0-released.html | 8 +- site/news/spark-1-1-1-released.html | 6 +- site/news/spark-1-2-0-released.html | 6 +- site/news/spark-1-2-1-released.html | 6 +- site/news/spark-1-2-2-released.html | 8 +- site/news/spark-1-3-0-released.html | 6 +- site/news/spark-1-4-0-released.html | 6 +- site/news/spark-1-4-1-released.html | 6 +- site/news/spark-1-5-0-released.html | 6 +- site/news/spark-1-5-1-released.html | 6 +- site/news/spark-1-5-2-released.html | 6 +- site/news/spark-1-6-0-released.html | 6 +- site/news/spark-1-6-1-released.html | 6 +- site/news/spark-1-6-2-released.html | 6 +- site/news/spark-1-6-3-released.html | 6 +- site/news/spark-2-0-0-released.html | 6 +- site/news/spark-2-0-1-released.html | 6 +- site/news/spark-2-0-2-released.html | 6 +- site/news/spark-2.0.0-preview.html | 6 +- .../spark-accepted-into-apache-incubator.html | 6 +- site/news/spark-and-shark-in-the-news.html | 8 +- site/news/spark-becomes-tlp.html| 6 +- site/news/spark-featured-in-wired.html | 6 +- .../spark-mailing-lists-moving-to-apache.html | 6 +- site/news/spark-meetups.html| 6 +- site/news/spark-screencasts-published.html | 6 +- site/news/spark-summit-2013-is-a-wrap.html | 6 +- site/news/spark-summit-2014-videos-posted.html | 6 +- site/news/spark-summit-2015-videos-posted.html | 6 +- site/news/spark-summit-agenda-posted.html | 6 +- .../spark-summit-east-2015-videos-posted.html | 8 +- .../spark-summit-east-2016-cfp-closing.html | 6 +- site/news/spark-summit-east-agenda-posted.html | 6 +- .../news/spark-summit-europe-agenda-posted.html | 6 +- site/news/spark-summit-europe.html | 6 +- .../spark-summit-june-2016-agenda-posted.html | 6 +- site/news/spark-tips-from-quantifind.html | 6 
+- .../spark-user-survey-and-powered-by-page.html | 6 +- site/news/spark-version-0-6-0-released.html | 6 +- .../spark-wins-cloudsort-100tb-benchmark.html | 218 +++ ...-wins-daytona-gray-sort-100tb-benchmark.html | 6 +- .../strata-exercises-now-available-online.html | 6 +-
[2/3] spark-website git commit: Add CloudSort news entry.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/news/spark-mailing-lists-moving-to-apache.html -- diff --git a/site/news/spark-mailing-lists-moving-to-apache.html b/site/news/spark-mailing-lists-moving-to-apache.html index 2c10518..45d067b 100644 --- a/site/news/spark-mailing-lists-moving-to-apache.html +++ b/site/news/spark-mailing-lists-moving-to-apache.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/news/spark-meetups.html -- diff --git a/site/news/spark-meetups.html b/site/news/spark-meetups.html index 5dc78fa..5e2eadc 100644 --- a/site/news/spark-meetups.html +++ b/site/news/spark-meetups.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/news/spark-screencasts-published.html -- diff --git a/site/news/spark-screencasts-published.html b/site/news/spark-screencasts-published.html index 829ce81..5b57d16 100644 --- a/site/news/spark-screencasts-published.html +++ b/site/news/spark-screencasts-published.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/news/spark-summit-2013-is-a-wrap.html -- diff --git a/site/news/spark-summit-2013-is-a-wrap.html b/site/news/spark-summit-2013-is-a-wrap.html index d068281..ba84c36 100644 --- a/site/news/spark-summit-2013-is-a-wrap.html +++ b/site/news/spark-summit-2013-is-a-wrap.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/news/spark-summit-2014-videos-posted.html -- diff --git a/site/news/spark-summit-2014-videos-posted.html b/site/news/spark-summit-2014-videos-posted.html index 4b6133f..dffeacd 100644 --- a/site/news/spark-summit-2014-videos-posted.html +++ b/site/news/spark-summit-2014-videos-posted.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/news/spark-summit-2015-videos-posted.html -- diff --git a/site/news/spark-summit-2015-videos-posted.html b/site/news/spark-summit-2015-videos-posted.html index f211d33..32aecea 100644 --- a/site/news/spark-summit-2015-videos-posted.html +++ b/site/news/spark-summit-2015-videos-posted.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 
2.0.0 released - (Jul 26, 2016) -
[1/3] spark-website git commit: Add CloudSort news entry.
Repository: spark-website Updated Branches: refs/heads/asf-site c693f2a7d -> 8781cd3c4 http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/releases/spark-release-1-2-1.html -- diff --git a/site/releases/spark-release-1-2-1.html b/site/releases/spark-release-1-2-1.html index f2a8c60..22e3a1e 100644 --- a/site/releases/spark-release-1-2-1.html +++ b/site/releases/spark-release-1-2-1.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/releases/spark-release-1-2-2.html -- diff --git a/site/releases/spark-release-1-2-2.html b/site/releases/spark-release-1-2-2.html index 2fc7a38..c70ceee 100644 --- a/site/releases/spark-release-1-2-2.html +++ b/site/releases/spark-release-1-2-2.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/8781cd3c/site/releases/spark-release-1-3-0.html -- diff --git a/site/releases/spark-release-1-3-0.html b/site/releases/spark-release-1-3-0.html index 5bf1840..9e47334 100644 --- a/site/releases/spark-release-1-3-0.html +++ b/site/releases/spark-release-1-3-0.html @@ -150,6 +150,9 @@ Latest News + Spark wins CloudSort Benchmark as the most efficient engine + (Nov 15, 2016) + Spark 2.0.2 released (Nov 14, 2016) @@ -159,9 +162,6 @@ Spark 2.0.1 released (Oct 03, 2016) - Spark 2.0.0 released - (Jul 26, 2016) - Archive @@ -191,7 +191,7 @@ To download Spark 1.3 visit the downloads page. Spark Core -Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports https://issues.apache.org/jira/browse/SPARK-5430;>multi level aggregation trees to help speed up expensive reduce operations. https://issues.apache.org/jira/browse/SPARK-5063;>Improved error reporting has been added for certain gotcha operations. Sparks Jetty dependency is https://issues.apache.org/jira/browse/SPARK-3996;>now shaded to help avoid conflicts with user programs. Spark now supports https://issues.apache.org/jira/browse/SPARK-3883;>SSL encryption for some communication endpoints. Finaly, realtime https://issues.apache.org/jira/browse/SPARK-3428;>GC metrics and https://issues.apache.org/jira/browse/SPARK-4874;>record counts have been added to the UI. +Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports https://issues.apache.org/jira/browse/SPARK-5430;>multi level aggregation trees to help speed up expensive reduce operations. https://issues.apache.org/jira/browse/SPARK-5063;>Improved error reporting has been added for certain gotcha operations. Sparks Jetty dependency is https://issues.apache.org/jira/browse/SPARK-3996;>now shaded to help avoid conflicts with user programs. Spark now supports https://issues.apache.org/jira/browse/SPARK-3883;>SSL encryption for some communication endpoints. Finaly, realtime https://issues.apache.org/jira/browse/SPARK-3428;>GC metrics and https://issues.apache.org/jira/browse/SPARK-4874;>record counts have been added to the UI. 
DataFrame API Spark 1.3 adds a new DataFrames API that provides powerful and convenient operators when working with structured datasets. The DataFrame is an evolution of the base RDD API that includes named fields along with schema information. It's easy to construct a DataFrame from sources such as Hive tables, JSON data, a JDBC database, or any implementation of Spark's new data source API. Data frames will become a common interchange format between Spark components and when importing and exporting data to other systems. Data frames are supported in Python, Scala, and Java. @@ -203,7 +203,7 @@ In this release Spark MLlib introduces several new algorithms: latent Dirichlet allocation (LDA) for https://issues.apache.org/jira/browse/SPARK-1405;>topic modeling,
spark git commit: [SPARK-18377][SQL] warehouse path should be a static conf
Repository: spark Updated Branches: refs/heads/master 4b35d13ba -> 4ac9759f8

[SPARK-18377][SQL] warehouse path should be a static conf

## What changes were proposed in this pull request?

It is odd that every session can set its own warehouse path at runtime; we should forbid this and make the warehouse path a static conf.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan

Closes #15825 from cloud-fan/warehouse.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4ac9759f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4ac9759f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4ac9759f Branch: refs/heads/master Commit: 4ac9759f807d217b6f67badc6d5f6b7138eb92d2 Parents: 4b35d13 Author: Wenchen Fan Authored: Tue Nov 15 20:24:36 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 15 20:24:36 2016 -0800

-- .../sql/catalyst/catalog/SessionCatalog.scala | 9 +- .../org/apache/spark/sql/internal/SQLConf.scala | 12 +- .../apache/spark/sql/internal/SharedState.scala | 32 +-- .../spark/sql/execution/command/DDLSuite.scala | 193 +++ .../spark/sql/internal/SQLConfSuite.scala | 16 +- .../org/apache/spark/sql/hive/HiveUtils.scala | 4 +- .../spark/sql/hive/execution/HiveDDLSuite.scala | 85 7 files changed, 142 insertions(+), 209 deletions(-) --

http://git-wip-us.apache.org/repos/asf/spark/blob/4ac9759f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index c8b61d8..19a8fcd 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala @@ -83,14 +83,7 @@ class SessionCatalog( // check whether the temporary table or function exists, then, if not, operate on // the corresponding item in the current database. @GuardedBy("this") - protected var currentDb = { -val defaultName = DEFAULT_DATABASE -val defaultDbDefinition = - CatalogDatabase(defaultName, "default database", conf.warehousePath, Map()) -// Initialize default database if it doesn't already exist -createDatabase(defaultDbDefinition, ignoreIfExists = true) -formatDatabaseName(defaultName) - } + protected var currentDb = formatDatabaseName(DEFAULT_DATABASE) /** * Format table name, taking into account case sensitivity.
http://git-wip-us.apache.org/repos/asf/spark/blob/4ac9759f/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 6372936..b2a50c6 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -56,11 +56,6 @@ object SQLConf { } - val WAREHOUSE_PATH = SQLConfigBuilder("spark.sql.warehouse.dir") -.doc("The default location for managed databases and tables.") -.stringConf -.createWithDefault(Utils.resolveURI("spark-warehouse").toString) - val OPTIMIZER_MAX_ITERATIONS = SQLConfigBuilder("spark.sql.optimizer.maxIterations") .internal() .doc("The max number of iterations the optimizer and analyzer runs.") @@ -806,7 +801,7 @@ private[sql] class SQLConf extends Serializable with CatalystConf with Logging { def variableSubstituteDepth: Int = getConf(VARIABLE_SUBSTITUTE_DEPTH) - def warehousePath: String = new Path(getConf(WAREHOUSE_PATH)).toString + def warehousePath: String = new Path(getConf(StaticSQLConf.WAREHOUSE_PATH)).toString def ignoreCorruptFiles: Boolean = getConf(IGNORE_CORRUPT_FILES) @@ -951,6 +946,11 @@ object StaticSQLConf { } } + val WAREHOUSE_PATH = buildConf("spark.sql.warehouse.dir") +.doc("The default location for managed databases and tables.") +.stringConf +.createWithDefault(Utils.resolveURI("spark-warehouse").toString) + val CATALOG_IMPLEMENTATION = buildConf("spark.sql.catalogImplementation") .internal() .stringConf http://git-wip-us.apache.org/repos/asf/spark/blob/4ac9759f/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
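As a usage note, here is a sketch of what "static conf" means in practice (it assumes Spark 2.1+ behavior and uses an arbitrary local path): the warehouse location is fixed when the shared state is created, so it has to be supplied before the first SparkSession is built, and per-session overrides at runtime are what this change forbids.

```
import org.apache.spark.sql.SparkSession

object WarehouseConfSketch {
  def main(args: Array[String]): Unit = {
    // Static confs are fixed when the shared state is created, so set them at build time.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("warehouse-conf-sketch")
      .config("spark.sql.warehouse.dir", "/tmp/my-spark-warehouse") // hypothetical path
      .getOrCreate()

    println(spark.conf.get("spark.sql.warehouse.dir"))

    // After this change, overriding it at runtime is expected to be rejected
    // because it is a static config:
    // spark.conf.set("spark.sql.warehouse.dir", "/tmp/somewhere-else")

    spark.stop()
  }
}
```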
spark git commit: [SPARK-18377][SQL] warehouse path should be a static conf
Repository: spark Updated Branches: refs/heads/branch-2.1 175c47864 -> 436ae201f [SPARK-18377][SQL] warehouse path should be a static conf ## What changes were proposed in this pull request? it's weird that every session can set its own warehouse path at runtime, we should forbid it and make it a static conf. ## How was this patch tested? existing tests. Author: Wenchen FanCloses #15825 from cloud-fan/warehouse. (cherry picked from commit 4ac9759f807d217b6f67badc6d5f6b7138eb92d2) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/436ae201 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/436ae201 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/436ae201 Branch: refs/heads/branch-2.1 Commit: 436ae201f825c02b9720805ada8c0dca496a1ac5 Parents: 175c478 Author: Wenchen Fan Authored: Tue Nov 15 20:24:36 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 15 20:24:45 2016 -0800 -- .../sql/catalyst/catalog/SessionCatalog.scala | 9 +- .../org/apache/spark/sql/internal/SQLConf.scala | 12 +- .../apache/spark/sql/internal/SharedState.scala | 32 +-- .../spark/sql/execution/command/DDLSuite.scala | 193 +++ .../spark/sql/internal/SQLConfSuite.scala | 16 +- .../org/apache/spark/sql/hive/HiveUtils.scala | 4 +- .../spark/sql/hive/execution/HiveDDLSuite.scala | 85 7 files changed, 142 insertions(+), 209 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/436ae201/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index c8b61d8..19a8fcd 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala @@ -83,14 +83,7 @@ class SessionCatalog( // check whether the temporary table or function exists, then, if not, operate on // the corresponding item in the current database. @GuardedBy("this") - protected var currentDb = { -val defaultName = DEFAULT_DATABASE -val defaultDbDefinition = - CatalogDatabase(defaultName, "default database", conf.warehousePath, Map()) -// Initialize default database if it doesn't already exist -createDatabase(defaultDbDefinition, ignoreIfExists = true) -formatDatabaseName(defaultName) - } + protected var currentDb = formatDatabaseName(DEFAULT_DATABASE) /** * Format table name, taking into account case sensitivity. 
http://git-wip-us.apache.org/repos/asf/spark/blob/436ae201/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 7b8ed65..7cca9db 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -56,11 +56,6 @@ object SQLConf { } - val WAREHOUSE_PATH = SQLConfigBuilder("spark.sql.warehouse.dir") -.doc("The default location for managed databases and tables.") -.stringConf -.createWithDefault(Utils.resolveURI("spark-warehouse").toString) - val OPTIMIZER_MAX_ITERATIONS = SQLConfigBuilder("spark.sql.optimizer.maxIterations") .internal() .doc("The max number of iterations the optimizer and analyzer runs.") @@ -773,7 +768,7 @@ private[sql] class SQLConf extends Serializable with CatalystConf with Logging { def variableSubstituteDepth: Int = getConf(VARIABLE_SUBSTITUTE_DEPTH) - def warehousePath: String = new Path(getConf(WAREHOUSE_PATH)).toString + def warehousePath: String = new Path(getConf(StaticSQLConf.WAREHOUSE_PATH)).toString def ignoreCorruptFiles: Boolean = getConf(IGNORE_CORRUPT_FILES) @@ -918,6 +913,11 @@ object StaticSQLConf { } } + val WAREHOUSE_PATH = buildConf("spark.sql.warehouse.dir") +.doc("The default location for managed databases and tables.") +.stringConf +.createWithDefault(Utils.resolveURI("spark-warehouse").toString) + val CATALOG_IMPLEMENTATION = buildConf("spark.sql.catalogImplementation") .internal() .stringConf http://git-wip-us.apache.org/repos/asf/spark/blob/436ae201/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
spark git commit: [SPARK-18300][SQL] Do not apply foldable propagation with expand as a child [BRANCH-2.0]
Repository: spark Updated Branches: refs/heads/branch-2.0 e2452c632 -> 8d55886aa [SPARK-18300][SQL] Do not apply foldable propagation with expand as a child [BRANCH-2.0] ## What changes were proposed in this pull request? The `FoldablePropagation` optimizer rule, pulls foldable values out from under an `Expand`. This breaks the `Expand` in two ways: - It rewrites the output attributes of the `Expand`. We explicitly define output attributes for `Expand`, these are (unfortunately) considered as part of the expressions of the `Expand` and can be rewritten. - Expand can actually change the column (it will typically re-use the attributes or the underlying plan). This means that we cannot safely propagate the expressions from under an `Expand`. This PR fixes this and (hopefully) other issues by explicitly whitelisting allowed operators. This is a backport of https://github.com/apache/spark/pull/15857 ## How was this patch tested? Added tests to `FoldablePropagationSuite` and to `SQLQueryTestSuite`. Author: Herman van HovellCloses #15892 from hvanhovell/SPARK-18300-branch-2.0. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8d55886a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8d55886a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8d55886a Branch: refs/heads/branch-2.0 Commit: 8d55886aaa781f3b9f09de1a2d6b422c95dcb4d2 Parents: e2452c6 Author: Herman van Hovell Authored: Tue Nov 15 18:21:26 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 15 18:21:26 2016 -0800 -- .../sql/catalyst/optimizer/Optimizer.scala | 78 +--- .../optimizer/FoldablePropagationSuite.scala| 28 ++- .../resources/sql-tests/inputs/group-by.sql | 3 + .../sql-tests/results/group-by.sql.out | 10 ++- 4 files changed, 88 insertions(+), 31 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8d55886a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index f0992b3..0a28ef4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -646,46 +646,72 @@ object FoldablePropagation extends Rule[LogicalPlan] { } case _ => Nil }) +val replaceFoldable: PartialFunction[Expression, Expression] = { + case a: AttributeReference if foldableMap.contains(a) => foldableMap(a) +} if (foldableMap.isEmpty) { plan } else { var stop = false CleanupAliases(plan.transformUp { -case u: Union => - stop = true - u -case c: Command => - stop = true - c -// For outer join, although its output attributes are derived from its children, they are -// actually different attributes: the output of outer join is not always picked from its -// children, but can also be null. +// A leaf node should not stop the folding process (note that we are traversing up the +// tree, starting at the leaf nodes); so we are allowing it. +case l: LeafNode => + l + +// We can only propagate foldables for a subset of unary nodes. +case u: UnaryNode if !stop && canPropagateFoldables(u) => + u.transformExpressions(replaceFoldable) + +// Allow inner joins. 
We do not allow outer join, although its output attributes are +// derived from its children, they are actually different attributes: the output of outer +// join is not always picked from its children, but can also be null. // TODO(cloud-fan): It seems more reasonable to use new attributes as the output attributes // of outer join. -case j @ Join(_, _, LeftOuter | RightOuter | FullOuter, _) => +case j @ Join(_, _, Inner, _) => + j.transformExpressions(replaceFoldable) + +// We can fold the projections an expand holds. However expand changes the output columns +// and often reuses the underlying attributes; so we cannot assume that a column is still +// foldable after the expand has been applied. +// TODO(hvanhovell): Expand should use new attributes as the output attributes. +case expand: Expand if !stop => + val newExpand = expand.copy(projections = expand.projections.map { projection => +
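A hedged repro sketch of the class of query this protects (modeled on the kind of test added to `FoldablePropagationSuite`/`SQLQueryTestSuite`; the exact queries in the patch may differ): multiple DISTINCT aggregates introduce an `Expand`, and the inner query's columns are all foldable literals, so the optimizer must not propagate them through the `Expand`.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("foldable-propagation-expand")
  .getOrCreate()

// Two DISTINCT aggregates force an Expand; a, b, c are foldable literals.
// With the operator whitelist, FoldablePropagation leaves the Expand's output alone.
spark.sql(
  """SELECT COUNT(DISTINCT b), COUNT(DISTINCT b, c)
    |FROM (SELECT 1 AS a, 2 AS b, 3 AS c) t
    |GROUP BY a""".stripMargin).show()
```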
spark git commit: [SPARK-18232][MESOS] Support CNI
Repository: spark Updated Branches: refs/heads/master 86430cc4e -> d89bfc923 [SPARK-18232][MESOS] Support CNI ## What changes were proposed in this pull request? Adds support for CNI-isolated containers ## How was this patch tested? I launched SparkPi both with and without `spark.mesos.network.name`, and verified the job completed successfully. Author: Michael GummeltCloses #15740 from mgummelt/spark-342-cni. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d89bfc92 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d89bfc92 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d89bfc92 Branch: refs/heads/master Commit: d89bfc92302424406847ac7a9cfca714e6b742fc Parents: 86430cc Author: Michael Gummelt Authored: Mon Nov 14 23:46:54 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 23:46:54 2016 -0800 -- docs/running-on-mesos.md| 27 +++-- .../cluster/mesos/MesosClusterScheduler.scala | 8 +- .../MesosCoarseGrainedSchedulerBackend.scala| 23 ++-- .../MesosFineGrainedSchedulerBackend.scala | 9 +- .../mesos/MesosSchedulerBackendUtil.scala | 120 +-- .../mesos/MesosClusterSchedulerSuite.scala | 26 ...esosCoarseGrainedSchedulerBackendSuite.scala | 19 ++- 7 files changed, 131 insertions(+), 101 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d89bfc92/docs/running-on-mesos.md -- diff --git a/docs/running-on-mesos.md b/docs/running-on-mesos.md index 923d8db..8d5ad12 100644 --- a/docs/running-on-mesos.md +++ b/docs/running-on-mesos.md @@ -368,17 +368,6 @@ See the [configuration page](configuration.html) for information on Spark config - spark.mesos.executor.docker.portmaps - (none) - -Set the list of incoming ports exposed by the Docker image, which was set using -spark.mesos.executor.docker.image. The format of this property is a comma-separated list of -mappings which take the form: - -host_port:container_port[:tcp|:udp] - - - spark.mesos.executor.home driver side SPARK_HOME @@ -505,12 +494,26 @@ See the [configuration page](configuration.html) for information on Spark config Set the maximum number GPU resources to acquire for this job. Note that executors will still launch when no GPU resources are found since this configuration is just a upper limit and not a guaranteed amount. + + + spark.mesos.network.name + (none) + +Attach containers to the given named network. If this job is +launched in cluster mode, also launch the driver in the given named +network. See +http://mesos.apache.org/documentation/latest/cni/;>the Mesos CNI docs +for more details. 
+ spark.mesos.fetcherCache.enable false -If set to `true`, all URIs (example: `spark.executor.uri`, `spark.mesos.uris`) will be cached by the [Mesos fetcher cache](http://mesos.apache.org/documentation/latest/fetcher/) +If set to `true`, all URIs (example: `spark.executor.uri`, +`spark.mesos.uris`) will be cached by the http://mesos.apache.org/documentation/latest/fetcher/;>Mesos +Fetcher Cache http://git-wip-us.apache.org/repos/asf/spark/blob/d89bfc92/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala -- diff --git a/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala b/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala index 8db1d12..f384290 100644 --- a/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala +++ b/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala @@ -531,13 +531,7 @@ private[spark] class MesosClusterScheduler( .setCommand(buildDriverCommand(desc)) .addAllResources(cpuResourcesToUse.asJava) .addAllResources(memResourcesToUse.asJava) - -desc.conf.getOption("spark.mesos.executor.docker.image").foreach { image => - MesosSchedulerBackendUtil.setupContainerBuilderDockerInfo(image, -desc.conf, -taskInfo.getContainerBuilder) -} - +taskInfo.setContainer(MesosSchedulerBackendUtil.containerInfo(desc.conf)) taskInfo.build } http://git-wip-us.apache.org/repos/asf/spark/blob/d89bfc92/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala -- diff --git
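A short usage sketch for the new option documented above: attach executors (and, in cluster mode, the driver) to a named CNI network. The master URL and network name below are placeholders; the same setting can equally be passed to spark-submit via `--conf spark.mesos.network.name=...`.

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("cni-example")
  .setMaster("mesos://zk://zk1.example.com:2181/mesos") // assumed Mesos master URL
  .set("spark.mesos.network.name", "my-cni-network")    // assumed CNI network name
```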
spark-website git commit: Add Big Data Analytics with Spark and Hadoop book.
Repository: spark-website Updated Branches: refs/heads/asf-site 8f5026783 -> 4e10a1ac1 Add Big Data Analytics with Spark and Hadoop book. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/4e10a1ac Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/4e10a1ac Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/4e10a1ac Branch: refs/heads/asf-site Commit: 4e10a1ac10fa773f891422c7c1a3727e47feca8e Parents: 8f50267 Author: Reynold XinAuthored: Mon Nov 14 23:26:06 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 23:26:06 2016 -0800 -- documentation.md| 1 + site/documentation.html | 1 + 2 files changed, 2 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark-website/blob/4e10a1ac/documentation.md -- diff --git a/documentation.md b/documentation.md index 3927264..0ff8ed2 100644 --- a/documentation.md +++ b/documentation.md @@ -168,6 +168,7 @@ Slides, videos and EC2-based exercises from each of these are available online: https://www.packtpub.com/big-data-and-business-intelligence/mastering-apache-spark;>Mastering Apache Spark, by Mike Frampton (Packt Publishing) http://www.apress.com/9781484209653;>Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large Scale Data Analysis, by Mohammed Guller (Apress) https://www.packtpub.com/big-data-and-business-intelligence/large-scale-machine-learning-spark;>Large Scale Machine Learning with Spark, by Md. Rezaul Karim, Md. Mahedi Kaysar (Packt Publishing) + https://www.packtpub.com/big-data-and-business-intelligence/big-data-analytics;>Big Data Analytics with Spark and Hadoop, by Venkat Ankam (Packt Publishing) Examples http://git-wip-us.apache.org/repos/asf/spark-website/blob/4e10a1ac/site/documentation.html -- diff --git a/site/documentation.html b/site/documentation.html index 9414acd..60c1b59 100644 --- a/site/documentation.html +++ b/site/documentation.html @@ -342,6 +342,7 @@ Slides, videos and EC2-based exercises from each of these are available online: https://www.packtpub.com/big-data-and-business-intelligence/mastering-apache-spark;>Mastering Apache Spark, by Mike Frampton (Packt Publishing) http://www.apress.com/9781484209653;>Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large Scale Data Analysis, by Mohammed Guller (Apress) https://www.packtpub.com/big-data-and-business-intelligence/large-scale-machine-learning-spark;>Large Scale Machine Learning with Spark, by Md. Rezaul Karim, Md. Mahedi Kaysar (Packt Publishing) + https://www.packtpub.com/big-data-and-business-intelligence/big-data-analytics;>Big Data Analytics with Spark and Hadoop, by Venkat Ankam (Packt Publishing) Examples - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark-website git commit: Update Maven coordinates.
Repository: spark-website Updated Branches: refs/heads/asf-site 8940afe14 -> 8f5026783 Update Maven coordinates. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/8f502678 Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/8f502678 Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/8f502678 Branch: refs/heads/asf-site Commit: 8f50267839c04dcf325210173b41839568b544ab Parents: 8940afe Author: Reynold XinAuthored: Mon Nov 14 23:14:45 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 23:14:45 2016 -0800 -- downloads.md| 4 ++-- site/downloads.html | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark-website/blob/8f502678/downloads.md -- diff --git a/downloads.md b/downloads.md index 94462bb..0031a05 100644 --- a/downloads.md +++ b/downloads.md @@ -51,7 +51,7 @@ Spark artifacts are [hosted in Maven Central](http://search.maven.org/#search%7C groupId: org.apache.spark artifactId: spark-core_2.11 -version: 2.0.1 +version: 2.0.2 ### Spark Source Code Management If you are interested in working with the newest under-development code or contributing to Apache Spark development, you can also check out the master branch from Git: @@ -59,7 +59,7 @@ If you are interested in working with the newest under-development code or contr # Master development branch git clone git://github.com/apache/spark.git -# 2.0 maintenance branch with stability fixes on top of Spark 2.0.1 +# 2.0 maintenance branch with stability fixes on top of Spark 2.0.2 git clone git://github.com/apache/spark.git -b branch-2.0 Once you've downloaded Spark, you can find instructions for installing and building it on the documentation page. http://git-wip-us.apache.org/repos/asf/spark-website/blob/8f502678/site/downloads.html -- diff --git a/site/downloads.html b/site/downloads.html index d06b5ac..e96a141 100644 --- a/site/downloads.html +++ b/site/downloads.html @@ -235,7 +235,7 @@ You can select and download it above. groupId: org.apache.spark artifactId: spark-core_2.11 -version: 2.0.1 +version: 2.0.2 Spark Source Code Management @@ -244,7 +244,7 @@ version: 2.0.1 # Master development branch git clone git://github.com/apache/spark.git -# 2.0 maintenance branch with stability fixes on top of Spark 2.0.1 +# 2.0 maintenance branch with stability fixes on top of Spark 2.0.2 git clone git://github.com/apache/spark.git -b branch-2.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
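For readers using sbt rather than Maven, the updated coordinates correspond to a build line roughly like the following (with Scala 2.11, `%%` resolves to the `spark-core_2.11` artifact shown above).

```scala
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.2"
```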
spark git commit: [SPARK-18430][SQL] Fixed Exception Messages when Hitting an Invocation Exception of Function Lookup
Repository: spark Updated Branches: refs/heads/master c31def1dd -> 86430cc4e [SPARK-18430][SQL] Fixed Exception Messages when Hitting an Invocation Exception of Function Lookup ### What changes were proposed in this pull request? When the exception is an invocation exception during function lookup, we return a useless/confusing error message: For example, ```Scala df.selectExpr("concat_ws()") ``` Below is the error message we got: ``` null; line 1 pos 0 org.apache.spark.sql.AnalysisException: null; line 1 pos 0 ``` To get the meaningful error message, we need to get the cause. The fix is exactly the same as what we did in https://github.com/apache/spark/pull/12136. After the fix, the message we got is the exception issued in the constuctor of function implementation: ``` requirement failed: concat_ws requires at least one argument.; line 1 pos 0 org.apache.spark.sql.AnalysisException: requirement failed: concat_ws requires at least one argument.; line 1 pos 0 ``` ### How was this patch tested? Added test cases. Author: gatorsmileCloses #15878 from gatorsmile/functionNotFound. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/86430cc4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/86430cc4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/86430cc4 Branch: refs/heads/master Commit: 86430cc4e8dbc65a091a532fc9c5ec12b7be04f4 Parents: c31def1 Author: gatorsmile Authored: Mon Nov 14 21:21:34 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 21:21:34 2016 -0800 -- .../catalyst/analysis/FunctionRegistry.scala| 5 - .../sql-tests/inputs/string-functions.sql | 3 +++ .../sql-tests/results/string-functions.sql.out | 20 3 files changed, 27 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/86430cc4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index b028d07..007cdc1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -446,7 +446,10 @@ object FunctionRegistry { // If there is an apply method that accepts Seq[Expression], use that one. Try(varargCtor.get.newInstance(expressions).asInstanceOf[Expression]) match { case Success(e) => e - case Failure(e) => throw new AnalysisException(e.getMessage) + case Failure(e) => +// the exception is an invocation exception. To get a meaningful message, we need the +// cause. +throw new AnalysisException(e.getCause.getMessage) } } else { // Otherwise, find a constructor method that matches the number of arguments, and use that. 
http://git-wip-us.apache.org/repos/asf/spark/blob/86430cc4/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql b/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql new file mode 100644 index 000..f21981e --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql @@ -0,0 +1,3 @@ +-- Argument number exception +select concat_ws(); +select format_string(); http://git-wip-us.apache.org/repos/asf/spark/blob/86430cc4/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out -- diff --git a/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out b/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out new file mode 100644 index 000..6961e9b --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out @@ -0,0 +1,20 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 2 + + +-- !query 0 +select concat_ws() +-- !query 0 schema +struct<> +-- !query 0 output +org.apache.spark.sql.AnalysisException +requirement failed: concat_ws requires at least one argument.; line 1 pos 7 + + +-- !query 1 +select format_string() +-- !query 1 schema +struct<> +-- !query 1 output +org.apache.spark.sql.AnalysisException +requirement failed: format_string() should take at least 1 argument; line 1 pos 7
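A simplified sketch of the pattern the fix applies (not the Spark source itself): when reflective construction fails, the useful message lives on the cause of the wrapping invocation exception, while `getMessage` on the wrapper is often null.

```scala
import scala.util.{Failure, Success, Try}

// Returns a human-readable error for a failed constructor call, preferring the
// cause's message over the (often null) message of the reflective wrapper.
def errorMessage(construct: () => Any): String =
  Try(construct()) match {
    case Success(_) => "ok"
    case Failure(e) =>
      Option(e.getCause).map(_.getMessage).getOrElse(e.getMessage)
  }
```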
spark git commit: [SPARK-18430][SQL] Fixed Exception Messages when Hitting an Invocation Exception of Function Lookup
Repository: spark Updated Branches: refs/heads/branch-2.1 649c15fae -> a0125fd68 [SPARK-18430][SQL] Fixed Exception Messages when Hitting an Invocation Exception of Function Lookup ### What changes were proposed in this pull request? When the exception is an invocation exception during function lookup, we return a useless/confusing error message: For example, ```Scala df.selectExpr("concat_ws()") ``` Below is the error message we got: ``` null; line 1 pos 0 org.apache.spark.sql.AnalysisException: null; line 1 pos 0 ``` To get the meaningful error message, we need to get the cause. The fix is exactly the same as what we did in https://github.com/apache/spark/pull/12136. After the fix, the message we got is the exception issued in the constuctor of function implementation: ``` requirement failed: concat_ws requires at least one argument.; line 1 pos 0 org.apache.spark.sql.AnalysisException: requirement failed: concat_ws requires at least one argument.; line 1 pos 0 ``` ### How was this patch tested? Added test cases. Author: gatorsmileCloses #15878 from gatorsmile/functionNotFound. (cherry picked from commit 86430cc4e8dbc65a091a532fc9c5ec12b7be04f4) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a0125fd6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a0125fd6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a0125fd6 Branch: refs/heads/branch-2.1 Commit: a0125fd6847d5dbce92dc92cb5b16ee00f0ff6a8 Parents: 649c15f Author: gatorsmile Authored: Mon Nov 14 21:21:34 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 21:21:45 2016 -0800 -- .../catalyst/analysis/FunctionRegistry.scala| 5 - .../sql-tests/inputs/string-functions.sql | 3 +++ .../sql-tests/results/string-functions.sql.out | 20 3 files changed, 27 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a0125fd6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index b028d07..007cdc1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -446,7 +446,10 @@ object FunctionRegistry { // If there is an apply method that accepts Seq[Expression], use that one. Try(varargCtor.get.newInstance(expressions).asInstanceOf[Expression]) match { case Success(e) => e - case Failure(e) => throw new AnalysisException(e.getMessage) + case Failure(e) => +// the exception is an invocation exception. To get a meaningful message, we need the +// cause. +throw new AnalysisException(e.getCause.getMessage) } } else { // Otherwise, find a constructor method that matches the number of arguments, and use that. 
http://git-wip-us.apache.org/repos/asf/spark/blob/a0125fd6/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql b/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql new file mode 100644 index 000..f21981e --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql @@ -0,0 +1,3 @@ +-- Argument number exception +select concat_ws(); +select format_string(); http://git-wip-us.apache.org/repos/asf/spark/blob/a0125fd6/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out -- diff --git a/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out b/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out new file mode 100644 index 000..6961e9b --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out @@ -0,0 +1,20 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 2 + + +-- !query 0 +select concat_ws() +-- !query 0 schema +struct<> +-- !query 0 output +org.apache.spark.sql.AnalysisException +requirement failed: concat_ws requires at least one argument.; line 1 pos 7 + + +-- !query 1 +select format_string() +-- !query 1 schema +struct<> +-- !query 1 output +org.apache.spark.sql.AnalysisException +requirement failed:
spark git commit: [SPARK-18428][DOC] Update docs for GraphX
Repository: spark Updated Branches: refs/heads/branch-2.1 27999b366 -> 649c15fae [SPARK-18428][DOC] Update docs for GraphX ## What changes were proposed in this pull request? 1, Add link of `VertexRDD` and `EdgeRDD` 2, Notify in `Vertex and Edge RDDs` that not all methods are listed 3, `VertexID` -> `VertexId` ## How was this patch tested? No tests, only docs is modified Author: Zheng RuiFengCloses #15875 from zhengruifeng/update_graphop_doc. (cherry picked from commit c31def1ddcbed340bfc071d54fb3dc7945cb525a) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/649c15fa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/649c15fa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/649c15fa Branch: refs/heads/branch-2.1 Commit: 649c15fae423a415cb6165aa0ef6d97ab4949afb Parents: 27999b3 Author: Zheng RuiFeng Authored: Mon Nov 14 21:15:39 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 21:18:35 2016 -0800 -- docs/graphx-programming-guide.md | 68 ++- 1 file changed, 35 insertions(+), 33 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/649c15fa/docs/graphx-programming-guide.md -- diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md index 58671e6..1097cf1 100644 --- a/docs/graphx-programming-guide.md +++ b/docs/graphx-programming-guide.md @@ -11,6 +11,7 @@ description: GraphX graph processing library guide for Spark SPARK_VERSION_SHORT [EdgeRDD]: api/scala/index.html#org.apache.spark.graphx.EdgeRDD +[VertexRDD]: api/scala/index.html#org.apache.spark.graphx.VertexRDD [Edge]: api/scala/index.html#org.apache.spark.graphx.Edge [EdgeTriplet]: api/scala/index.html#org.apache.spark.graphx.EdgeTriplet [Graph]: api/scala/index.html#org.apache.spark.graphx.Graph @@ -89,7 +90,7 @@ with user defined objects attached to each vertex and edge. A directed multigra graph with potentially multiple parallel edges sharing the same source and destination vertex. The ability to support parallel edges simplifies modeling scenarios where there can be multiple relationships (e.g., co-worker and friend) between the same vertices. Each vertex is keyed by a -*unique* 64-bit long identifier (`VertexID`). GraphX does not impose any ordering constraints on +*unique* 64-bit long identifier (`VertexId`). GraphX does not impose any ordering constraints on the vertex identifiers. Similarly, edges have corresponding source and destination vertex identifiers. @@ -130,12 +131,12 @@ class Graph[VD, ED] { } {% endhighlight %} -The classes `VertexRDD[VD]` and `EdgeRDD[ED]` extend and are optimized versions of `RDD[(VertexID, +The classes `VertexRDD[VD]` and `EdgeRDD[ED]` extend and are optimized versions of `RDD[(VertexId, VD)]` and `RDD[Edge[ED]]` respectively. Both `VertexRDD[VD]` and `EdgeRDD[ED]` provide additional functionality built around graph computation and leverage internal optimizations. We discuss the -`VertexRDD` and `EdgeRDD` API in greater detail in the section on [vertex and edge +`VertexRDD`[VertexRDD] and `EdgeRDD`[EdgeRDD] API in greater detail in the section on [vertex and edge RDDs](#vertex_and_edge_rdds) but for now they can be thought of as simply RDDs of the form: -`RDD[(VertexID, VD)]` and `RDD[Edge[ED]]`. +`RDD[(VertexId, VD)]` and `RDD[Edge[ED]]`. 
### Example Property Graph @@ -197,7 +198,7 @@ graph.edges.filter(e => e.srcId > e.dstId).count {% endhighlight %} > Note that `graph.vertices` returns an `VertexRDD[(String, String)]` which > extends -> `RDD[(VertexID, (String, String))]` and so we use the scala `case` expression to deconstruct the +> `RDD[(VertexId, (String, String))]` and so we use the scala `case` expression to deconstruct the > tuple. On the other hand, `graph.edges` returns an `EdgeRDD` containing > `Edge[String]` objects. > We could have also used the case class type constructor as in the following: > {% highlight scala %} @@ -287,7 +288,7 @@ class Graph[VD, ED] { // Change the partitioning heuristic def partitionBy(partitionStrategy: PartitionStrategy): Graph[VD, ED] // Transform vertex and edge attributes == - def mapVertices[VD2](map: (VertexID, VD) => VD2): Graph[VD2, ED] + def mapVertices[VD2](map: (VertexId, VD) => VD2): Graph[VD2, ED] def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2] def mapEdges[ED2](map: (PartitionID, Iterator[Edge[ED]]) => Iterator[ED2]): Graph[VD, ED2] def mapTriplets[ED2](map:
spark git commit: [SPARK-18428][DOC] Update docs for GraphX
Repository: spark Updated Branches: refs/heads/master c07187823 -> c31def1dd [SPARK-18428][DOC] Update docs for GraphX ## What changes were proposed in this pull request? 1, Add link of `VertexRDD` and `EdgeRDD` 2, Notify in `Vertex and Edge RDDs` that not all methods are listed 3, `VertexID` -> `VertexId` ## How was this patch tested? No tests, only docs is modified Author: Zheng RuiFengCloses #15875 from zhengruifeng/update_graphop_doc. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c31def1d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c31def1d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c31def1d Branch: refs/heads/master Commit: c31def1ddcbed340bfc071d54fb3dc7945cb525a Parents: c071878 Author: Zheng RuiFeng Authored: Mon Nov 14 21:15:39 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 21:15:39 2016 -0800 -- docs/graphx-programming-guide.md | 68 ++- 1 file changed, 35 insertions(+), 33 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c31def1d/docs/graphx-programming-guide.md -- diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md index 58671e6..1097cf1 100644 --- a/docs/graphx-programming-guide.md +++ b/docs/graphx-programming-guide.md @@ -11,6 +11,7 @@ description: GraphX graph processing library guide for Spark SPARK_VERSION_SHORT [EdgeRDD]: api/scala/index.html#org.apache.spark.graphx.EdgeRDD +[VertexRDD]: api/scala/index.html#org.apache.spark.graphx.VertexRDD [Edge]: api/scala/index.html#org.apache.spark.graphx.Edge [EdgeTriplet]: api/scala/index.html#org.apache.spark.graphx.EdgeTriplet [Graph]: api/scala/index.html#org.apache.spark.graphx.Graph @@ -89,7 +90,7 @@ with user defined objects attached to each vertex and edge. A directed multigra graph with potentially multiple parallel edges sharing the same source and destination vertex. The ability to support parallel edges simplifies modeling scenarios where there can be multiple relationships (e.g., co-worker and friend) between the same vertices. Each vertex is keyed by a -*unique* 64-bit long identifier (`VertexID`). GraphX does not impose any ordering constraints on +*unique* 64-bit long identifier (`VertexId`). GraphX does not impose any ordering constraints on the vertex identifiers. Similarly, edges have corresponding source and destination vertex identifiers. @@ -130,12 +131,12 @@ class Graph[VD, ED] { } {% endhighlight %} -The classes `VertexRDD[VD]` and `EdgeRDD[ED]` extend and are optimized versions of `RDD[(VertexID, +The classes `VertexRDD[VD]` and `EdgeRDD[ED]` extend and are optimized versions of `RDD[(VertexId, VD)]` and `RDD[Edge[ED]]` respectively. Both `VertexRDD[VD]` and `EdgeRDD[ED]` provide additional functionality built around graph computation and leverage internal optimizations. We discuss the -`VertexRDD` and `EdgeRDD` API in greater detail in the section on [vertex and edge +`VertexRDD`[VertexRDD] and `EdgeRDD`[EdgeRDD] API in greater detail in the section on [vertex and edge RDDs](#vertex_and_edge_rdds) but for now they can be thought of as simply RDDs of the form: -`RDD[(VertexID, VD)]` and `RDD[Edge[ED]]`. +`RDD[(VertexId, VD)]` and `RDD[Edge[ED]]`. 
### Example Property Graph @@ -197,7 +198,7 @@ graph.edges.filter(e => e.srcId > e.dstId).count {% endhighlight %} > Note that `graph.vertices` returns an `VertexRDD[(String, String)]` which > extends -> `RDD[(VertexID, (String, String))]` and so we use the scala `case` expression to deconstruct the +> `RDD[(VertexId, (String, String))]` and so we use the scala `case` expression to deconstruct the > tuple. On the other hand, `graph.edges` returns an `EdgeRDD` containing > `Edge[String]` objects. > We could have also used the case class type constructor as in the following: > {% highlight scala %} @@ -287,7 +288,7 @@ class Graph[VD, ED] { // Change the partitioning heuristic def partitionBy(partitionStrategy: PartitionStrategy): Graph[VD, ED] // Transform vertex and edge attributes == - def mapVertices[VD2](map: (VertexID, VD) => VD2): Graph[VD2, ED] + def mapVertices[VD2](map: (VertexId, VD) => VD2): Graph[VD2, ED] def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2] def mapEdges[ED2](map: (PartitionID, Iterator[Edge[ED]]) => Iterator[ED2]): Graph[VD, ED2] def mapTriplets[ED2](map: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2] @@ -297,18 +298,18 @@ class Graph[VD, ED] { def reverse: Graph[VD, ED] def
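A small GraphX sketch using the corrected `VertexId` spelling from the updated guide; it assumes an existing `SparkContext` named `sc`.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Vertices are keyed by 64-bit VertexId values; edges carry a String attribute.
val vertices: RDD[(VertexId, String)] =
  sc.parallelize(Seq((1L, "alice"), (2L, "bob")))
val edges: RDD[Edge[String]] =
  sc.parallelize(Seq(Edge(1L, 2L, "follows")))

val graph: Graph[String, String] = Graph(vertices, edges)
println(graph.vertices.count()) // 2
```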
[3/3] spark-website git commit: Add 2.0.2 release.
Add 2.0.2 release. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/39b5c3d6 Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/39b5c3d6 Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/39b5c3d6 Branch: refs/heads/asf-site Commit: 39b5c3d65a6a02b97b84cd1e90a08d119ba600e3 Parents: 0bd3631 Author: Reynold XinAuthored: Mon Nov 14 17:47:52 2016 -0800 Committer: Reynold Xin Committed: Mon Nov 14 17:47:52 2016 -0800 -- _layouts/global.html| 2 +- js/downloads.js | 1 + news/_posts/2016-11-14-spark-2-0-2-released.md | 16 ++ .../_posts/2016-11-14-spark-release-2-0-2.md| 20 ++ site/community.html | 8 +- site/docs/2.0.2/latest | 1 + site/docs/latest| 2 +- site/documentation.html | 8 +- site/downloads.html | 8 +- site/examples.html | 8 +- site/faq.html | 8 +- site/graphx/index.html | 8 +- site/index.html | 8 +- site/js/downloads.js| 1 + site/mailing-lists.html | 8 +- site/mllib/index.html | 8 +- site/news/amp-camp-2013-registration-ope.html | 8 +- .../news/announcing-the-first-spark-summit.html | 8 +- .../news/fourth-spark-screencast-published.html | 8 +- site/news/index.html| 18 +- site/news/nsdi-paper.html | 8 +- site/news/one-month-to-spark-summit-2015.html | 8 +- .../proposals-open-for-spark-summit-east.html | 8 +- ...registration-open-for-spark-summit-east.html | 8 +- .../news/run-spark-and-shark-on-amazon-emr.html | 8 +- site/news/spark-0-6-1-and-0-5-2-released.html | 8 +- site/news/spark-0-6-2-released.html | 8 +- site/news/spark-0-7-0-released.html | 8 +- site/news/spark-0-7-2-released.html | 8 +- site/news/spark-0-7-3-released.html | 8 +- site/news/spark-0-8-0-released.html | 8 +- site/news/spark-0-8-1-released.html | 8 +- site/news/spark-0-9-0-released.html | 8 +- site/news/spark-0-9-1-released.html | 8 +- site/news/spark-0-9-2-released.html | 8 +- site/news/spark-1-0-0-released.html | 8 +- site/news/spark-1-0-1-released.html | 8 +- site/news/spark-1-0-2-released.html | 8 +- site/news/spark-1-1-0-released.html | 8 +- site/news/spark-1-1-1-released.html | 8 +- site/news/spark-1-2-0-released.html | 8 +- site/news/spark-1-2-1-released.html | 8 +- site/news/spark-1-2-2-released.html | 8 +- site/news/spark-1-3-0-released.html | 8 +- site/news/spark-1-4-0-released.html | 8 +- site/news/spark-1-4-1-released.html | 8 +- site/news/spark-1-5-0-released.html | 8 +- site/news/spark-1-5-1-released.html | 8 +- site/news/spark-1-5-2-released.html | 8 +- site/news/spark-1-6-0-released.html | 8 +- site/news/spark-1-6-1-released.html | 8 +- site/news/spark-1-6-2-released.html | 8 +- site/news/spark-1-6-3-released.html | 8 +- site/news/spark-2-0-0-released.html | 8 +- site/news/spark-2-0-1-released.html | 8 +- site/news/spark-2-0-2-released.html | 213 ++ site/news/spark-2.0.0-preview.html | 8 +- .../spark-accepted-into-apache-incubator.html | 8 +- site/news/spark-and-shark-in-the-news.html | 8 +- site/news/spark-becomes-tlp.html| 8 +- site/news/spark-featured-in-wired.html | 8 +- .../spark-mailing-lists-moving-to-apache.html | 8 +- site/news/spark-meetups.html| 8 +- site/news/spark-screencasts-published.html | 8 +- site/news/spark-summit-2013-is-a-wrap.html | 8 +- site/news/spark-summit-2014-videos-posted.html | 8 +- site/news/spark-summit-2015-videos-posted.html | 8 +- site/news/spark-summit-agenda-posted.html | 8 +- .../spark-summit-east-2015-videos-posted.html | 8 +- .../spark-summit-east-2016-cfp-closing.html | 8 +- site/news/spark-summit-east-agenda-posted.html | 8 +- 
.../news/spark-summit-europe-agenda-posted.html | 8 +- site/news/spark-summit-europe.html | 8 +- .../spark-summit-june-2016-agenda-posted.html | 8 +- site/news/spark-tips-from-quantifind.html
[1/3] spark-website git commit: Add 2.0.2 release.
Repository: spark-website Updated Branches: refs/heads/asf-site 0bd363165 -> 39b5c3d65 http://git-wip-us.apache.org/repos/asf/spark-website/blob/39b5c3d6/site/releases/spark-release-1-1-1.html -- diff --git a/site/releases/spark-release-1-1-1.html b/site/releases/spark-release-1-1-1.html index fcf0c91..434c313 100644 --- a/site/releases/spark-release-1-1-1.html +++ b/site/releases/spark-release-1-1-1.html @@ -106,7 +106,7 @@ Documentation - Latest Release (Spark 2.0.1) + Latest Release (Spark 2.0.2) Older Versions and Other Resources @@ -150,6 +150,9 @@ Latest News + Spark 2.0.2 released + (Nov 14, 2016) + Spark 1.6.3 released (Nov 07, 2016) @@ -159,9 +162,6 @@ Spark 2.0.0 released (Jul 26, 2016) - Spark 1.6.2 released - (Jun 25, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/39b5c3d6/site/releases/spark-release-1-2-0.html -- diff --git a/site/releases/spark-release-1-2-0.html b/site/releases/spark-release-1-2-0.html index 0490be7..09e4007 100644 --- a/site/releases/spark-release-1-2-0.html +++ b/site/releases/spark-release-1-2-0.html @@ -106,7 +106,7 @@ Documentation - Latest Release (Spark 2.0.1) + Latest Release (Spark 2.0.2) Older Versions and Other Resources @@ -150,6 +150,9 @@ Latest News + Spark 2.0.2 released + (Nov 14, 2016) + Spark 1.6.3 released (Nov 07, 2016) @@ -159,9 +162,6 @@ Spark 2.0.0 released (Jul 26, 2016) - Spark 1.6.2 released - (Jun 25, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/39b5c3d6/site/releases/spark-release-1-2-1.html -- diff --git a/site/releases/spark-release-1-2-1.html b/site/releases/spark-release-1-2-1.html index c9efc6a..93acc9d 100644 --- a/site/releases/spark-release-1-2-1.html +++ b/site/releases/spark-release-1-2-1.html @@ -106,7 +106,7 @@ Documentation - Latest Release (Spark 2.0.1) + Latest Release (Spark 2.0.2) Older Versions and Other Resources @@ -150,6 +150,9 @@ Latest News + Spark 2.0.2 released + (Nov 14, 2016) + Spark 1.6.3 released (Nov 07, 2016) @@ -159,9 +162,6 @@ Spark 2.0.0 released (Jul 26, 2016) - Spark 1.6.2 released - (Jun 25, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/39b5c3d6/site/releases/spark-release-1-2-2.html -- diff --git a/site/releases/spark-release-1-2-2.html b/site/releases/spark-release-1-2-2.html index d76c619..32d4627 100644 --- a/site/releases/spark-release-1-2-2.html +++ b/site/releases/spark-release-1-2-2.html @@ -106,7 +106,7 @@ Documentation - Latest Release (Spark 2.0.1) + Latest Release (Spark 2.0.2) Older Versions and Other Resources @@ -150,6 +150,9 @@ Latest News + Spark 2.0.2 released + (Nov 14, 2016) + Spark 1.6.3 released (Nov 07, 2016) @@ -159,9 +162,6 @@ Spark 2.0.0 released (Jul 26, 2016) - Spark 1.6.2 released - (Jun 25, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/39b5c3d6/site/releases/spark-release-1-3-0.html -- diff --git a/site/releases/spark-release-1-3-0.html b/site/releases/spark-release-1-3-0.html index 435ed19..45180a7 100644 --- a/site/releases/spark-release-1-3-0.html +++ b/site/releases/spark-release-1-3-0.html @@ -106,7 +106,7 @@ Documentation - Latest Release (Spark 2.0.1) + Latest Release (Spark 2.0.2) Older Versions and Other Resources @@ -150,6 +150,9 @@ Latest News + Spark 2.0.2 released + (Nov 14, 2016) + Spark 1.6.3 released (Nov 07, 2016) @@ -159,9 +162,6 @@ Spark 2.0.0 released (Jul 26, 2016) - Spark 1.6.2 released - (Jun 25, 2016) -
[2/3] spark-website git commit: Add 2.0.2 release.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/39b5c3d6/site/news/spark-2-0-1-released.html -- diff --git a/site/news/spark-2-0-1-released.html b/site/news/spark-2-0-1-released.html index f772398..09e052d 100644 --- a/site/news/spark-2-0-1-released.html +++ b/site/news/spark-2-0-1-released.html @@ -106,7 +106,7 @@ Documentation - Latest Release (Spark 2.0.1) + Latest Release (Spark 2.0.2) Older Versions and Other Resources @@ -150,6 +150,9 @@ Latest News + Spark 2.0.2 released + (Nov 14, 2016) + Spark 1.6.3 released (Nov 07, 2016) @@ -159,9 +162,6 @@ Spark 2.0.0 released (Jul 26, 2016) - Spark 1.6.2 released - (Jun 25, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/39b5c3d6/site/news/spark-2-0-2-released.html -- diff --git a/site/news/spark-2-0-2-released.html b/site/news/spark-2-0-2-released.html new file mode 100644 index 000..0b1ffe7 --- /dev/null +++ b/site/news/spark-2-0-2-released.html @@ -0,0 +1,213 @@ + + + + + + + + + Spark 2.0.2 released | Apache Spark + + + + + + + + + + + + + + + + + var _gaq = _gaq || []; + _gaq.push(['_setAccount', 'UA-32518208-2']); + _gaq.push(['_trackPageview']); + (function() { +var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; +ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; +var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); + })(); + + + function trackOutboundLink(link, category, action) { +try { + _gaq.push(['_trackEvent', category , action]); +} catch(err){} + +setTimeout(function() { + document.location.href = link.href; +}, 100); + } + + + + + + + + +https://code.jquery.com/jquery.js"> + + + + + + + + + + + + Lightning-fast cluster computing + + + + + + + + + + Toggle navigation + + + + + + + + + + Download + + + Libraries + + + SQL and DataFrames + Spark Streaming + MLlib (machine learning) + GraphX (graph) + + https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects;>Third-Party Packages + + + + + Documentation + + + Latest Release (Spark 2.0.2) + Older Versions and Other Resources + + + Examples + + + Community + + + Mailing Lists + Events and Meetups + Project History + https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark;>Powered By + https://cwiki.apache.org/confluence/display/SPARK/Committers;>Project Committers + https://issues.apache.org/jira/browse/SPARK;>Issue Tracker + + + FAQ + + + +http://www.apache.org/; class="dropdown-toggle" data-toggle="dropdown"> + Apache Software Foundation + + http://www.apache.org/;>Apache Homepage + http://www.apache.org/licenses/;>License + http://www.apache.org/foundation/sponsorship.html;>Sponsorship + http://www.apache.org/foundation/thanks.html;>Thanks + http://www.apache.org/security/;>Security + + + + + + + + + + + + Latest News + + + Spark 2.0.2 released + (Nov 14, 2016) + + Spark 1.6.3 released + (Nov 07, 2016) + + Spark 2.0.1 released + (Oct 03, 2016) + + Spark 2.0.0 released + (Jul 26, 2016) + + + Archive + + + +Download Spark + + +Built-in Libraries: + + +SQL and DataFrames +Spark Streaming +MLlib (machine learning) +GraphX (graph) + + https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects;>Third-Party Packages + + + + +Spark 2.0.2 released + + +We are happy to announce the availability of Apache Spark 2.0.2! This maintenance release includes fixes across several areas of Spark, as well as Kafka 0.10 and runtime metrics support for Structured Streaming. 
+ +Visit the release notes to read
spark git commit: [SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide
Repository: spark Updated Branches: refs/heads/branch-2.0 80c1a1f30 -> a719c5128 [SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide Update the python section of the Structured Streaming Guide from .builder() to .builder Validated documentation and successfully running the test example. Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request. 'Builder' object is not callable object hence changed .builder() to .builder Author: Denny LeeCloses #15872 from dennyglee/master. (cherry picked from commit b91a51bb231af321860415075a7f404bc46e0a74) Signed-off-by: Reynold Xin (cherry picked from commit b6e4d3925239836334867d6ebcf22e5a1369cfc0) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a719c512 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a719c512 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a719c512 Branch: refs/heads/branch-2.0 Commit: a719c5128fddb133c5d496b85032e0049506a95c Parents: 80c1a1f Author: Denny Lee Authored: Sun Nov 13 18:10:06 2016 -0800 Committer: Reynold Xin Committed: Sun Nov 13 18:11:59 2016 -0800 -- docs/structured-streaming-programming-guide.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a719c512/docs/structured-streaming-programming-guide.md -- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index be730b8..537aa06 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -59,9 +59,9 @@ from pyspark.sql import SparkSession from pyspark.sql.functions import explode from pyspark.sql.functions import split -spark = SparkSession\ -.builder()\ -.appName("StructuredNetworkWordCount")\ +spark = SparkSession \ +.builder \ +.appName("StructuredNetworkWordCount") \ .getOrCreate() {% endhighlight %} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide
Repository: spark Updated Branches: refs/heads/branch-2.1 6fae4241f -> 0c69224ed [SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide ## What changes were proposed in this pull request? Update the python section of the Structured Streaming Guide from .builder() to .builder ## How was this patch tested? Validated documentation and successfully running the test example. Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request. 'Builder' object is not callable object hence changed .builder() to .builder Author: Denny LeeCloses #15872 from dennyglee/master. (cherry picked from commit b91a51bb231af321860415075a7f404bc46e0a74) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0c69224e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0c69224e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0c69224e Branch: refs/heads/branch-2.1 Commit: 0c69224ed752c25be1545cfe8ba0db8487a70bf2 Parents: 6fae424 Author: Denny Lee Authored: Sun Nov 13 18:10:06 2016 -0800 Committer: Reynold Xin Committed: Sun Nov 13 18:10:16 2016 -0800 -- docs/structured-streaming-programming-guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0c69224e/docs/structured-streaming-programming-guide.md -- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index d838ed3..d254558 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -58,7 +58,7 @@ from pyspark.sql.functions import explode from pyspark.sql.functions import split spark = SparkSession \ -.builder() \ +.builder \ .appName("StructuredNetworkWordCount") \ .getOrCreate() {% endhighlight %} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide
Repository: spark Updated Branches: refs/heads/master 1386fd28d -> b91a51bb2 [SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide ## What changes were proposed in this pull request? Update the python section of the Structured Streaming Guide from .builder() to .builder ## How was this patch tested? Validated documentation and successfully running the test example. Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request. 'Builder' object is not callable object hence changed .builder() to .builder Author: Denny LeeCloses #15872 from dennyglee/master. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b91a51bb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b91a51bb Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b91a51bb Branch: refs/heads/master Commit: b91a51bb231af321860415075a7f404bc46e0a74 Parents: 1386fd2 Author: Denny Lee Authored: Sun Nov 13 18:10:06 2016 -0800 Committer: Reynold Xin Committed: Sun Nov 13 18:10:06 2016 -0800 -- docs/structured-streaming-programming-guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b91a51bb/docs/structured-streaming-programming-guide.md -- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index d838ed3..d254558 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -58,7 +58,7 @@ from pyspark.sql.functions import explode from pyspark.sql.functions import split spark = SparkSession \ -.builder() \ +.builder \ .appName("StructuredNetworkWordCount") \ .getOrCreate() {% endhighlight %} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
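For contrast with the Python fix above: in the Scala API, `SparkSession.builder` is a method, so it can be written with or without parentheses; the "'Builder' object is not callable" error is specific to PySpark, where `builder` is exposed as an attribute. A minimal Scala equivalent of the guide snippet, with an assumed local master:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder        // SparkSession.builder() also compiles in Scala
  .appName("StructuredNetworkWordCount")
  .master("local[*]")                   // assumed master for a local run
  .getOrCreate()
```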
svn commit: r16970 - /dev/spark/spark-2.0.2/ /release/spark/spark-2.0.2/
Author: rxin Date: Fri Nov 11 22:51:28 2016 New Revision: 16970 Log: Artifacts for Spark 2.0.2 Added: release/spark/spark-2.0.2/ - copied from r16969, dev/spark/spark-2.0.2/ Removed: dev/spark/spark-2.0.2/
[26/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/tableToDF.html
new file site/docs/2.0.2/api/R/tableToDF.html (65 lines) -- R help page "Create a SparkDataFrame from a SparkSQL Table" (tableToDF {SparkR}).
  Description: Returns the specified Table as a SparkDataFrame. The Table must have already been registered in the SparkSession.
  Usage: tableToDF(tableName)
  Arguments: tableName - The SparkSQL Table to convert to a SparkDataFrame.
  Value: SparkDataFrame. Note: tableToDF since 2.0.0.
  Examples (not run): sparkR.session(); df <- read.json("path/to/file.json"); createOrReplaceTempView(df, "table"); new_df <- tableToDF("table")

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/tables.html
new file site/docs/2.0.2/api/R/tables.html (62 lines) -- R help page "Tables" (tables {SparkR}).
  Description: Returns a SparkDataFrame containing names of tables in the given database.
  Usage (default S3 method): tables(databaseName = NULL)
  Arguments: databaseName - name of the database.
  Value: a SparkDataFrame. Note: tables since 1.4.0.
  Examples (not run): sparkR.session(); tables("hive")

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/take.html
new file site/docs/2.0.2/api/R/take.html (262 lines) -- R help page "Take the first NUM rows of a SparkDataFrame and return the results as a R data.frame" (take {SparkR}).
  Description: Take the first NUM rows of a SparkDataFrame and return the results as a R data.frame.
  Usage (S4 method for signature 'SparkDataFrame,numeric'): take(x, num)
  Arguments: x - a SparkDataFrame. num - number of rows to take.
  Note: take since 1.4.0. See Also: the other SparkDataFrame functions.
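For illustration only (not part of this commit): a minimal SparkR sketch tying the three help pages above together, assuming an active SparkR session; the JSON path and the view name "people" are hypothetical.

  sparkR.session()
  df <- read.json("path/to/file.json")      # hypothetical input file
  createOrReplaceTempView(df, "people")     # register the SparkDataFrame as a temporary view
  head(tables())                            # SparkDataFrame listing the registered tables
  people <- tableToDF("people")             # look the registered view up again
  take(people, 5)                           # first 5 rows as a local R data.frame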
[21/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/allclasses-frame.html
new file site/docs/2.0.2/api/java/allclasses-frame.html (1119 lines) -- the "All Classes" frame of the Spark 2.0.2 JavaDoc, an alphabetical index of the public API classes. The fragment in this message covers AbsoluteError through CoarseGrainedClusterMessages.RegisterExecutorFailed$, including Accumulable, AccumulableInfo, AccumulableParam, Accumulator, AccumulatorContext, AccumulatorParam (and its Double/Float/Int/Long/String param objects), AccumulatorV2, AFTSurvivalRegression(Model), AggregatedDialect, Aggregator, Algo, ALS(Model), AnalysisException, ArrayType, AssociationRules, AsyncRDDActions, Attribute(Group), BaseRelation, BatchInfo, Binarizer, BinaryClassificationEvaluator/Metrics, BinaryType, BisectingKMeans(Model), BLAS, BlockId, BlockManagerId, the BlockManagerMessages.* family, BlockMatrix, BloomFilter, BooleanType, BoostingStrategy, Broadcast, Bucketizer, ByteType, CalendarIntervalType, Catalog, CategoricalSplit, CheckpointReader, ChiSqSelector(Model), ChiSqTest, CholeskyDecomposition, ClassificationModel, Classifier, ClosureCleaner and the CoarseGrainedClusterMessages.* family.
[44/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/dapply.html
new file site/docs/2.0.2/api/R/dapply.html (290 lines) -- R help page "dapply" (dapply {SparkR}).
  Description: Apply a function to each partition of a SparkDataFrame.
  Usage (S4 method for signature 'SparkDataFrame,'function',structType'): dapply(x, func, schema)
  Arguments: x - A SparkDataFrame. func - A function to be applied to each partition of the SparkDataFrame; func should have only one parameter, to which an R data.frame corresponding to each partition will be passed, and the output of func should be an R data.frame. schema - The schema of the resulting SparkDataFrame after the function is applied; it must match the output of func.
  Note: dapply since 2.0.0. See Also: dapplyCollect and the other SparkDataFrame functions.
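A hedged sketch of the dapply contract described above (not from the commit), assuming df is an existing SparkDataFrame with an integer column a (hypothetical); the declared schema must match the data.frame that func returns.

  schema <- structType(structField("a", "integer"),
                       structField("a_doubled", "double"))
  df2 <- dapply(df, function(pdf) {
    pdf$a_doubled <- pdf$a * 2              # pdf is the R data.frame for one partition
    pdf[, c("a", "a_doubled")]              # column order and types must match 'schema'
  }, schema)
  head(df2)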
[47/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/avg.html
new file site/docs/2.0.2/api/R/avg.html (109 lines) -- R help page "avg" (avg {SparkR}).
  Description: Aggregate function: returns the average of the values in a group.
  Usage (S4 method for signature 'Column'): avg(x); avg(x, ...)
  Arguments: x - Column to compute on or a GroupedData object. ... - additional argument(s) when x is a GroupedData object.
  Note: avg since 1.4.0. See Also: the other agg_funcs (count, countDistinct/n_distinct, first, kurtosis, last, max, mean, min, sd/stddev, skewness, stddev_pop, stddev_samp, sum, sumDistinct, var/variance, var_pop, var_samp).
  Examples (not run): avg(df$c)

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/base64.html
new file site/docs/2.0.2/api/R/base64.html (115 lines) -- R help page "base64" (base64 {SparkR}).
  Description: Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.
  Usage (S4 method for signature 'Column'): base64(x)
  Arguments: x - Column to compute on.
  Note: base64 since 1.5.0. See Also: the other string_funcs (ascii, concat, concat_ws, decode, encode, format_number, format_string, initcap, instr, length, levenshtein, locate, lower, lpad, ltrim, regexp_extract, regexp_replace, reverse, rpad, rtrim, soundex, substring_index, translate, trim, unbase64, upper).
  Examples (not run): base64(df$c)

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/between.html
new file site/docs/2.0.2/api/R/between.html (truncated in this message).
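For illustration (not part of the commit), a small SparkR sketch of the two column functions above, assuming df has a numeric column age and a binary column raw (both hypothetical):

  head(agg(df, avg_age = avg(df$age)))      # aggregate over the whole SparkDataFrame
  head(select(df, base64(df$raw)))          # BASE64-encode a binary column as a string column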
[15/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/Accumulable.html
new file site/docs/2.0.2/api/java/org/apache/spark/Accumulable.html (456 lines) -- JavaDoc for org.apache.spark.Accumulable<R,T> (implements java.io.Serializable; direct known subclass: Accumulator). Deprecated: use AccumulatorV2. Since 2.0.0.
  Class description: A data type that can be accumulated, i.e. has a commutative and associative "add" operation, but where the result type, R, may be different from the element type being added, T. You must define how to add data, and how to merge two of these together. For some data types, such as a counter, these might be the same operation; in that case, you can use the simpler Accumulator. They won't always be the same, though -- e.g., imagine you are accumulating a set: you will add items to the set, and you will union two sets together. Operations are not thread-safe.
  Parameters: id (ID of this accumulator; for internal use only), initialValue (initial value of the accumulator), param (helper object defining how to add elements of type R and T), name (human-readable name for use in Spark's web UI), countFailedValues (whether to accumulate values from failed tasks; set to true for system and time metrics like serialization time or bytes spilled, and false for things with absolute values like number of input rows; for internal metrics only).
  Constructor: Accumulable(R initialValue, AccumulableParam<R,T> param)
  Methods (all deprecated): add(T term) - add more data to this accumulator/accumulable; id(); localValue() - get the current value of this accumulator from within a task (NOT the global value; to get the global value after a completed operation on the dataset, call value()); merge(R term) - merge two accumulable objects together (normally a user will call add instead); name(); setValue(R newValue) - set the accumulator's value; toString(); value() - access the accumulator's current value, only allowed on the driver; zero().
[18/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/index-all.html
new file site/docs/2.0.2/api/java/index-all.html (45389 lines) -- the master "Index" page of the Spark 2.0.2 JavaDoc, organized under "$", A-Z and "_", listing every public class, constructor, field and method. The fragment in this message covers the "$" entries (Scala operator methods such as $colon$bslash, $div$colon, $greater, $greater$eq, $less, $less$eq, $minus$greater, $plus$colon, $plus$eq, $plus$plus and $plus$plus$eq on StructType, Decimal, RDDInfo, DoubleParam, FloatParam, Accumulator and the RDD hierarchy) and the beginning of "A" (abs, absent, AbsoluteError, the accept/acceptIf/acceptMatch/acceptSeq parser helpers on RFormulaParser, accId, Accumulable, ...).
[41/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/factorial.html
new file site/docs/2.0.2/api/R/factorial.html (119 lines) -- R help page "factorial" (factorial {SparkR}).
  Description: Computes the factorial of the given value.
  Usage (S4 method for signature 'Column'): factorial(x)
  Arguments: x - Column to compute on.
  Note: factorial since 1.5.0. See Also: the other math_funcs (acos, asin, atan, atan2, bin, bround, cbrt, ceil/ceiling, conv, corr, cos, cosh, cov, covar_pop, covar_samp, exp, expm1, floor, hex, hypot, log, log10, log1p, log2, pmod, rint, round, shiftLeft, shiftRight, shiftRightUnsigned, sign/signum, sin, sinh, sqrt, tan, tanh, toDegrees, toRadians, unhex).
  Examples (not run): factorial(df$c)

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/filter.html
new file site/docs/2.0.2/api/R/filter.html (288 lines) -- R help page "Filter" (filter {SparkR}).
  Description: Filter the rows of a SparkDataFrame according to a given condition.
  Usage (S4 methods for signature 'SparkDataFrame,characterOrColumn'): filter(x, condition); where(x, condition)
  Arguments: x - A SparkDataFrame to be sorted. condition - The condition to filter on; this may either be a Column expression or a string containing a SQL statement.
  Value: A SparkDataFrame containing only the rows that meet the condition.
  Note: filter since 1.4.0; where since 1.4.0. See Also: the other SparkDataFrame functions.
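A minimal SparkR sketch (not from the commit) of the two equivalent filter forms and of factorial as a column expression, assuming df has an integer column age (hypothetical):

  adults1 <- filter(df, df$age > 21)        # Column-expression condition
  adults2 <- where(df, "age > 21")          # SQL-string condition; same result
  head(select(df, factorial(df$age)))       # factorial() applied column-wise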
[29/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/show.html
new file site/docs/2.0.2/api/R/show.html (269 lines) -- R help page "show" (show {SparkR}).
  Description: Print class and type information of a Spark object.
  Usage (S4 methods for signatures 'SparkDataFrame', 'WindowSpec', 'Column', 'GroupedData'): show(object)
  Arguments: object - a Spark object; can be a SparkDataFrame, Column, GroupedData or WindowSpec.
  Note: show(SparkDataFrame) since 1.4.0; show(WindowSpec) since 2.0.0; show(Column) since 1.4.0; show(GroupedData) since 1.4.0.
  See Also: the other SparkDataFrame functions.
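For illustration (not part of the commit), assuming an existing SparkDataFrame df with a hypothetical column dept:

  show(df)                                  # prints class and type information for the SparkDataFrame
  show(groupBy(df, "dept"))                 # the GroupedData method prints that object's class information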
[36/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/lower.html
new file site/docs/2.0.2/api/R/lower.html (114 lines) -- R help page "lower" (lower {SparkR}).
  Description: Converts a string column to lower case.
  Usage (S4 method for signature 'Column'): lower(x)
  Arguments: x - Column to compute on.
  Note: lower since 1.4.0. See Also: the other string_funcs.
  Examples (not run): lower(df$c)

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/lpad.html
new file site/docs/2.0.2/api/R/lpad.html (121 lines) -- R help page "lpad" (lpad {SparkR}).
  Description: Left-pad the string column with
  Usage (S4 method for signature 'Column,numeric,character'): lpad(x, len, pad)
  Arguments: x - the string Column to be left-padded. len - maximum length of each output result. pad - a character string to be padded with.
  Note: lpad since 1.5.0. See Also: the other string_funcs.
  Examples (not run): lpad(df$c, 6, "#")
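A minimal sketch (not from the commit) of the two string functions above, assuming df has a string column name (hypothetical):

  head(select(df, lower(df$name)))          # lower-case the column
  head(select(df, lpad(df$name, 6, "#")))   # left-pad each value with "#" up to length 6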
[16/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/lib/jquery.js
new file site/docs/2.0.2/api/java/lib/jquery.js (2 lines) -- the minified jQuery v1.8.2 library (jquery.com | jquery.org/license) bundled with the generated JavaDoc.
[10/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/InternalAccumulator.html
new file site/docs/2.0.2/api/java/org/apache/spark/InternalAccumulator.html (469 lines) -- JavaDoc for org.apache.spark.InternalAccumulator: a collection of fields and methods concerned with internal accumulators that represent task level metrics.
  Nested classes: InternalAccumulator.input$, InternalAccumulator.output$, InternalAccumulator.shuffleRead$, InternalAccumulator.shuffleWrite$.
  Constructor: InternalAccumulator().
  Static String methods: DISK_BYTES_SPILLED(), EXECUTOR_DESERIALIZE_TIME(), EXECUTOR_RUN_TIME(), INPUT_METRICS_PREFIX(), JVM_GC_TIME(), MEMORY_BYTES_SPILLED(), METRICS_PREFIX(), OUTPUT_METRICS_PREFIX(), PEAK_EXECUTION_MEMORY(), RESULT_SERIALIZATION_TIME(), RESULT_SIZE(), SHUFFLE_READ_METRICS_PREFIX(), SHUFFLE_WRITE_METRICS_PREFIX(), TEST_ACCUM(), UPDATED_BLOCK_STATUSES().

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/InternalAccumulator.input$.html
new file site/docs/2.0.2/api/java/org/apache/spark/InternalAccumulator.input$.html (truncated in this message).
[39/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/head.html
new file site/docs/2.0.2/api/R/head.html (267 lines) -- R help page "Head" (head {SparkR}).
  Description: Return the first num rows of a SparkDataFrame as a R data.frame. If num is not specified, then head() returns the first 6 rows as with R data.frame.
  Usage (S4 method for signature 'SparkDataFrame'): head(x, num = 6L)
  Arguments: x - a SparkDataFrame. num - the number of rows to return; default is 6.
  Value: A data.frame.
  Note: head since 1.4.0. See Also: the other SparkDataFrame functions.
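For illustration (not part of the commit), assuming an existing SparkDataFrame df:

  head(df)        # first 6 rows as a local R data.frame (the default)
  head(df, 3)     # first 3 rows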
[33/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/printSchema.html
new file site/docs/2.0.2/api/R/printSchema.html (258 lines) -- R help page "Print Schema of a SparkDataFrame" (printSchema {SparkR}).
  Description: Prints out the schema in tree format.
  Usage (S4 method for signature 'SparkDataFrame'): printSchema(x)
  Arguments: x - A SparkDataFrame.
  Note: printSchema since 1.4.0. See Also: the other SparkDataFrame functions.
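A minimal sketch (not from the commit), assuming an existing SparkDataFrame df; the columns in the sample output are hypothetical:

  printSchema(df)
  # root
  #  |-- name: string (nullable = true)
  #  |-- age: integer (nullable = true)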
[50/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/README.md -- diff --git a/site/docs/2.0.2/README.md b/site/docs/2.0.2/README.md new file mode 100644 index 000..ffd3b57 --- /dev/null +++ b/site/docs/2.0.2/README.md @@ -0,0 +1,72 @@ +Welcome to the Spark documentation! + +This readme will walk you through navigating and building the Spark documentation, which is included +here with the Spark source code. You can also find documentation specific to release versions of +Spark at http://spark.apache.org/documentation.html. + +Read on to learn more about viewing documentation in plain text (i.e., markdown) or building the +documentation yourself. Why build it yourself? So that you have the docs that corresponds to +whichever version of Spark you currently have checked out of revision control. + +## Prerequisites +The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, +Python and R. + +You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and +[Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python) +installed. Also install the following libraries: +```sh +$ sudo gem install jekyll jekyll-redirect-from pygments.rb +$ sudo pip install Pygments +# Following is needed only for generating API docs +$ sudo pip install sphinx pypandoc +$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "roxygen2", "testthat", "rmarkdown"), repos="http://cran.stat.ucla.edu/;)' +``` +(Note: If you are on a system with both Ruby 1.9 and Ruby 2.0 you may need to replace gem with gem2.0) + +## Generating the Documentation HTML + +We include the Spark documentation as part of the source (as opposed to using a hosted wiki, such as +the github wiki, as the definitive documentation) to enable the documentation to evolve along with +the source code and be captured by revision control (currently git). This way the code automatically +includes the version of the documentation that is relevant regardless of which version or release +you have checked out or downloaded. + +In this directory you will find textfiles formatted using Markdown, with an ".md" suffix. You can +read those text files directly if you want. Start with index.md. + +Execute `jekyll build` from the `docs/` directory to compile the site. Compiling the site with +Jekyll will create a directory called `_site` containing index.html as well as the rest of the +compiled files. + +$ cd docs +$ jekyll build + +You can modify the default Jekyll build as follows: +```sh +# Skip generating API docs (which takes a while) +$ SKIP_API=1 jekyll build + +# Serve content locally on port 4000 +$ jekyll serve --watch + +# Build the site with extra features used on the live page +$ PRODUCTION=1 jekyll build +``` + +## API Docs (Scaladoc, Sphinx, roxygen2) + +You can build just the Spark scaladoc by running `build/sbt unidoc` from the SPARK_PROJECT_ROOT directory. + +Similarly, you can build just the PySpark docs by running `make html` from the +SPARK_PROJECT_ROOT/python/docs directory. Documentation is only generated for classes that are listed as +public in `__init__.py`. The SparkR docs can be built by running SPARK_PROJECT_ROOT/R/create-docs.sh. + +When you run `jekyll` in the `docs` directory, it will also copy over the scaladoc for the various +Spark subprojects into the `docs` directory (and then also into the `_site` directory). 
We use a +jekyll plugin to run `build/sbt unidoc` before building the site so if you haven't run it (recently) it +may take some time as it generates all of the scaladoc. The jekyll plugin also generates the +PySpark docs using [Sphinx](http://sphinx-doc.org/). + +NOTE: To skip the step of building and copying over the Scala, Python, R API docs, run `SKIP_API=1 +jekyll`. http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api.html -- diff --git a/site/docs/2.0.2/api.html b/site/docs/2.0.2/api.html new file mode 100644 index 000..731bd07 --- /dev/null +++ b/site/docs/2.0.2/api.html @@ -0,0 +1,178 @@ +Spark API Documentation - Spark 2.0.2 Documentation
[27/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/structField.html -- diff --git a/site/docs/2.0.2/api/R/structField.html b/site/docs/2.0.2/api/R/structField.html new file mode 100644 index 000..6325141 --- /dev/null +++ b/site/docs/2.0.2/api/R/structField.html @@ -0,0 +1,84 @@ + +R: structField + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +structField {SparkR}R Documentation + +structField + +Description + +Create a structField object that contains the metadata for a single field in a schema. + + + +Usage + + +structField(x, ...) + +## S3 method for class 'jobj' +structField(x, ...) + +## S3 method for class 'character' +structField(x, type, nullable = TRUE, ...) + + + +Arguments + + +x + +the name of the field. + +... + +additional argument(s) passed to the method. + +type + +The data type of the field + +nullable + +A logical vector indicating whether or not the field is nullable + + + + +Value + +A structField object. + + + +Note + +structField since 1.4.0 + + + +Examples + +## Not run: +##D field1 - structField(a, integer) +##D field2 - structField(c, string) +##D field3 - structField(avg, double) +##D schema - structType(field1, field2, field3) +##D df1 - gapply(df, list(a, c), +##D function(key, x) { y - data.frame(key, mean(x$b), stringsAsFactors = FALSE) }, +##D schema) +## End(Not run) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/structType.html -- diff --git a/site/docs/2.0.2/api/R/structType.html b/site/docs/2.0.2/api/R/structType.html new file mode 100644 index 000..d068b59 --- /dev/null +++ b/site/docs/2.0.2/api/R/structType.html @@ -0,0 +1,75 @@ + +R: structType + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +structType {SparkR}R Documentation + +structType + +Description + +Create a structType object that contains the metadata for a SparkDataFrame. Intended for +use with createDataFrame and toDF. + + + +Usage + + +structType(x, ...) + +## S3 method for class 'jobj' +structType(x, ...) + +## S3 method for class 'structField' +structType(x, ...) + + + +Arguments + + +x + +a structField object (created with the field() function) + +... 
+ +additional structField objects + + + + +Value + +a structType object + + + +Note + +structType since 1.4.0 + + + +Examples + +## Not run: +##D schema - structType(structField(a, integer), structField(c, string), +##D structField(avg, double)) +##D df1 - gapply(df, list(a, c), +##D function(key, x) { y - data.frame(key, mean(x$b), stringsAsFactors = FALSE) }, +##D schema) +## End(Not run) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/subset.html -- diff --git a/site/docs/2.0.2/api/R/subset.html b/site/docs/2.0.2/api/R/subset.html new file mode 100644 index 000..987e20b --- /dev/null +++ b/site/docs/2.0.2/api/R/subset.html @@ -0,0 +1,309 @@ + +R: Subset + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +[[ {SparkR}R Documentation + +Subset + +Description + +Return subsets of SparkDataFrame according to given conditions + + + +Usage + + +## S4 method for signature 'SparkDataFrame,numericOrcharacter' +x[[i]] + +## S4 method for signature 'SparkDataFrame' +x[i, j, ..., drop = F] + +## S4 method for signature 'SparkDataFrame' +subset(x, subset, select, drop = F, ...) + +subset(x, ...) + + + +Arguments + + +x + +a SparkDataFrame. + +i,subset + +(Optional) a logical expression to filter on rows. + +j,select + +expression for the single Column or a list of columns to select from the SparkDataFrame. + +... + +currently not used. + +drop + +if TRUE, a Column will be returned if the resulting dataset has only one column. +Otherwise, a SparkDataFrame will always be returned. + + + + +Value + +A new SparkDataFrame containing only the rows that meet the condition with selected columns. + + + +Note + +[[ since 1.4.0 + +[
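The structField/structType pages quoted above describe how a schema is assembled field by field and then handed to functions such as createDataFrame or gapply, with subset documented alongside them. A minimal SparkR sketch of that flow; the column names and the tiny local data.frame are illustrative assumptions, not taken from the quoted docs:

```r
library(SparkR)
sparkR.session()   # assumes a local Spark installation is available

# Assemble a schema field by field, as on the structField/structType pages
field_key <- structField("key", "string")
field_a   <- structField("a", "integer")
field_b   <- structField("b", "double")
schema    <- structType(field_key, field_a, field_b)

# Hypothetical local data matching that schema
local_df <- data.frame(key = c("x", "y", "z"),
                       a = c(1L, 2L, 3L),
                       b = c(0.5, 1.5, 2.5),
                       stringsAsFactors = FALSE)
df <- createDataFrame(local_df, schema)
printSchema(df)

# subset(), as on the quoted subset page: filter rows, pick columns
head(subset(df, df$a > 1, select = c("key", "b")))
```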
[46/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/collect.html -- diff --git a/site/docs/2.0.2/api/R/collect.html b/site/docs/2.0.2/api/R/collect.html new file mode 100644 index 000..61090a7 --- /dev/null +++ b/site/docs/2.0.2/api/R/collect.html @@ -0,0 +1,268 @@ + +R: Collects all the elements of a SparkDataFrame and coerces... + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +collect {SparkR}R Documentation + +Collects all the elements of a SparkDataFrame and coerces them into an R data.frame. + +Description + +Collects all the elements of a SparkDataFrame and coerces them into an R data.frame. + + + +Usage + + +## S4 method for signature 'SparkDataFrame' +collect(x, stringsAsFactors = FALSE) + +collect(x, ...) + + + +Arguments + + +x + +a SparkDataFrame. + +stringsAsFactors + +(Optional) a logical indicating whether or not string columns +should be converted to factors. FALSE by default. + +... + +further arguments to be passed to or from other methods. + + + + +Note + +collect since 1.4.0 + + + +See Also + +Other SparkDataFrame functions: $, +$,SparkDataFrame-method, $-, +$-,SparkDataFrame-method, +select, select, +select,SparkDataFrame,Column-method, +select,SparkDataFrame,character-method, +select,SparkDataFrame,list-method; +SparkDataFrame-class; [, +[,SparkDataFrame-method, [[, +[[,SparkDataFrame,numericOrcharacter-method, +subset, subset, +subset,SparkDataFrame-method; +agg, agg, agg, +agg,GroupedData-method, +agg,SparkDataFrame-method, +summarize, summarize, +summarize, +summarize,GroupedData-method, +summarize,SparkDataFrame-method; +arrange, arrange, +arrange, +arrange,SparkDataFrame,Column-method, +arrange,SparkDataFrame,character-method, +orderBy,SparkDataFrame,characterOrColumn-method; +as.data.frame, +as.data.frame,SparkDataFrame-method; +attach, +attach,SparkDataFrame-method; +cache, cache, +cache,SparkDataFrame-method; +colnames, colnames, +colnames,SparkDataFrame-method, +colnames-, colnames-, +colnames-,SparkDataFrame-method, +columns, columns, +columns,SparkDataFrame-method, +names, +names,SparkDataFrame-method, +names-, +names-,SparkDataFrame-method; +coltypes, coltypes, +coltypes,SparkDataFrame-method, +coltypes-, coltypes-, +coltypes-,SparkDataFrame,character-method; +count,SparkDataFrame-method, +nrow, nrow, +nrow,SparkDataFrame-method; +createOrReplaceTempView, +createOrReplaceTempView, +createOrReplaceTempView,SparkDataFrame,character-method; +dapplyCollect, dapplyCollect, +dapplyCollect,SparkDataFrame,function-method; +dapply, dapply, +dapply,SparkDataFrame,function,structType-method; +describe, describe, +describe, +describe,SparkDataFrame,ANY-method, +describe,SparkDataFrame,character-method, +describe,SparkDataFrame-method, +summary, summary, +summary,SparkDataFrame-method; +dim, +dim,SparkDataFrame-method; +distinct, distinct, +distinct,SparkDataFrame-method, +unique, +unique,SparkDataFrame-method; +dropDuplicates, +dropDuplicates, +dropDuplicates,SparkDataFrame-method; +dropna, dropna, +dropna,SparkDataFrame-method, +fillna, fillna, +fillna,SparkDataFrame-method, +na.omit, na.omit, +na.omit,SparkDataFrame-method; +drop, drop, +drop, drop,ANY-method, +drop,SparkDataFrame-method; +dtypes, dtypes, +dtypes,SparkDataFrame-method; +except, except, +except,SparkDataFrame,SparkDataFrame-method; 
+explain, explain, +explain,SparkDataFrame-method; +filter, filter, +filter,SparkDataFrame,characterOrColumn-method, +where, where, +where,SparkDataFrame,characterOrColumn-method; +first, first, +first, +first,SparkDataFrame-method, +first,characterOrColumn-method; +gapplyCollect, gapplyCollect, +gapplyCollect, +gapplyCollect,GroupedData-method, +gapplyCollect,SparkDataFrame-method; +gapply, gapply, +gapply, +gapply,GroupedData-method, +gapply,SparkDataFrame-method; +groupBy, groupBy, +groupBy,SparkDataFrame-method, +group_by, group_by, +group_by,SparkDataFrame-method; +head, +head,SparkDataFrame-method; +histogram, +histogram,SparkDataFrame,characterOrColumn-method; +insertInto, insertInto, +insertInto,SparkDataFrame,character-method; +intersect, intersect, +intersect,SparkDataFrame,SparkDataFrame-method; +isLocal, isLocal, +isLocal,SparkDataFrame-method; +join, +join,SparkDataFrame,SparkDataFrame-method; +limit, limit, +limit,SparkDataFrame,numeric-method; +merge, merge, +merge,SparkDataFrame,SparkDataFrame-method; +mutate, mutate, +mutate,SparkDataFrame-method, +transform, transform, +transform,SparkDataFrame-method; +ncol, +ncol,SparkDataFrame-method; +persist, persist, +persist,SparkDataFrame,character-method; +printSchema, printSchema, +printSchema,SparkDataFrame-method; +randomSplit, randomSplit,
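The collect page quoted above is the standard way to pull a SparkDataFrame back to the driver as a plain R data.frame. A short sketch, using R's built-in `faithful` data set purely as example input:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)   # built-in R data set, used only for illustration

# Bring the whole distributed SparkDataFrame back as a local R data.frame
local_all <- collect(df)
str(local_all)

# collect() pulls everything to the driver; for a quick peek, head() is usually enough
head(df, 3)

# Optionally turn string columns into factors on the way back
# collect(df, stringsAsFactors = TRUE)
```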
[20/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/allclasses-noframe.html -- diff --git a/site/docs/2.0.2/api/java/allclasses-noframe.html b/site/docs/2.0.2/api/java/allclasses-noframe.html new file mode 100644 index 000..05ef78b --- /dev/null +++ b/site/docs/2.0.2/api/java/allclasses-noframe.html @@ -0,0 +1,1119 @@ +http://www.w3.org/TR/html4/loose.dtd;> + + + + +All Classes (Spark 2.0.2 JavaDoc) + + + + +All Classes + + +AbsoluteError +Accumulable +AccumulableInfo +AccumulableInfo +AccumulableParam +Accumulator +AccumulatorContext +AccumulatorParam +AccumulatorParam.DoubleAccumulatorParam$ +AccumulatorParam.FloatAccumulatorParam$ +AccumulatorParam.IntAccumulatorParam$ +AccumulatorParam.LongAccumulatorParam$ +AccumulatorParam.StringAccumulatorParam$ +AccumulatorV2 +AFTAggregator +AFTCostFun +AFTSurvivalRegression +AFTSurvivalRegressionModel +AggregatedDialect +AggregatingEdgeContext +Aggregator +Aggregator +Algo +AllJobsCancelled +AllReceiverIds +ALS +ALS +ALS.InBlock$ +ALS.Rating +ALS.Rating$ +ALS.RatingBlock$ +ALSModel +AnalysisException +And +AnyDataType +ApplicationAttemptInfo +ApplicationInfo +ApplicationsListResource +ApplicationStatus +ApplyInPlace +AreaUnderCurve +ArrayType +AskPermissionToCommitOutput +AssociationRules +AssociationRules.Rule +AsyncRDDActions +Attribute +AttributeGroup +AttributeKeys +AttributeType +BaseRelation +BaseRRDD +BatchInfo +BernoulliCellSampler +BernoulliSampler +Binarizer +BinaryAttribute +BinaryClassificationEvaluator +BinaryClassificationMetrics +BinaryLogisticRegressionSummary +BinaryLogisticRegressionTrainingSummary +BinarySample +BinaryType +BinomialBounds +BisectingKMeans +BisectingKMeans +BisectingKMeansModel +BisectingKMeansModel +BisectingKMeansModel.SaveLoadV1_0$ +BLAS +BLAS +BlockId +BlockManagerId +BlockManagerMessages +BlockManagerMessages.BlockManagerHeartbeat +BlockManagerMessages.BlockManagerHeartbeat$ +BlockManagerMessages.GetBlockStatus +BlockManagerMessages.GetBlockStatus$ +BlockManagerMessages.GetExecutorEndpointRef +BlockManagerMessages.GetExecutorEndpointRef$ +BlockManagerMessages.GetLocations +BlockManagerMessages.GetLocations$ +BlockManagerMessages.GetLocationsMultipleBlockIds +BlockManagerMessages.GetLocationsMultipleBlockIds$ +BlockManagerMessages.GetMatchingBlockIds +BlockManagerMessages.GetMatchingBlockIds$ +BlockManagerMessages.GetMemoryStatus$ +BlockManagerMessages.GetPeers +BlockManagerMessages.GetPeers$ +BlockManagerMessages.GetStorageStatus$ +BlockManagerMessages.HasCachedBlocks +BlockManagerMessages.HasCachedBlocks$ +BlockManagerMessages.RegisterBlockManager +BlockManagerMessages.RegisterBlockManager$ +BlockManagerMessages.RemoveBlock +BlockManagerMessages.RemoveBlock$ +BlockManagerMessages.RemoveBroadcast +BlockManagerMessages.RemoveBroadcast$ +BlockManagerMessages.RemoveExecutor +BlockManagerMessages.RemoveExecutor$ +BlockManagerMessages.RemoveRdd +BlockManagerMessages.RemoveRdd$ +BlockManagerMessages.RemoveShuffle +BlockManagerMessages.RemoveShuffle$ +BlockManagerMessages.StopBlockManagerMaster$ +BlockManagerMessages.ToBlockManagerMaster +BlockManagerMessages.ToBlockManagerSlave +BlockManagerMessages.TriggerThreadDump$ +BlockManagerMessages.UpdateBlockInfo +BlockManagerMessages.UpdateBlockInfo$ +BlockMatrix +BlockNotFoundException +BlockStatus +BlockUpdatedInfo +BloomFilter +BloomFilter.Version +BooleanParam +BooleanType +BoostingStrategy +BoundedDouble +BreezeUtil +Broadcast +BroadcastBlockId +Broker +Bucketizer +BufferReleasingInputStream +BytecodeUtils +ByteType 
+CalendarIntervalType +Catalog +CatalogImpl +CatalystScan +CategoricalSplit +CausedBy +CheckpointReader +CheckpointState +ChiSqSelector +ChiSqSelector +ChiSqSelectorModel +ChiSqSelectorModel +ChiSqSelectorModel.SaveLoadV1_0$ +ChiSqTest +ChiSqTest.Method +ChiSqTest.Method$ +ChiSqTest.NullHypothesis$ +ChiSqTestResult +CholeskyDecomposition +ChunkedByteBufferInputStream +ClassificationModel +ClassificationModel +Classifier +CleanAccum +CleanBroadcast +CleanCheckpoint +CleanRDD +CleanShuffle +CleanupTask +CleanupTaskWeakReference +ClosureCleaner +CoarseGrainedClusterMessages +CoarseGrainedClusterMessages.AddWebUIFilter +CoarseGrainedClusterMessages.AddWebUIFilter$ +CoarseGrainedClusterMessages.GetExecutorLossReason +CoarseGrainedClusterMessages.GetExecutorLossReason$ +CoarseGrainedClusterMessages.KillExecutors +CoarseGrainedClusterMessages.KillExecutors$ +CoarseGrainedClusterMessages.KillTask +CoarseGrainedClusterMessages.KillTask$ +CoarseGrainedClusterMessages.LaunchTask +CoarseGrainedClusterMessages.LaunchTask$ +CoarseGrainedClusterMessages.RegisterClusterManager +CoarseGrainedClusterMessages.RegisterClusterManager$ +CoarseGrainedClusterMessages.RegisteredExecutor$ +CoarseGrainedClusterMessages.RegisterExecutor +CoarseGrainedClusterMessages.RegisterExecutor$ +CoarseGrainedClusterMessages.RegisterExecutorFailed
[28/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/spark.lapply.html -- diff --git a/site/docs/2.0.2/api/R/spark.lapply.html b/site/docs/2.0.2/api/R/spark.lapply.html new file mode 100644 index 000..f337327 --- /dev/null +++ b/site/docs/2.0.2/api/R/spark.lapply.html @@ -0,0 +1,96 @@ + +R: Run a function over a list of elements, distributing the... + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +spark.lapply {SparkR}R Documentation + +Run a function over a list of elements, distributing the computations with Spark + +Description + +Run a function over a list of elements, distributing the computations with Spark. Applies a +function in a manner that is similar to doParallel or lapply to elements of a list. +The computations are distributed using Spark. It is conceptually the same as the following code: +lapply(list, func) + + + +Usage + + +spark.lapply(list, func) + + + +Arguments + + +list + +the list of elements + +func + +a function that takes one argument. + + + + +Details + +Known limitations: + + + + variable scoping and capture: compared to R's rich support for variable resolutions, +the distributed nature of SparkR limits how variables are resolved at runtime. All the +variables that are available through lexical scoping are embedded in the closure of the +function and available as read-only variables within the function. The environment variables +should be stored into temporary variables outside the function, and not directly accessed +within the function. + + + loading external packages: In order to use a package, you need to load it inside the +closure. For example, if you rely on the MASS module, here is how you would use it: + + +train - function(hyperparam) { + library(MASS) + lm.ridge("y ~ x+z", data, lambda=hyperparam) + model +} + + + + + +Value + +a list of results (the exact type being determined by the function) + + + +Note + +spark.lapply since 2.0.0 + + + +Examples + +## Not run: +##D sparkR.session() +##D doubled - spark.lapply(1:10, function(x){2 * x}) +## End(Not run) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/spark.naiveBayes.html -- diff --git a/site/docs/2.0.2/api/R/spark.naiveBayes.html b/site/docs/2.0.2/api/R/spark.naiveBayes.html new file mode 100644 index 000..b4d60c2 --- /dev/null +++ b/site/docs/2.0.2/api/R/spark.naiveBayes.html @@ -0,0 +1,143 @@ + +R: Naive Bayes Models + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +spark.naiveBayes {SparkR}R Documentation + +Naive Bayes Models + +Description + +spark.naiveBayes fits a Bernoulli naive Bayes model against a SparkDataFrame. +Users can call summary to print a summary of the fitted model, predict to make +predictions on new data, and write.ml/read.ml to save/load fitted models. +Only categorical data is supported. + + + +Usage + + +spark.naiveBayes(data, formula, ...) + +## S4 method for signature 'NaiveBayesModel' +predict(object, newData) + +## S4 method for signature 'NaiveBayesModel' +summary(object, ...) 
+ +## S4 method for signature 'SparkDataFrame,formula' +spark.naiveBayes(data, formula, + smoothing = 1, ...) + +## S4 method for signature 'NaiveBayesModel,character' +write.ml(object, path, + overwrite = FALSE) + + + +Arguments + + +data + +a SparkDataFrame of observations and labels for model fitting. + +formula + +a symbolic description of the model to be fitted. Currently only a few formula +operators are supported, including '~', '.', ':', '+', and '-'. + +... + +additional argument(s) passed to the method. Currently only smoothing. + +object + +a naive Bayes model fitted by spark.naiveBayes. + +newData + +a SparkDataFrame for testing. + +smoothing + +smoothing parameter. + +path + +the directory where the model is saved + +overwrite + +overwrites or not if the output path already exists. Default is FALSE +which means throw exception if the output path exists. + + + + +Value + +predict returns a SparkDataFrame containing predicted labeled in a column named +prediction + +summary returns a list containing apriori, the label distribution, and +tables, conditional probabilities given the target label. + +spark.naiveBayes returns a fitted naive Bayes model. + + + +Note + +predict(NaiveBayesModel) since 2.0.0 +
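The two pages quoted above cover spark.lapply (distributing an ordinary R function over a list) and spark.naiveBayes (fitting a Bernoulli naive Bayes model on categorical data). A hedged sketch of both; the UCBAdmissions table and the commented-out save path are illustrative choices, not part of the quoted docs:

```r
library(SparkR)
sparkR.session()

# spark.lapply: run an ordinary R function over each element of a list on the cluster
doubled <- spark.lapply(1:10, function(x) { 2 * x })

# spark.naiveBayes: Bernoulli naive Bayes on categorical data.
# UCBAdmissions is a built-in R table used here purely as example input.
admissions <- createDataFrame(as.data.frame(UCBAdmissions))
model <- spark.naiveBayes(admissions, Admit ~ Gender + Dept, smoothing = 1)

summary(model)                        # apriori and conditional probability tables
preds <- predict(model, admissions)   # adds a 'prediction' column
head(select(preds, "Admit", "prediction"))

# write.ml(model, "/tmp/nb-model")    # hypothetical path; use overwrite = TRUE to replace
```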
[49/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/00frame_toc.html -- diff --git a/site/docs/2.0.2/api/R/00frame_toc.html b/site/docs/2.0.2/api/R/00frame_toc.html new file mode 100644 index 000..c35b36d --- /dev/null +++ b/site/docs/2.0.2/api/R/00frame_toc.html @@ -0,0 +1,378 @@ + + + + + +R Documentation of SparkR + + +window.onload = function() { + var imgs = document.getElementsByTagName('img'), i, img; + for (i = 0; i < imgs.length; i++) { +img = imgs[i]; +// center an image if it is the only element of its parent +if (img.parentElement.childElementCount === 1) + img.parentElement.style.textAlign = 'center'; + } +}; + + + + + + + +* { + font-family: "Trebuchet MS", "Lucida Grande", "Lucida Sans Unicode", "Lucida Sans", Arial, sans-serif; + font-size: 14px; +} +body { + padding: 0 5px; + margin: 0 auto; + width: 80%; + max-width: 60em; /* 960px */ +} + +h1, h2, h3, h4, h5, h6 { + color: #666; +} +h1, h2 { + text-align: center; +} +h1 { + font-size: x-large; +} +h2, h3 { + font-size: large; +} +h4, h6 { + font-style: italic; +} +h3 { + border-left: solid 5px #ddd; + padding-left: 5px; + font-variant: small-caps; +} + +p img { + display: block; + margin: auto; +} + +span, code, pre { + font-family: Monaco, "Lucida Console", "Courier New", Courier, monospace; +} +span.acronym {} +span.env { + font-style: italic; +} +span.file {} +span.option {} +span.pkg { + font-weight: bold; +} +span.samp{} + +dt, p code { + background-color: #F7F7F7; +} + + + + + + + + +SparkR + + +AFTSurvivalRegressionModel-class +GeneralizedLinearRegressionModel-class +GroupedData +KMeansModel-class +NaiveBayesModel-class +SparkDataFrame +WindowSpec +abs +acos +add_months +alias +approxCountDistinct +approxQuantile +arrange +array_contains +as.data.frame +ascii +asin +atan +atan2 +attach +avg +base64 +between +bin +bitwiseNOT +bround +cache +cacheTable +cancelJobGroup +cast +cbrt +ceil +clearCache +clearJobGroup +collect +coltypes +column +columnfunctions +columns +concat +concat_ws +conv +corr +cos +cosh +count +countDistinct +cov +covar_pop +crc32 +createDataFrame +createExternalTable +createOrReplaceTempView +crosstab +cume_dist +dapply +dapplyCollect +date_add +date_format +date_sub +datediff +dayofmonth +dayofyear +decode +dense_rank +dim +distinct +drop +dropDuplicates +dropTempTable-deprecated +dropTempView +dtypes +encode +endsWith +except +exp +explain +explode +expm1 +expr +factorial +filter +first +fitted +floor +format_number +format_string +freqItems +from_unixtime +fromutctimestamp +gapply +gapplyCollect +generateAliasesForIntersectedCols +glm +greatest +groupBy +hash +hashCode +head +hex +histogram +hour +hypot +ifelse +initcap +insertInto +install.spark +instr +intersect +is.nan +isLocal +join +kurtosis +lag +last +last_day +lead +least +length +levenshtein +limit +lit +locate +log +log10 +log1p +log2 +lower +lpad +ltrim +match +max +md5 +mean +merge +min +minute +monotonicallyincreasingid +month +months_between +mutate +nafunctions +nanvl +ncol +negate +next_day +nrow +ntile +orderBy +otherwise +over +partitionBy +percent_rank +persist +pivot +pmod +posexplode +predict +print.jobj +print.structField +print.structType +printSchema +quarter +rand +randn +randomSplit +rangeBetween +rank +rbind +read.df +read.jdbc +read.json +read.ml +read.orc +read.parquet +read.text +regexp_extract +regexp_replace +registerTempTable-deprecated +rename +repartition +reverse +rint +round +row_number +rowsBetween +rpad +rtrim +sample +sampleBy +saveAsTable +schema +sd +second +select 
+selectExpr +setJobGroup +setLogLevel +sha1 +sha2 +shiftLeft +shiftRight +shiftRightUnsigned +show +showDF +sign +sin +sinh +size +skewness +sort_array +soundex +spark.glm +spark.kmeans +spark.lapply +spark.naiveBayes +spark.survreg +sparkR.callJMethod +sparkR.callJStatic +sparkR.conf +sparkR.init-deprecated +sparkR.newJObject +sparkR.session +sparkR.session.stop +sparkR.version +sparkRHive.init-deprecated +sparkRSQL.init-deprecated +sparkpartitionid +sql +sqrt +startsWith +stddev_pop +stddev_samp +str +struct +structField +structType +subset +substr +substring_index +sum +sumDistinct +summarize +summary +tableNames +tableToDF +tables +take +tan +tanh +toDegrees +toRadians +to_date +toutctimestamp +translate +trim +unbase64 +uncacheTable +unhex +union +unix_timestamp +unpersist-methods +upper +var +var_pop +var_samp +weekofyear +when +window +windowOrderBy +windowPartitionBy +with +withColumn +write.df +write.jdbc +write.json +write.ml +write.orc +write.parquet +write.text +year + + +Generated with http://yihui.name/knitr;>knitr 1.14 + + + + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/AFTSurvivalRegressionModel-class.html
[34/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/nrow.html -- diff --git a/site/docs/2.0.2/api/R/nrow.html b/site/docs/2.0.2/api/R/nrow.html new file mode 100644 index 000..2626e03 --- /dev/null +++ b/site/docs/2.0.2/api/R/nrow.html @@ -0,0 +1,260 @@ + +R: Returns the number of rows in a SparkDataFrame + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +nrow {SparkR}R Documentation + +Returns the number of rows in a SparkDataFrame + +Description + +Returns the number of rows in a SparkDataFrame + + + +Usage + + +## S4 method for signature 'SparkDataFrame' +count(x) + +## S4 method for signature 'SparkDataFrame' +nrow(x) + + + +Arguments + + +x + +a SparkDataFrame. + + + + +Note + +count since 1.4.0 + +nrow since 1.5.0 + + + +See Also + +Other SparkDataFrame functions: $, +$,SparkDataFrame-method, $-, +$-,SparkDataFrame-method, +select, select, +select,SparkDataFrame,Column-method, +select,SparkDataFrame,character-method, +select,SparkDataFrame,list-method; +SparkDataFrame-class; [, +[,SparkDataFrame-method, [[, +[[,SparkDataFrame,numericOrcharacter-method, +subset, subset, +subset,SparkDataFrame-method; +agg, agg, agg, +agg,GroupedData-method, +agg,SparkDataFrame-method, +summarize, summarize, +summarize, +summarize,GroupedData-method, +summarize,SparkDataFrame-method; +arrange, arrange, +arrange, +arrange,SparkDataFrame,Column-method, +arrange,SparkDataFrame,character-method, +orderBy,SparkDataFrame,characterOrColumn-method; +as.data.frame, +as.data.frame,SparkDataFrame-method; +attach, +attach,SparkDataFrame-method; +cache, cache, +cache,SparkDataFrame-method; +collect, collect, +collect,SparkDataFrame-method; +colnames, colnames, +colnames,SparkDataFrame-method, +colnames-, colnames-, +colnames-,SparkDataFrame-method, +columns, columns, +columns,SparkDataFrame-method, +names, +names,SparkDataFrame-method, +names-, +names-,SparkDataFrame-method; +coltypes, coltypes, +coltypes,SparkDataFrame-method, +coltypes-, coltypes-, +coltypes-,SparkDataFrame,character-method; +createOrReplaceTempView, +createOrReplaceTempView, +createOrReplaceTempView,SparkDataFrame,character-method; +dapplyCollect, dapplyCollect, +dapplyCollect,SparkDataFrame,function-method; +dapply, dapply, +dapply,SparkDataFrame,function,structType-method; +describe, describe, +describe, +describe,SparkDataFrame,ANY-method, +describe,SparkDataFrame,character-method, +describe,SparkDataFrame-method, +summary, summary, +summary,SparkDataFrame-method; +dim, +dim,SparkDataFrame-method; +distinct, distinct, +distinct,SparkDataFrame-method, +unique, +unique,SparkDataFrame-method; +dropDuplicates, +dropDuplicates, +dropDuplicates,SparkDataFrame-method; +dropna, dropna, +dropna,SparkDataFrame-method, +fillna, fillna, +fillna,SparkDataFrame-method, +na.omit, na.omit, +na.omit,SparkDataFrame-method; +drop, drop, +drop, drop,ANY-method, +drop,SparkDataFrame-method; +dtypes, dtypes, +dtypes,SparkDataFrame-method; +except, except, +except,SparkDataFrame,SparkDataFrame-method; +explain, explain, +explain,SparkDataFrame-method; +filter, filter, +filter,SparkDataFrame,characterOrColumn-method, +where, where, +where,SparkDataFrame,characterOrColumn-method; +first, first, +first, +first,SparkDataFrame-method, +first,characterOrColumn-method; +gapplyCollect, gapplyCollect, 
+gapplyCollect, +gapplyCollect,GroupedData-method, +gapplyCollect,SparkDataFrame-method; +gapply, gapply, +gapply, +gapply,GroupedData-method, +gapply,SparkDataFrame-method; +groupBy, groupBy, +groupBy,SparkDataFrame-method, +group_by, group_by, +group_by,SparkDataFrame-method; +head, +head,SparkDataFrame-method; +histogram, +histogram,SparkDataFrame,characterOrColumn-method; +insertInto, insertInto, +insertInto,SparkDataFrame,character-method; +intersect, intersect, +intersect,SparkDataFrame,SparkDataFrame-method; +isLocal, isLocal, +isLocal,SparkDataFrame-method; +join, +join,SparkDataFrame,SparkDataFrame-method; +limit, limit, +limit,SparkDataFrame,numeric-method; +merge, merge, +merge,SparkDataFrame,SparkDataFrame-method; +mutate, mutate, +mutate,SparkDataFrame-method, +transform, transform, +transform,SparkDataFrame-method; +ncol, +ncol,SparkDataFrame-method; +persist, persist, +persist,SparkDataFrame,character-method; +printSchema, printSchema, +printSchema,SparkDataFrame-method; +randomSplit, randomSplit, +randomSplit,SparkDataFrame,numeric-method; +rbind, rbind, +rbind,SparkDataFrame-method; +registerTempTable, +registerTempTable, +registerTempTable,SparkDataFrame,character-method; +rename, rename, +rename,SparkDataFrame-method, +withColumnRenamed, +withColumnRenamed, +withColumnRenamed,SparkDataFrame,character,character-method;
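The nrow page quoted above documents count and nrow as equivalent row counts on a SparkDataFrame. A tiny sketch using the built-in `faithful` data set as stand-in input:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)   # built-in R data set as stand-in input

nrow(df)    # row count, mirroring base R
count(df)   # same number via the SQL-style count()
ncol(df)    # column count, for comparison
```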
[48/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/arrange.html -- diff --git a/site/docs/2.0.2/api/R/arrange.html b/site/docs/2.0.2/api/R/arrange.html new file mode 100644 index 000..e5ac48a --- /dev/null +++ b/site/docs/2.0.2/api/R/arrange.html @@ -0,0 +1,287 @@ + +R: Arrange Rows by Variables + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +arrange {SparkR}R Documentation + +Arrange Rows by Variables + +Description + +Sort a SparkDataFrame by the specified column(s). + + + +Usage + + +## S4 method for signature 'SparkDataFrame,Column' +arrange(x, col, ...) + +## S4 method for signature 'SparkDataFrame,character' +arrange(x, col, ..., decreasing = FALSE) + +## S4 method for signature 'SparkDataFrame,characterOrColumn' +orderBy(x, col, ...) + +arrange(x, col, ...) + + + +Arguments + + +x + +a SparkDataFrame to be sorted. + +col + +a character or Column object indicating the fields to sort on + +... + +additional sorting fields + +decreasing + +a logical argument indicating sorting order for columns when +a character vector is specified for col + + + + +Value + +A SparkDataFrame where all elements are sorted. + + + +Note + +arrange(SparkDataFrame, Column) since 1.4.0 + +arrange(SparkDataFrame, character) since 1.4.0 + +orderBy(SparkDataFrame, characterOrColumn) since 1.4.0 + + + +See Also + +Other SparkDataFrame functions: $, +$,SparkDataFrame-method, $-, +$-,SparkDataFrame-method, +select, select, +select,SparkDataFrame,Column-method, +select,SparkDataFrame,character-method, +select,SparkDataFrame,list-method; +SparkDataFrame-class; [, +[,SparkDataFrame-method, [[, +[[,SparkDataFrame,numericOrcharacter-method, +subset, subset, +subset,SparkDataFrame-method; +agg, agg, agg, +agg,GroupedData-method, +agg,SparkDataFrame-method, +summarize, summarize, +summarize, +summarize,GroupedData-method, +summarize,SparkDataFrame-method; +as.data.frame, +as.data.frame,SparkDataFrame-method; +attach, +attach,SparkDataFrame-method; +cache, cache, +cache,SparkDataFrame-method; +collect, collect, +collect,SparkDataFrame-method; +colnames, colnames, +colnames,SparkDataFrame-method, +colnames-, colnames-, +colnames-,SparkDataFrame-method, +columns, columns, +columns,SparkDataFrame-method, +names, +names,SparkDataFrame-method, +names-, +names-,SparkDataFrame-method; +coltypes, coltypes, +coltypes,SparkDataFrame-method, +coltypes-, coltypes-, +coltypes-,SparkDataFrame,character-method; +count,SparkDataFrame-method, +nrow, nrow, +nrow,SparkDataFrame-method; +createOrReplaceTempView, +createOrReplaceTempView, +createOrReplaceTempView,SparkDataFrame,character-method; +dapplyCollect, dapplyCollect, +dapplyCollect,SparkDataFrame,function-method; +dapply, dapply, +dapply,SparkDataFrame,function,structType-method; +describe, describe, +describe, +describe,SparkDataFrame,ANY-method, +describe,SparkDataFrame,character-method, +describe,SparkDataFrame-method, +summary, summary, +summary,SparkDataFrame-method; +dim, +dim,SparkDataFrame-method; +distinct, distinct, +distinct,SparkDataFrame-method, +unique, +unique,SparkDataFrame-method; +dropDuplicates, +dropDuplicates, +dropDuplicates,SparkDataFrame-method; +dropna, dropna, +dropna,SparkDataFrame-method, +fillna, fillna, +fillna,SparkDataFrame-method, +na.omit, na.omit, +na.omit,SparkDataFrame-method; +drop, 
drop, +drop, drop,ANY-method, +drop,SparkDataFrame-method; +dtypes, dtypes, +dtypes,SparkDataFrame-method; +except, except, +except,SparkDataFrame,SparkDataFrame-method; +explain, explain, +explain,SparkDataFrame-method; +filter, filter, +filter,SparkDataFrame,characterOrColumn-method, +where, where, +where,SparkDataFrame,characterOrColumn-method; +first, first, +first, +first,SparkDataFrame-method, +first,characterOrColumn-method; +gapplyCollect, gapplyCollect, +gapplyCollect, +gapplyCollect,GroupedData-method, +gapplyCollect,SparkDataFrame-method; +gapply, gapply, +gapply, +gapply,GroupedData-method, +gapply,SparkDataFrame-method; +groupBy, groupBy, +groupBy,SparkDataFrame-method, +group_by, group_by, +group_by,SparkDataFrame-method; +head, +head,SparkDataFrame-method; +histogram, +histogram,SparkDataFrame,characterOrColumn-method; +insertInto, insertInto, +insertInto,SparkDataFrame,character-method; +intersect, intersect, +intersect,SparkDataFrame,SparkDataFrame-method; +isLocal, isLocal, +isLocal,SparkDataFrame-method; +join, +join,SparkDataFrame,SparkDataFrame-method; +limit, limit, +limit,SparkDataFrame,numeric-method; +merge, merge, +merge,SparkDataFrame,SparkDataFrame-method; +mutate, mutate, +mutate,SparkDataFrame-method, +transform, transform, +transform,SparkDataFrame-method; +ncol, +ncol,SparkDataFrame-method; +persist,
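The arrange page quoted above accepts either Column objects or column names, with `decreasing` applying only to the character form. A short sketch of the documented variants; `faithful` is just convenient example data:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)   # example data

# Column form: mix ascending and descending sort keys
head(arrange(df, asc(df$waiting), desc(df$eruptions)))

# Character form: 'decreasing' only applies when columns are given as names
head(arrange(df, "waiting", decreasing = TRUE))

# orderBy is the documented alias taking a character or Column
head(orderBy(df, "eruptions"))
```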
[40/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/gapply.html -- diff --git a/site/docs/2.0.2/api/R/gapply.html b/site/docs/2.0.2/api/R/gapply.html new file mode 100644 index 000..03d3587 --- /dev/null +++ b/site/docs/2.0.2/api/R/gapply.html @@ -0,0 +1,348 @@ + +R: gapply + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +gapply {SparkR}R Documentation + +gapply + +Description + +Groups the SparkDataFrame using the specified columns and applies the R function to each +group. + +gapply + + + +Usage + + +## S4 method for signature 'SparkDataFrame' +gapply(x, cols, func, schema) + +gapply(x, ...) + +## S4 method for signature 'GroupedData' +gapply(x, func, schema) + + + +Arguments + + +x + +a SparkDataFrame or GroupedData. + +cols + +grouping columns. + +func + +a function to be applied to each group partition specified by grouping +column of the SparkDataFrame. The function func takes as argument +a key - grouping columns and a data frame - a local R data.frame. +The output of func is a local R data.frame. + +schema + +the schema of the resulting SparkDataFrame after the function is applied. +The schema must match to output of func. It has to be defined for each +output column with preferred output column name and corresponding data type. + +... + +additional argument(s) passed to the method. + + + + +Value + +A SparkDataFrame. + + + +Note + +gapply(SparkDataFrame) since 2.0.0 + +gapply(GroupedData) since 2.0.0 + + + +See Also + +gapplyCollect + +Other SparkDataFrame functions: $, +$,SparkDataFrame-method, $-, +$-,SparkDataFrame-method, +select, select, +select,SparkDataFrame,Column-method, +select,SparkDataFrame,character-method, +select,SparkDataFrame,list-method; +SparkDataFrame-class; [, +[,SparkDataFrame-method, [[, +[[,SparkDataFrame,numericOrcharacter-method, +subset, subset, +subset,SparkDataFrame-method; +agg, agg, agg, +agg,GroupedData-method, +agg,SparkDataFrame-method, +summarize, summarize, +summarize, +summarize,GroupedData-method, +summarize,SparkDataFrame-method; +arrange, arrange, +arrange, +arrange,SparkDataFrame,Column-method, +arrange,SparkDataFrame,character-method, +orderBy,SparkDataFrame,characterOrColumn-method; +as.data.frame, +as.data.frame,SparkDataFrame-method; +attach, +attach,SparkDataFrame-method; +cache, cache, +cache,SparkDataFrame-method; +collect, collect, +collect,SparkDataFrame-method; +colnames, colnames, +colnames,SparkDataFrame-method, +colnames-, colnames-, +colnames-,SparkDataFrame-method, +columns, columns, +columns,SparkDataFrame-method, +names, +names,SparkDataFrame-method, +names-, +names-,SparkDataFrame-method; +coltypes, coltypes, +coltypes,SparkDataFrame-method, +coltypes-, coltypes-, +coltypes-,SparkDataFrame,character-method; +count,SparkDataFrame-method, +nrow, nrow, +nrow,SparkDataFrame-method; +createOrReplaceTempView, +createOrReplaceTempView, +createOrReplaceTempView,SparkDataFrame,character-method; +dapplyCollect, dapplyCollect, +dapplyCollect,SparkDataFrame,function-method; +dapply, dapply, +dapply,SparkDataFrame,function,structType-method; +describe, describe, +describe, +describe,SparkDataFrame,ANY-method, +describe,SparkDataFrame,character-method, +describe,SparkDataFrame-method, +summary, summary, +summary,SparkDataFrame-method; +dim, +dim,SparkDataFrame-method; 
+distinct, distinct, +distinct,SparkDataFrame-method, +unique, +unique,SparkDataFrame-method; +dropDuplicates, +dropDuplicates, +dropDuplicates,SparkDataFrame-method; +dropna, dropna, +dropna,SparkDataFrame-method, +fillna, fillna, +fillna,SparkDataFrame-method, +na.omit, na.omit, +na.omit,SparkDataFrame-method; +drop, drop, +drop, drop,ANY-method, +drop,SparkDataFrame-method; +dtypes, dtypes, +dtypes,SparkDataFrame-method; +except, except, +except,SparkDataFrame,SparkDataFrame-method; +explain, explain, +explain,SparkDataFrame-method; +filter, filter, +filter,SparkDataFrame,characterOrColumn-method, +where, where, +where,SparkDataFrame,characterOrColumn-method; +first, first, +first, +first,SparkDataFrame-method, +first,characterOrColumn-method; +gapplyCollect, gapplyCollect, +gapplyCollect, +gapplyCollect,GroupedData-method, +gapplyCollect,SparkDataFrame-method; +groupBy, groupBy, +groupBy,SparkDataFrame-method, +group_by, group_by, +group_by,SparkDataFrame-method; +head, +head,SparkDataFrame-method; +histogram, +histogram,SparkDataFrame,characterOrColumn-method; +insertInto, insertInto, +insertInto,SparkDataFrame,character-method; +intersect, intersect, +intersect,SparkDataFrame,SparkDataFrame-method; +isLocal, isLocal, +isLocal,SparkDataFrame-method; +join, +join,SparkDataFrame,SparkDataFrame-method; +limit, limit,
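The gapply page quoted above stresses that the schema argument must describe exactly what the grouping function returns. A sketch of a per-group aggregation; the output column names in the schema are choices made for this example:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)   # columns: eruptions, waiting

# The schema must describe exactly what the function below returns
schema <- structType(structField("waiting", "double"),
                     structField("avg_eruptions", "double"))

# Average eruption time for each distinct waiting time
result <- gapply(df, "waiting",
                 function(key, x) {
                   data.frame(key, mean(x$eruptions), stringsAsFactors = FALSE)
                 },
                 schema)
head(arrange(result, "waiting"))
```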
[32/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/read.orc.html -- diff --git a/site/docs/2.0.2/api/R/read.orc.html b/site/docs/2.0.2/api/R/read.orc.html new file mode 100644 index 000..e67fa6a --- /dev/null +++ b/site/docs/2.0.2/api/R/read.orc.html @@ -0,0 +1,46 @@ + +R: Create a SparkDataFrame from an ORC file. + + + + +read.orc {SparkR}R Documentation + +Create a SparkDataFrame from an ORC file. + +Description + +Loads an ORC file, returning the result as a SparkDataFrame. + + + +Usage + + +read.orc(path) + + + +Arguments + + +path + +Path of file to read. + + + + +Value + +SparkDataFrame + + + +Note + +read.orc since 2.0.0 + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/read.parquet.html -- diff --git a/site/docs/2.0.2/api/R/read.parquet.html b/site/docs/2.0.2/api/R/read.parquet.html new file mode 100644 index 000..0d42bcd --- /dev/null +++ b/site/docs/2.0.2/api/R/read.parquet.html @@ -0,0 +1,56 @@ + +R: Create a SparkDataFrame from a Parquet file. + + + + +read.parquet {SparkR}R Documentation + +Create a SparkDataFrame from a Parquet file. + +Description + +Loads a Parquet file, returning the result as a SparkDataFrame. + + + +Usage + + +## Default S3 method: +read.parquet(path) + +## Default S3 method: +parquetFile(...) + + + +Arguments + + +path + +path of file to read. A vector of multiple paths is allowed. + +... + +argument(s) passed to the method. + + + + +Value + +SparkDataFrame + + + +Note + +read.parquet since 1.6.0 + +parquetFile since 1.4.0 + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/read.text.html -- diff --git a/site/docs/2.0.2/api/R/read.text.html b/site/docs/2.0.2/api/R/read.text.html new file mode 100644 index 000..2b6d8ca --- /dev/null +++ b/site/docs/2.0.2/api/R/read.text.html @@ -0,0 +1,71 @@ + +R: Create a SparkDataFrame from a text file. + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +read.text {SparkR}R Documentation + +Create a SparkDataFrame from a text file. + +Description + +Loads text files and returns a SparkDataFrame whose schema starts with +a string column named value, and followed by partitioned columns if +there are any. + + + +Usage + + +## Default S3 method: +read.text(path) + + + +Arguments + + +path + +Path of file to read. A vector of multiple paths is allowed. + + + + +Details + +Each line in the text file is a new row in the resulting SparkDataFrame. 
+ + + +Value + +SparkDataFrame + + + +Note + +read.text since 1.6.1 + + + +Examples + +## Not run: +##D sparkR.session() +##D path - path/to/file.txt +##D df - read.text(path) +## End(Not run) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/regexp_extract.html -- diff --git a/site/docs/2.0.2/api/R/regexp_extract.html b/site/docs/2.0.2/api/R/regexp_extract.html new file mode 100644 index 000..375ceb0 --- /dev/null +++ b/site/docs/2.0.2/api/R/regexp_extract.html @@ -0,0 +1,122 @@ + +R: regexp_extract + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +regexp_extract {SparkR}R Documentation + +regexp_extract + +Description + +Extract a specific idx group identified by a Java regex, from the specified string column. +If the regex did not match, or the specified group did not match, an empty string is returned. + + + +Usage + + +## S4 method for signature 'Column,character,numeric' +regexp_extract(x, pattern, idx) + +regexp_extract(x, pattern, idx) + + + +Arguments + + +x + +a string Column. + +pattern + +a regular expression. + +idx + +a group index. + + + + +Note + +regexp_extract since 1.5.0 + + + +See Also + +Other string_funcs: ascii, +ascii, ascii,Column-method; +base64, base64, +base64,Column-method; +concat_ws, concat_ws, +concat_ws,character,Column-method; +concat, concat, +concat,Column-method; decode, +decode, +decode,Column,character-method; +encode, encode, +encode,Column,character-method; +format_number, format_number, +format_number,Column,numeric-method; +format_string, format_string, +format_string,character,Column-method;
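The pages quoted above cover the ORC/Parquet/text readers and regexp_extract. A sketch combining read.text with regexp_extract to pull a field out of plain-text lines; the file path and the `id=` pattern are hypothetical:

```r
library(SparkR)
sparkR.session()

# Hypothetical input path; read.parquet()/read.orc() work the same way for their formats
logs <- read.text("/tmp/app.log")   # one row per line, in a column named 'value'

# Pull the first capture group of a regex out of each line
ids <- select(logs, alias(regexp_extract(logs$value, "id=([0-9]+)", 1), "id"))
head(ids)
```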
[17/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/index.html -- diff --git a/site/docs/2.0.2/api/java/index.html b/site/docs/2.0.2/api/java/index.html new file mode 100644 index 000..f0b9c05 --- /dev/null +++ b/site/docs/2.0.2/api/java/index.html @@ -0,0 +1,74 @@ +http://www.w3.org/TR/html4/frameset.dtd;> + + + + +Spark 2.0.2 JavaDoc + +targetPage = "" + window.location.search; +if (targetPage != "" && targetPage != "undefined") +targetPage = targetPage.substring(1); +if (targetPage.indexOf(":") != -1 || (targetPage != "" && !validURL(targetPage))) +targetPage = "undefined"; +function validURL(url) { +try { +url = decodeURIComponent(url); +} +catch (error) { +return false; +} +var pos = url.indexOf(".html"); +if (pos == -1 || pos != url.length - 5) +return false; +var allowNumber = false; +var allowSep = false; +var seenDot = false; +for (var i = 0; i < url.length - 5; i++) { +var ch = url.charAt(i); +if ('a' <= ch && ch <= 'z' || +'A' <= ch && ch <= 'Z' || +ch == '$' || +ch == '_' || +ch.charCodeAt(0) > 127) { +allowNumber = true; +allowSep = true; +} else if ('0' <= ch && ch <= '9' +|| ch == '-') { +if (!allowNumber) + return false; +} else if (ch == '/' || ch == '.') { +if (!allowSep) +return false; +allowNumber = false; +allowSep = false; +if (ch == '.') + seenDot = true; +if (ch == '/' && seenDot) + return false; +} else { +return false; +} +} +return true; +} +function loadFrames() { +if (targetPage != "" && targetPage != "undefined") + top.classFrame.location = top.targetPage; +} + + + + + + + + + + +JavaScript is disabled on your browser. + +Frame Alert +This document is designed to be viewed using the frames feature. If you see this message, you are using a non-frame-capable web client. Link to Non-frame version. + + + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/lib/api-javadocs.js -- diff --git a/site/docs/2.0.2/api/java/lib/api-javadocs.js b/site/docs/2.0.2/api/java/lib/api-javadocs.js new file mode 100644 index 000..ead13d6 --- /dev/null +++ b/site/docs/2.0.2/api/java/lib/api-javadocs.js @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +/* Dynamically injected post-processing code for the API docs */ + +$(document).ready(function() { + addBadges(":: AlphaComponent ::", 'Alpha Component'); + addBadges(":: DeveloperApi ::", 'Developer API'); + addBadges(":: Experimental ::", 'Experimental'); +}); + +function addBadges(tag, html) { + var tags = $(".block:contains(" + tag + ")") + + // Remove identifier tags + tags.each(function(index) { +var oldHTML = $(this).html(); +var newHTML = oldHTML.replace(tag, ""); +$(this).html(newHTML); + }); + + // Add html badge tags + tags.each(function(index) { +if ($(this).parent().is('td.colLast')) { + $(this).parent().prepend(html); +} else if ($(this).parent('li.blockList') + .parent('ul.blockList') + .parent('div.description') + .parent().is('div.contentContainer')) { + var contentContainer = $(this).parent('li.blockList') +.parent('ul.blockList') +.parent('div.description') +.parent('div.contentContainer') + var header = contentContainer.prev('div.header'); + if (header.length > 0) { +header.prepend(html); + } else { +
[23/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/write.jdbc.html -- diff --git a/site/docs/2.0.2/api/R/write.jdbc.html b/site/docs/2.0.2/api/R/write.jdbc.html new file mode 100644 index 000..d357087 --- /dev/null +++ b/site/docs/2.0.2/api/R/write.jdbc.html @@ -0,0 +1,299 @@ + +R: Save the content of SparkDataFrame to an external database... + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +write.jdbc {SparkR}R Documentation + +Save the content of SparkDataFrame to an external database table via JDBC. + +Description + +Save the content of the SparkDataFrame to an external database table via JDBC. Additional JDBC +database connection properties can be set (...) + + + +Usage + + +## S4 method for signature 'SparkDataFrame,character,character' +write.jdbc(x, url, tableName, + mode = "error", ...) + +write.jdbc(x, url, tableName, mode = "error", ...) + + + +Arguments + + +x + +a SparkDataFrame. + +url + +JDBC database url of the form jdbc:subprotocol:subname. + +tableName + +the name of the table in the external database. + +mode + +one of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default). + +... + +additional JDBC database connection properties. + + + + +Details + +Also, mode is used to specify the behavior of the save operation when +data already exists in the data source. There are four modes: + + + + append: Contents of this SparkDataFrame are expected to be appended to existing data. + + + overwrite: Existing data is expected to be overwritten by the contents of this +SparkDataFrame. + + + error: An exception is expected to be thrown. + + + ignore: The save operation is expected to not save the contents of the SparkDataFrame +and to not change the existing data.
+ + + + + +Note + +write.jdbc since 2.0.0 + + + +See Also + +Other SparkDataFrame functions: $, +$,SparkDataFrame-method, $-, +$-,SparkDataFrame-method, +select, select, +select,SparkDataFrame,Column-method, +select,SparkDataFrame,character-method, +select,SparkDataFrame,list-method; +SparkDataFrame-class; [, +[,SparkDataFrame-method, [[, +[[,SparkDataFrame,numericOrcharacter-method, +subset, subset, +subset,SparkDataFrame-method; +agg, agg, agg, +agg,GroupedData-method, +agg,SparkDataFrame-method, +summarize, summarize, +summarize, +summarize,GroupedData-method, +summarize,SparkDataFrame-method; +arrange, arrange, +arrange, +arrange,SparkDataFrame,Column-method, +arrange,SparkDataFrame,character-method, +orderBy,SparkDataFrame,characterOrColumn-method; +as.data.frame, +as.data.frame,SparkDataFrame-method; +attach, +attach,SparkDataFrame-method; +cache, cache, +cache,SparkDataFrame-method; +collect, collect, +collect,SparkDataFrame-method; +colnames, colnames, +colnames,SparkDataFrame-method, +colnames-, colnames-, +colnames-,SparkDataFrame-method, +columns, columns, +columns,SparkDataFrame-method, +names, +names,SparkDataFrame-method, +names-, +names-,SparkDataFrame-method; +coltypes, coltypes, +coltypes,SparkDataFrame-method, +coltypes-, coltypes-, +coltypes-,SparkDataFrame,character-method; +count,SparkDataFrame-method, +nrow, nrow, +nrow,SparkDataFrame-method; +createOrReplaceTempView, +createOrReplaceTempView, +createOrReplaceTempView,SparkDataFrame,character-method; +dapplyCollect, dapplyCollect, +dapplyCollect,SparkDataFrame,function-method; +dapply, dapply, +dapply,SparkDataFrame,function,structType-method; +describe, describe, +describe, +describe,SparkDataFrame,ANY-method, +describe,SparkDataFrame,character-method, +describe,SparkDataFrame-method, +summary, summary, +summary,SparkDataFrame-method; +dim, +dim,SparkDataFrame-method; +distinct, distinct, +distinct,SparkDataFrame-method, +unique, +unique,SparkDataFrame-method; +dropDuplicates, +dropDuplicates, +dropDuplicates,SparkDataFrame-method; +dropna, dropna, +dropna,SparkDataFrame-method, +fillna, fillna, +fillna,SparkDataFrame-method, +na.omit, na.omit, +na.omit,SparkDataFrame-method; +drop, drop, +drop, drop,ANY-method, +drop,SparkDataFrame-method; +dtypes, dtypes, +dtypes,SparkDataFrame-method; +except, except, +except,SparkDataFrame,SparkDataFrame-method; +explain, explain, +explain,SparkDataFrame-method; +filter, filter, +filter,SparkDataFrame,characterOrColumn-method, +where, where, +where,SparkDataFrame,characterOrColumn-method; +first, first, +first, +first,SparkDataFrame-method, +first,characterOrColumn-method; +gapplyCollect, gapplyCollect, +gapplyCollect, +gapplyCollect,GroupedData-method, +gapplyCollect,SparkDataFrame-method; +gapply, gapply, +gapply, +gapply,GroupedData-method, +gapply,SparkDataFrame-method; +groupBy, groupBy, +groupBy,SparkDataFrame-method, +group_by, group_by,
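The write.jdbc page quoted above takes a JDBC URL, a table name, a save mode and extra connection properties. A sketch with placeholder connection details; it also assumes the matching JDBC driver jar is already on Spark's classpath:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)   # example data to persist

# Placeholder connection details, not a real endpoint
jdbc_url <- "jdbc:postgresql://dbhost:5432/analytics"

# 'append' adds rows to an existing table; the default 'error' mode fails if it exists
write.jdbc(df, jdbc_url, "faithful_copy", mode = "append",
           user = "spark_user", password = "secret")
```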
[35/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/mutate.html
--
diff --git a/site/docs/2.0.2/api/R/mutate.html b/site/docs/2.0.2/api/R/mutate.html
new file mode 100644
index 000..76e5ba6
--- /dev/null
+++ b/site/docs/2.0.2/api/R/mutate.html
@@ -0,0 +1,285 @@

mutate {SparkR}    R Documentation

Mutate

Description

Return a new SparkDataFrame with the specified columns added or replaced.

Usage

## S4 method for signature 'SparkDataFrame'
mutate(.data, ...)

## S4 method for signature 'SparkDataFrame'
transform(`_data`, ...)

mutate(.data, ...)

transform(`_data`, ...)

Arguments

.data    a SparkDataFrame.
...      additional column argument(s), each in the form name = col.
_data    a SparkDataFrame.

Value

A new SparkDataFrame with the new columns added or replaced.

Note

mutate since 1.4.0

transform since 1.5.0

See Also

rename, withColumn, and the other SparkDataFrame functions (select, subset, agg, arrange, collect, filter, groupBy, head, join, merge, withColumnRenamed, registerTempTable, and so on).
[08/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/RangePartitioner.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/RangePartitioner.html b/site/docs/2.0.2/api/java/org/apache/spark/RangePartitioner.html
new file mode 100644
index 000..21e8fd1
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/RangePartitioner.html
@@ -0,0 +1,390 @@

RangePartitioner (Spark 2.0.2 JavaDoc)

org.apache.spark
Class RangePartitioner<K,V>

Object
  org.apache.spark.Partitioner
    org.apache.spark.RangePartitioner<K,V>

All Implemented Interfaces: java.io.Serializable

public class RangePartitioner<K,V> extends Partitioner

A Partitioner that partitions sortable records by range into roughly equal ranges. The ranges
are determined by sampling the content of the RDD passed in.

Note that the actual number of partitions created by the RangePartitioner might not be the same
as the partitions parameter, in the case where the number of sampled records is less than
the value of partitions.

See Also: Serialized Form

Constructor Summary

RangePartitioner(int partitions,
                 RDD<? extends scala.Product2<K,V>> rdd,
                 boolean ascending,
                 scala.math.Ordering<K> evidence$1,
                 scala.reflect.ClassTag<K> evidence$2)

Method Summary

static <K> Object  determineBounds(scala.collection.mutable.ArrayBuffer<scala.Tuple2<K,Object>> candidates,
                                   int partitions,
                                   scala.math.Ordering<K> evidence$4,
                                   scala.reflect.ClassTag<K> evidence$5)
                   Determines the bounds for range partitioning from candidates with weights
                   indicating how many items each represents.

boolean            equals(Object other)

int                getPartition(Object key)

int                hashCode()

int                numPartitions()

static <K> scala.Tuple2<Object,scala.Tuple3<Object,Object,Object>[]>
                   sketch(RDD<K> rdd,
                          int sampleSizePerPartition,
                          scala.reflect.ClassTag<K> evidence$3)
                   Sketches the input RDD via reservoir sampling on each partition.

Methods inherited from class org.apache.spark.Partitioner: defaultPartitioner
Methods inherited from class Object: getClass, notify, notifyAll, toString, wait, wait, wait

Method Detail

sketch
  Sketches the input RDD via reservoir sampling on each partition.
  Parameters: rdd - the input RDD to sketch; sampleSizePerPartition - max sample size per
  partition; evidence$3 - (undocumented)
  Returns: (total number of items, an array of (partitionId, number of items, sample))

determineBounds
  Determines the bounds for range partitioning from candidates with weights indicating how many
  items each represents. Usually this is 1 over the probability used to sample this candidate.
  Parameters: candidates - unordered candidates with weights; partitions - number of partitions;
  evidence$4 - (undocumented); evidence$5 - (undocumented)
  Returns: selected bounds

numPartitions
  public int numPartitions()
  Specified by: numPartitions in class Partitioner

getPartition
  public int getPartition(Object key)
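To make the constructor and the getPartition/numPartitions entries above concrete, here is a
minimal Scala sketch; the object name, local master URL, and sample data are illustrative
assumptions, not part of the generated page:

import org.apache.spark.{RangePartitioner, SparkConf, SparkContext}

object RangePartitionerExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("range-partitioner-demo"))

    // A pair RDD keyed by an orderable type (Int); RangePartitioner samples it
    // to pick boundaries that give roughly equal-sized key ranges.
    val pairs = sc.parallelize((1 to 1000).map(i => (i, s"value-$i")), numSlices = 8)

    val partitioner = new RangePartitioner(4, pairs)
    val ranged = pairs.partitionBy(partitioner)

    // As documented, the actual number of partitions can be smaller than requested
    // when fewer records are sampled than the requested partition count.
    println(s"numPartitions = ${partitioner.numPartitions}")
    println(s"key 42 maps to partition ${partitioner.getPartition(42)}")
    println(s"ranged RDD has ${ranged.getNumPartitions} partitions")

    sc.stop()
  }
}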
[05/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/SparkEnv.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/SparkEnv.html b/site/docs/2.0.2/api/java/org/apache/spark/SparkEnv.html
new file mode 100644
index 000..9bcf96c
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/SparkEnv.html
@@ -0,0 +1,474 @@

SparkEnv (Spark 2.0.2 JavaDoc)

org.apache.spark
Class SparkEnv

Object
  org.apache.spark.SparkEnv

public class SparkEnv extends Object

:: DeveloperApi ::
Holds all the runtime environment objects for a running Spark instance (either master or
worker), including the serializer, RpcEnv, block manager, map output tracker, etc. Currently
Spark code finds the SparkEnv through a global variable, so all the threads can access the same
SparkEnv. It can be accessed by SparkEnv.get (e.g. after creating a SparkContext).

NOTE: This is not intended for external use. This is exposed for Shark and may be made private
in a future release.

Constructor Summary

SparkEnv(String executorId,
         org.apache.spark.rpc.RpcEnv rpcEnv,
         Serializer serializer,
         Serializer closureSerializer,
         org.apache.spark.serializer.SerializerManager serializerManager,
         org.apache.spark.MapOutputTracker mapOutputTracker,
         org.apache.spark.shuffle.ShuffleManager shuffleManager,
         org.apache.spark.broadcast.BroadcastManager broadcastManager,
         org.apache.spark.storage.BlockManager blockManager,
         org.apache.spark.SecurityManager securityManager,
         org.apache.spark.metrics.MetricsSystem metricsSystem,
         org.apache.spark.memory.MemoryManager memoryManager,
         org.apache.spark.scheduler.OutputCommitCoordinator outputCommitCoordinator,
         SparkConf conf)

Method Summary

org.apache.spark.storage.BlockManager               blockManager()
org.apache.spark.broadcast.BroadcastManager         broadcastManager()
Serializer                                          closureSerializer()
SparkConf                                           conf()
String                                              executorId()
static SparkEnv                                     get()            Returns the SparkEnv.
org.apache.spark.MapOutputTracker                   mapOutputTracker()
org.apache.spark.memory.MemoryManager               memoryManager()
org.apache.spark.metrics.MetricsSystem              metricsSystem()
org.apache.spark.scheduler.OutputCommitCoordinator  outputCommitCoordinator()
org.apache.spark.SecurityManager                    securityManager()
Serializer                                          serializer()
org.apache.spark.serializer.SerializerManager       serializerManager()
static void                                         set(SparkEnv e)
org.apache.spark.shuffle.ShuffleManager             shuffleManager()

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
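Although the page stresses that SparkEnv is not intended for external use, a small hedged Scala
sketch makes the accessor pattern above concrete (the object name and local master URL are
illustrative assumptions only):

import org.apache.spark.{SparkConf, SparkContext, SparkEnv}

object SparkEnvExample {
  def main(args: Array[String]): Unit = {
    // Creating a SparkContext initializes the driver-side SparkEnv.
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("spark-env-demo"))

    // SparkEnv.get returns the environment for the current JVM
    // (the driver here; an executor when called from task code).
    val env = SparkEnv.get
    println(s"executorId         = ${env.executorId}")
    println(s"serializer         = ${env.serializer.getClass.getName}")
    println(s"closure serializer = ${env.closureSerializer.getClass.getName}")
    println(s"spark.app.name     = ${env.conf.get("spark.app.name")}")

    sc.stop()
  }
}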
[43/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/dim.html
--
diff --git a/site/docs/2.0.2/api/R/dim.html b/site/docs/2.0.2/api/R/dim.html
new file mode 100644
index 000..1227bc6
--- /dev/null
+++ b/site/docs/2.0.2/api/R/dim.html
@@ -0,0 +1,256 @@

dim {SparkR}    R Documentation

Returns the dimensions of SparkDataFrame

Description

Returns the dimensions (number of rows and columns) of a SparkDataFrame.

Usage

## S4 method for signature 'SparkDataFrame'
dim(x)

Arguments

x    a SparkDataFrame

Note

dim since 1.5.0

See Also

Other SparkDataFrame functions: select, subset, agg, arrange, collect, filter, groupBy, head, join, merge, mutate, transform, ncol, nrow, persist, printSchema, randomSplit, rbind, registerTempTable, rename, repartition, and so on.
[31/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/round.html
--
diff --git a/site/docs/2.0.2/api/R/round.html b/site/docs/2.0.2/api/R/round.html
new file mode 100644
index 000..fd2fb17
--- /dev/null
+++ b/site/docs/2.0.2/api/R/round.html
@@ -0,0 +1,120 @@

round {SparkR}    R Documentation

round

Description

Returns the value of the column e rounded to 0 decimal places using HALF_UP rounding mode.

Usage

## S4 method for signature 'Column'
round(x)

Arguments

x    Column to compute on.

Note

round since 1.5.0

See Also

Other math_funcs: acos, asin, atan, atan2, bin, bround, cbrt, ceil, ceiling, conv, corr, cos, cosh, cov, covar_pop, covar_samp, exp, expm1, factorial, floor, hex, hypot, log, log10, log1p, log2, pmod, rint, shiftLeft, shiftRight, shiftRightUnsigned, sign, signum, sin, sinh, sqrt, tan, tanh, toDegrees, toRadians, unhex

Examples

## Not run: round(df$c)

[Package SparkR version 2.0.2 Index]

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/row_number.html
--
diff --git a/site/docs/2.0.2/api/R/row_number.html b/site/docs/2.0.2/api/R/row_number.html
new file mode 100644
index 000..d3fd4fb
--- /dev/null
+++ b/site/docs/2.0.2/api/R/row_number.html
@@ -0,0 +1,86 @@

row_number {SparkR}    R Documentation

row_number

Description

Window function: returns a sequential number starting at 1 within a window partition.

Usage

## S4 method for signature 'missing'
row_number()

row_number(x = "missing")

Arguments

x    empty. Should be used with no argument.

Details

This is equivalent to the ROW_NUMBER function in SQL.

Note

row_number since 1.6.0

See Also

Other window_funcs: cume_dist, dense_rank, lag, lead, ntile, percent_rank, rank

Examples

## Not run:
##D df <- createDataFrame(mtcars)
##D ws <- orderBy(windowPartitionBy("am"), "hp")
##D out <- select(df, over(row_number(), ws), df$hp, df$am)
## End(Not run)

[Package SparkR version 2.0.2 Index]

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/rowsBetween.html
--
diff --git a/site/docs/2.0.2/api/R/rowsBetween.html b/site/docs/2.0.2/api/R/rowsBetween.html
new file mode 100644
index 000..571df1a
--- /dev/null
+++ b/site/docs/2.0.2/api/R/rowsBetween.html
@@ -0,0 +1,94 @@
[25/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/unhex.html
--
diff --git a/site/docs/2.0.2/api/R/unhex.html b/site/docs/2.0.2/api/R/unhex.html
new file mode 100644
index 000..208ff4c
--- /dev/null
+++ b/site/docs/2.0.2/api/R/unhex.html
@@ -0,0 +1,122 @@

unhex {SparkR}    R Documentation

unhex

Description

Inverse of hex. Interprets each pair of characters as a hexadecimal number
and converts to the byte representation of number.

Usage

## S4 method for signature 'Column'
unhex(x)

unhex(x)

Arguments

x    Column to compute on.

Note

unhex since 1.5.0

See Also

Other math_funcs: acos, asin, atan, atan2, bin, bround, cbrt, ceil, ceiling, conv, corr, cos, cosh, cov, covar_pop, covar_samp, exp, expm1, factorial, floor, hex, hypot, log, log10, log1p, log2, pmod, rint, round, shiftLeft, shiftRight, shiftRightUnsigned, sign, signum, sin, sinh, sqrt, tan, tanh, toDegrees, toRadians

Examples

## Not run: unhex(df$c)

[Package SparkR version 2.0.2 Index]

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/union.html
--
diff --git a/site/docs/2.0.2/api/R/union.html b/site/docs/2.0.2/api/R/union.html
new file mode 100644
index 000..8e92124
--- /dev/null
+++ b/site/docs/2.0.2/api/R/union.html
@@ -0,0 +1,280 @@

union {SparkR}    R Documentation

Return a new SparkDataFrame containing the union of rows

Description

Return a new SparkDataFrame containing the union of rows in this SparkDataFrame
and another SparkDataFrame. This is equivalent to UNION ALL in SQL.
Note that this does not remove duplicate rows across the two SparkDataFrames.

unionAll is deprecated - use union instead.

Usage

## S4 method for signature 'SparkDataFrame,SparkDataFrame'
union(x, y)

## S4 method for signature 'SparkDataFrame,SparkDataFrame'
unionAll(x, y)

union(x, y)

unionAll(x, y)

Arguments

x    A SparkDataFrame
y    A SparkDataFrame

Value

A SparkDataFrame containing the result of the union.

Note

union since 2.0.0

unionAll since 1.4.0

See Also

rbind, and the other SparkDataFrame functions (select, subset, agg, arrange, collect, filter, groupBy, join, merge, and so on).
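The same union-all semantics described above also apply in the Scala Dataset API; since this
digest mixes SparkR and JavaDoc pages, the following is a hedged Scala sketch (column name,
object name, and local master URL are illustrative assumptions) showing that union keeps
duplicates and that de-duplication must be requested explicitly:

import org.apache.spark.sql.SparkSession

object UnionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("union-demo").getOrCreate()
    import spark.implicits._

    val left  = Seq(1, 2, 3).toDF("id")
    val right = Seq(3, 4, 5).toDF("id")

    // union keeps duplicates (UNION ALL semantics), as the R doc notes.
    val all = left.union(right)
    println(all.count())            // 6 rows, including the duplicate id = 3

    // Apply distinct explicitly when UNION (de-duplicated) semantics are wanted.
    println(all.distinct().count()) // 5 rows

    spark.stop()
  }
}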
[06/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/SparkContext.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/SparkContext.html b/site/docs/2.0.2/api/java/org/apache/spark/SparkContext.html
new file mode 100644
index 000..09f31f9
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/SparkContext.html
@@ -0,0 +1,2467 @@

SparkContext (Spark 2.0.2 JavaDoc)

org.apache.spark
Class SparkContext

Object
  org.apache.spark.SparkContext

public class SparkContext extends Object

Main entry point for Spark functionality. A SparkContext represents the connection to a Spark
cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.

Only one SparkContext may be active per JVM. You must stop() the active SparkContext before
creating a new one. This limitation may eventually be removed; see SPARK-2243 for more details.

param: config a Spark Config object describing the application configuration. Any settings in
this config overrides the default configs as well as system properties.

Constructor Summary

SparkContext()
    Create a SparkContext that loads settings from system properties (for instance, when
    launching with ./bin/spark-submit).

SparkContext(SparkConf config)

SparkContext(String master, String appName, SparkConf conf)
    Alternative constructor that allows setting common Spark properties directly.

SparkContext(String master, String appName, String sparkHome,
             scala.collection.Seq<String> jars,
             scala.collection.Map<String,String> environment)
    Alternative constructor that allows setting common Spark properties directly.

Method Summary

<R,T> Accumulable<R,T>  accumulable(R initialValue, AccumulableParam<R,T> param)
                        Deprecated. Use AccumulatorV2. Since 2.0.0.

<R,T> Accumulable<R,T>  accumulable(R initialValue, String name, AccumulableParam<R,T> param)
                        Deprecated. Use AccumulatorV2. Since 2.0.0.

<R,T> Accumulable<R,T>  accumulableCollection(R initialValue,
                            scala.Function1<R,scala.collection.generic.Growable<T>> evidence$9,
                            scala.reflect.ClassTag<R> evidence$10)
                        Deprecated. Use AccumulatorV2. Since 2.0.0.

<T> Accumulator<T>      accumulator(T initialValue, AccumulatorParam<T> param)
                        Deprecated. Use AccumulatorV2. Since 2.0.0.

<T> Accumulator<T>      accumulator(T initialValue, String name, AccumulatorParam<T> param)
                        Deprecated. Use AccumulatorV2. Since 2.0.0.

void                    addFile(String path)
                        Add a file to be downloaded with this Spark job on every node.

void                    addFile(String path, boolean recursive)
                        Add a file to be downloaded with this Spark job on every node.

void                    addJar(String path)
                        Adds a JAR dependency for all tasks to be executed on this SparkContext
                        in the future.

void                    addSparkListener(org.apache.spark.scheduler.SparkListenerInterface listener)
                        :: DeveloperApi :: Register a listener to receive up-calls from events
                        that happen during execution.

scala.Option<String>    applicationAttemptId()

String                  applicationId()
                        A unique identifier for the Spark application.

String                  appName()

RDD<scala.Tuple2<String,PortableDataStream>>
                        binaryFiles(String path, int minPartitions)
                        Get an RDD for a Hadoop-readable dataset as PortableDataStream for each
                        file (useful for binary data).

RDD<byte[]>             binaryRecords(String path, int recordLength,
                                      org.apache.hadoop.conf.Configuration conf)
                        Load data from a flat binary file, assuming the length of each record
                        is constant.

<T> Broadcast<T>        broadcast(T value, scala.reflect.ClassTag<T> evidence$11)
                        Broadcast a read-only variable to the cluster, returning a Broadcast
                        object for reading it in distributed functions.

void                    cancelAllJobs()
                        Cancel all jobs that have been scheduled or are running.
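A minimal Scala sketch tying the entries above together (constructing a context, broadcasting a
read-only value, and stopping before another context could be created); the object name, local
master URL, and lookup data are illustrative assumptions, not part of the documented API:

import org.apache.spark.{SparkConf, SparkContext}

object SparkContextExample {
  def main(args: Array[String]): Unit = {
    // Only one SparkContext may be active per JVM, as the class comment says.
    val conf = new SparkConf().setMaster("local[2]").setAppName("spark-context-demo")
    val sc   = new SparkContext(conf)

    // Broadcast a small read-only lookup table to every executor.
    val lookup = sc.broadcast(Map(1 -> "one", 2 -> "two", 3 -> "three"))

    val labelled = sc.parallelize(Seq(1, 2, 3, 4))
      .map(i => lookup.value.getOrElse(i, "unknown"))
      .collect()

    println(labelled.mkString(", "))
    println(s"applicationId = ${sc.applicationId}")

    sc.stop()   // stop() before creating any new SparkContext in this JVM
  }
}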
[45/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/corr.html
--
diff --git a/site/docs/2.0.2/api/R/corr.html b/site/docs/2.0.2/api/R/corr.html
new file mode 100644
index 000..9f058a4
--- /dev/null
+++ b/site/docs/2.0.2/api/R/corr.html
@@ -0,0 +1,177 @@

corr {SparkR}    R Documentation

corr

Description

Computes the Pearson Correlation Coefficient for two Columns.

Calculates the correlation of two columns of a SparkDataFrame.
Currently only supports the Pearson Correlation Coefficient.
For Spearman Correlation, consider using RDD methods found in MLlib's Statistics.

Usage

## S4 method for signature 'Column'
corr(x, col2)

corr(x, ...)

## S4 method for signature 'SparkDataFrame'
corr(x, colName1, colName2, method = "pearson")

Arguments

x          a Column or a SparkDataFrame.
col2       a (second) Column.
...        additional argument(s). If x is a Column, a Column should be provided.
           If x is a SparkDataFrame, two column names should be provided.
colName1   the name of the first column.
colName2   the name of the second column.
method     Optional. A character specifying the method for calculating the correlation.
           Only "pearson" is allowed now.

Value

The Pearson Correlation Coefficient as a Double.

Note

corr since 1.6.0

See Also

Other math_funcs: acos, asin, atan, atan2, bin, bround, cbrt, ceil, ceiling, conv, cos, cosh, cov, covar_pop, covar_samp, exp, expm1, factorial, floor, hex, hypot, log, log10, log1p, log2, pmod, rint, round, shiftLeft, shiftRight, shiftRightUnsigned, sign, signum, sin, sinh, sqrt, tan, tanh, toDegrees, toRadians, unhex

Other stat functions: approxQuantile, cov, covar_samp, crosstab, freqItems, sampleBy

Examples

## Not run: corr(df$c, df$d)
## Not run:
##D df <- read.json("/path/to/file.json")
##D corr <- corr(df, "title", "gender")
##D corr <- corr(df, "title", "gender", method = "pearson")
## End(Not run)

[Package SparkR version 2.0.2 Index]

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/cos.html
--
diff --git a/site/docs/2.0.2/api/R/cos.html b/site/docs/2.0.2/api/R/cos.html
new file mode 100644
index 000..64090a4
--- /dev/null
+++ b/site/docs/2.0.2/api/R/cos.html
@@ -0,0 +1,120 @@

cos {SparkR}    R Documentation

cos

Description

Computes the cosine of the given value.

Usage

## S4 method for signature 'Column'
cos(x)

Arguments

x    Column to compute on.

Note

cos since 1.5.0

See Also

Other math_funcs: acos,
[07/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/SparkConf.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/SparkConf.html b/site/docs/2.0.2/api/java/org/apache/spark/SparkConf.html
new file mode 100644
index 000..8fb676c
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/SparkConf.html
@@ -0,0 +1,1124 @@

SparkConf (Spark 2.0.2 JavaDoc)

org.apache.spark
Class SparkConf

Object
  org.apache.spark.SparkConf

All Implemented Interfaces: Cloneable

public class SparkConf extends Object implements scala.Cloneable

Configuration for a Spark application. Used to set various Spark parameters as key-value pairs.

Most of the time, you would create a SparkConf object with new SparkConf(), which will load
values from any spark.* Java system properties set in your application as well. In this case,
parameters you set directly on the SparkConf object take priority over system properties.

For unit tests, you can also call new SparkConf(false) to skip loading external settings and
get the same configuration no matter what the system properties are.

All setter methods in this class support chaining. For example, you can write
new SparkConf().setMaster("local").setAppName("My app").

Note that once a SparkConf object is passed to Spark, it is cloned and can no longer be
modified by the user. Spark does not support modifying the configuration at runtime.

param: loadDefaults whether to also load values from Java system properties

Constructor Summary

SparkConf()
    Create a SparkConf that loads defaults from system properties and the classpath.

SparkConf(boolean loadDefaults)

Method Summary

SparkConf                        clone()
                                 Copy this object.

boolean                          contains(String key)
                                 Does the configuration contain a given parameter?

String                           get(String key)
                                 Get a parameter; throws a NoSuchElementException if it's not set.

String                           get(String key, String defaultValue)
                                 Get a parameter, falling back to a default if not set.

scala.Tuple2<String,String>[]    getAll()
                                 Get all parameters as a list of pairs.

String                           getAppId()
                                 Returns the Spark application id, valid in the Driver after
                                 TaskScheduler registration and from the start in the Executor.

scala.collection.immutable.Map<Object,String>
                                 getAvroSchema()
                                 Gets all the avro schemas in the configuration used in the
                                 generic Avro record serializer.

boolean                          getBoolean(String key, boolean defaultValue)
                                 Get a parameter as a boolean, falling back to a default if not set.

static scala.Option<String>      getDeprecatedConfig(String key, SparkConf conf)
                                 Looks for available deprecated keys for the given config option,
                                 and return the first value available.

double                           getDouble(String key, double defaultValue)
                                 Get a parameter as a double, falling back to a default if not set.

scala.collection.Seq<scala.Tuple2<String,String>>
                                 getExecutorEnv()
                                 Get all executor environment variables set on this SparkConf.

int                              getInt(String key, int defaultValue)
                                 Get a parameter as an integer, falling back to a default if not set.

long                             getLong(String key, long defaultValue)
                                 Get a parameter as a long, falling back to a default if not set.

scala.Option<String>             getOption(String key)
                                 Get a parameter as an Option.

long                             getSizeAsBytes(String key)
                                 Get a size parameter as bytes; throws a NoSuchElementException
                                 if it's not set.

long                             getSizeAsBytes(String key, long defaultValue)
                                 Get a size parameter as bytes, falling back to a default if not set.

long                             getSizeAsBytes(String key, String defaultValue)
                                 Get a size parameter as bytes, falling back to a default if not set.

long                             getSizeAsGb(String key)
                                 Get a size parameter as Gibibytes; throws a NoSuchElementException
                                 if it's not set.

long                             getSizeAsGb(String key, String defaultValue)
                                 Get a size parameter as Gibibytes, falling back to a default if not set.
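The chaining and size-parsing behaviour described above can be exercised without a cluster;
here is a small Scala sketch (object name and the chosen keys other than standard spark.* keys
are illustrative assumptions):

import org.apache.spark.SparkConf

object SparkConfExample {
  def main(args: Array[String]): Unit = {
    // Setters chain, exactly as the class description states.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("spark-conf-demo")
      .set("spark.executor.memory", "1g")

    println(conf.get("spark.app.name"))                   // "spark-conf-demo"
    println(conf.get("spark.missing.key", "fallback"))    // default used when unset
    println(conf.contains("spark.executor.memory"))       // true
    println(conf.getSizeAsBytes("spark.executor.memory")) // 1073741824

    // new SparkConf(false) skips loading spark.* system properties (useful in tests).
    val isolated = new SparkConf(false)
    println(isolated.getAll.length)
  }
}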
[14/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.IntAccumulatorParam$.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.IntAccumulatorParam$.html b/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.IntAccumulatorParam$.html
new file mode 100644
index 000..c77f232
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.IntAccumulatorParam$.html
@@ -0,0 +1,361 @@

AccumulatorParam.IntAccumulatorParam$ (Spark 2.0.2 JavaDoc)

org.apache.spark
Class AccumulatorParam.IntAccumulatorParam$

Object
  org.apache.spark.AccumulatorParam.IntAccumulatorParam$

All Implemented Interfaces: java.io.Serializable, AccumulableParam<Object,Object>, AccumulatorParam<Object>
Enclosing interface: AccumulatorParam<T>

Deprecated. Use AccumulatorV2. Since 2.0.0.

public static class AccumulatorParam.IntAccumulatorParam$ extends Object implements AccumulatorParam<Object>

See Also: Serialized Form

Nested classes/interfaces inherited from interface org.apache.spark.AccumulatorParam:
AccumulatorParam.DoubleAccumulatorParam$, AccumulatorParam.FloatAccumulatorParam$,
AccumulatorParam.IntAccumulatorParam$, AccumulatorParam.LongAccumulatorParam$,
AccumulatorParam.StringAccumulatorParam$

Field Summary

static AccumulatorParam.IntAccumulatorParam$   MODULE$
    Deprecated. Static reference to the singleton instance of this Scala object.

Constructor Summary

AccumulatorParam.IntAccumulatorParam$()
    Deprecated.

Method Summary

int   addInPlace(int t1, int t2)   Deprecated.
int   zero(int initialValue)       Deprecated.

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.AccumulatorParam: addAccumulator
Methods inherited from interface org.apache.spark.AccumulableParam: addInPlace, zero
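Since this class is deprecated in favour of AccumulatorV2, a hedged Scala sketch contrasting
the old and new style may help (object name, local master URL, and accumulator name are
illustrative assumptions):

import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("accumulator-demo"))
    val data = sc.parallelize(1 to 100)

    // Deprecated path: sc.accumulator(0) resolves the implicit IntAccumulatorParam,
    // whose zero() and addInPlace() are the methods documented above.
    val oldStyle = sc.accumulator(0)
    data.foreach(x => oldStyle += 1)
    println(s"deprecated accumulator = ${oldStyle.value}")

    // Recommended replacement since 2.0.0: the AccumulatorV2 API.
    val counter = sc.longAccumulator("record-count")
    data.foreach(x => counter.add(1))
    println(s"LongAccumulator        = ${counter.value}")

    sc.stop()
  }
}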
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.LongAccumulatorParam$.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.LongAccumulatorParam$.html b/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.LongAccumulatorParam$.html
new file mode 100644
index 000..d39926f
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/AccumulatorParam.LongAccumulatorParam$.html
@@ -0,0 +1,361 @@

AccumulatorParam.LongAccumulatorParam$ (Spark 2.0.2 JavaDoc)

org.apache.spark
Class ExecutorRemoved

Object
  org.apache.spark.ExecutorRemoved

All Implemented Interfaces: java.io.Serializable, scala.Equals, scala.Product

public class ExecutorRemoved extends Object implements scala.Product, scala.Serializable

See Also: Serialized Form

Constructor Summary

ExecutorRemoved(String executorId)

Method Summary

abstract static boolean                    canEqual(Object that)
abstract static boolean                    equals(Object that)
String                                     executorId()
abstract static int                        productArity()
abstract static Object                     productElement(int n)
static scala.collection.Iterator<Object>   productIterator()
static String                              productPrefix()

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface scala.Product: productArity, productElement, productIterator, productPrefix
Methods inherited from interface scala.Equals: canEqual, equals

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/ExpireDeadHosts.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/ExpireDeadHosts.html
b/site/docs/2.0.2/api/java/org/apache/spark/ExpireDeadHosts.html
new file mode 100644
index 000..492fd65
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/ExpireDeadHosts.html
@@ -0,0 +1,319 @@

ExpireDeadHosts (Spark 2.0.2 JavaDoc)

org.apache.spark
Class ExpireDeadHosts

Object
  org.apache.spark.ExpireDeadHosts

public class ExpireDeadHosts extends Object

Constructor
[01/51] [partial] spark-website git commit: Add docs for 2.0.2.
Repository: spark-website
Updated Branches:
  refs/heads/asf-site b9aa4c3ee -> 0bd363165

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/UnknownReason.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/UnknownReason.html b/site/docs/2.0.2/api/java/org/apache/spark/UnknownReason.html
new file mode 100644
index 000..5197a6b
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/UnknownReason.html
@@ -0,0 +1,348 @@

UnknownReason (Spark 2.0.2 JavaDoc)

org.apache.spark
Class UnknownReason

Object
  org.apache.spark.UnknownReason

public class UnknownReason extends Object

:: DeveloperApi ::
We don't know why the task ended -- for example, because of a ClassNotFound exception when
deserializing the task result.

Constructor Summary

UnknownReason()

Method Summary

abstract static boolean                    canEqual(Object that)
static boolean                             countTowardsTaskFailures()
abstract static boolean                    equals(Object that)
abstract static int                        productArity()
abstract static Object                     productElement(int n)
static scala.collection.Iterator<Object>   productIterator()
static String                              productPrefix()
static String                              toErrorString()

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[02/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/TaskKilled.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/TaskKilled.html b/site/docs/2.0.2/api/java/org/apache/spark/TaskKilled.html
new file mode 100644
index 000..976143b
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/TaskKilled.html
@@ -0,0 +1,347 @@

TaskKilled (Spark 2.0.2 JavaDoc)

org.apache.spark
Class TaskKilled

Object
  org.apache.spark.TaskKilled

public class TaskKilled extends Object

:: DeveloperApi ::
Task was killed intentionally and needs to be rescheduled.

Constructor Summary

TaskKilled()

Method Summary

abstract static boolean                    canEqual(Object that)
static boolean                             countTowardsTaskFailures()
abstract static boolean                    equals(Object that)
abstract static int                        productArity()
abstract static Object                     productElement(int n)
static scala.collection.Iterator<Object>   productIterator()
static String                              productPrefix()
static String                              toErrorString()

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/TaskKilledException.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/TaskKilledException.html b/site/docs/2.0.2/api/java/org/apache/spark/TaskKilledException.html
new file mode 100644
index 000..6f9a0b7
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/TaskKilledException.html
@@ -0,0 +1,255 @@

TaskKilledException (Spark 2.0.2 JavaDoc)

org.apache.spark
Class TaskKilledException

Object
  Throwable
    Exception
      RuntimeException
        org.apache.spark.TaskKilledException

All Implemented Interfaces: java.io.Serializable

public class TaskKilledException extends RuntimeException

:: DeveloperApi ::
Exception thrown when a task is explicitly killed
[04/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfo.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfo.html b/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfo.html
new file mode 100644
index 000..bc6e486
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfo.html
@@ -0,0 +1,243 @@

SparkJobInfo (Spark 2.0.2 JavaDoc)

org.apache.spark
Interface SparkJobInfo

All Superinterfaces: java.io.Serializable
All Known Implementing Classes: SparkJobInfoImpl

public interface SparkJobInfo extends java.io.Serializable

Exposes information about Spark Jobs.

This interface is not designed to be implemented outside of Spark. We may add additional methods
which may break binary compatibility with outside implementations.

Method Summary

int                  jobId()
int[]                stageIds()
JobExecutionStatus   status()

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfoImpl.html
--
diff --git a/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfoImpl.html b/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfoImpl.html
new file mode 100644
index 000..d95f90b
--- /dev/null
+++ b/site/docs/2.0.2/api/java/org/apache/spark/SparkJobInfoImpl.html
@@ -0,0 +1,302 @@

SparkJobInfoImpl (Spark 2.0.2 JavaDoc)

org.apache.spark
Class SparkJobInfoImpl

Object
  org.apache.spark.SparkJobInfoImpl

All Implemented Interfaces: java.io.Serializable, SparkJobInfo

public class SparkJobInfoImpl extends Object implements SparkJobInfo

See Also: Serialized Form

Constructor Summary

SparkJobInfoImpl(int jobId, int[] stageIds, JobExecutionStatus status)

Method Summary

int                  jobId()       Specified by: jobId in interface SparkJobInfo
int[]                stageIds()    Specified by: stageIds in interface SparkJobInfo
JobExecutionStatus   status()      Specified by: status in interface SparkJobInfo

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
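SparkJobInfo instances are normally obtained through the status tracker rather than constructed
directly; a hedged Scala sketch of that pattern follows (object name, local master URL, and the
use of a null job group to mean "jobs not in any group" are assumptions made for illustration):

import org.apache.spark.{SparkConf, SparkContext}

object JobInfoExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("job-info-demo"))

    sc.parallelize(1 to 1000, 4).count()   // run one job so there is something to inspect

    val tracker = sc.statusTracker
    for (jobId <- tracker.getJobIdsForGroup(null)) {
      tracker.getJobInfo(jobId).foreach { info =>
        // jobId(), status() and stageIds() are the three methods of SparkJobInfo.
        println(s"job ${info.jobId()} is ${info.status()} with stages ${info.stageIds().mkString(",")}")
      }
    }

    sc.stop()
  }
}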
[24/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/window.html -- diff --git a/site/docs/2.0.2/api/R/window.html b/site/docs/2.0.2/api/R/window.html new file mode 100644 index 000..01536c1 --- /dev/null +++ b/site/docs/2.0.2/api/R/window.html @@ -0,0 +1,163 @@ + +R: window + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +window {SparkR}R Documentation + +window + +Description + +Bucketize rows into one or more time windows given a timestamp specifying column. Window +starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window +[12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in +the order of months are not supported. + + + +Usage + + +## S4 method for signature 'Column' +window(x, windowDuration, slideDuration = NULL, + startTime = NULL) + +window(x, ...) + + + +Arguments + + +x + +a time Column. Must be of TimestampType. + +windowDuration + +a string specifying the width of the window, e.g. '1 second', +'1 day 12 hours', '2 minutes'. Valid interval strings are 'week', +'day', 'hour', 'minute', 'second', 'millisecond', 'microsecond'. Note that +the duration is a fixed length of time, and does not vary over time +according to a calendar. For example, '1 day' always means 86,400,000 +milliseconds, not a calendar day. + +slideDuration + +a string specifying the sliding interval of the window. Same format as +windowDuration. A new window will be generated every +slideDuration. Must be less than or equal to +the windowDuration. This duration is likewise absolute, and does not +vary according to a calendar. + +startTime + +the offset with respect to 1970-01-01 00:00:00 UTC with which to start +window intervals. For example, in order to have hourly tumbling windows +that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide +startTime as "15 minutes". + +... + +further arguments to be passed to or from other methods. + + + + +Value + +An output column of struct called 'window' by default with the nested columns 'start' +and 'end'. 
+ + + +Note + +window since 2.0.0 + + + +See Also + +Other datetime_funcs: add_months, +add_months, +add_months,Column,numeric-method; +date_add, date_add, +date_add,Column,numeric-method; +date_format, date_format, +date_format,Column,character-method; +date_sub, date_sub, +date_sub,Column,numeric-method; +datediff, datediff, +datediff,Column-method; +dayofmonth, dayofmonth, +dayofmonth,Column-method; +dayofyear, dayofyear, +dayofyear,Column-method; +from_unixtime, from_unixtime, +from_unixtime,Column-method; +from_utc_timestamp, +from_utc_timestamp, +from_utc_timestamp,Column,character-method; +hour, hour, +hour,Column-method; last_day, +last_day, +last_day,Column-method; +minute, minute, +minute,Column-method; +months_between, +months_between, +months_between,Column-method; +month, month, +month,Column-method; +next_day, next_day, +next_day,Column,character-method; +quarter, quarter, +quarter,Column-method; +second, second, +second,Column-method; +to_date, to_date, +to_date,Column-method; +to_utc_timestamp, +to_utc_timestamp, +to_utc_timestamp,Column,character-method; +unix_timestamp, +unix_timestamp, +unix_timestamp, +unix_timestamp, +unix_timestamp,Column,character-method, +unix_timestamp,Column,missing-method, +unix_timestamp,missing,missing-method; +weekofyear, weekofyear, +weekofyear,Column-method; +year, year, +year,Column-method + + + +Examples + +## Not run: +##D # One minute windows every 15 seconds 10 seconds after the minute, e.g. 09:00:10-09:01:10, +##D # 09:00:25-09:01:25, 09:00:40-09:01:40, ... +##D window(df$time, 1 minute, 15 seconds, 10 seconds) +##D +##D # One minute tumbling windows 15 seconds after the minute, e.g. 09:00:15-09:01:15, +##D# 09:01:15-09:02:15... +##D window(df$time, 1 minute, startTime = 15 seconds) +##D +##D # Thirty-second windows every 10 seconds, e.g. 09:00:00-09:00:30, 09:00:10-09:00:40, ... +##D window(df$time, 30 seconds, 10 seconds) +## End(Not run) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/windowOrderBy.html -- diff --git a/site/docs/2.0.2/api/R/windowOrderBy.html b/site/docs/2.0.2/api/R/windowOrderBy.html new file mode 100644 index 000..b0cb39e --- /dev/null +++ b/site/docs/2.0.2/api/R/windowOrderBy.html @@ -0,0 +1,72 @@ + +R: windowOrderBy + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;>
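The SparkR help page above mirrors org.apache.spark.sql.functions.window in the Scala API. A minimal self-contained sketch with toy data (column names and values are invented for illustration):

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{window, avg, col}

val spark = SparkSession.builder().master("local[*]").appName("window-sketch").getOrCreate()
import spark.implicits._

// Toy data: (timestamp string, value)
val df = Seq(("2016-11-20 09:00:12", 1.0), ("2016-11-20 09:00:27", 3.0))
  .toDF("time_str", "value")
  .withColumn("time", col("time_str").cast("timestamp"))

// One-minute windows sliding every 15 seconds, starting 10 seconds past the minute,
// mirroring the first SparkR example above.
val windowed = df
  .groupBy(window(col("time"), "1 minute", "15 seconds", "10 seconds"))
  .agg(avg(col("value")).as("avg_value"))

windowed.show(truncate = false)
```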
[38/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/install.spark.html -- diff --git a/site/docs/2.0.2/api/R/install.spark.html b/site/docs/2.0.2/api/R/install.spark.html new file mode 100644 index 000..5657727 --- /dev/null +++ b/site/docs/2.0.2/api/R/install.spark.html @@ -0,0 +1,119 @@ + +R: Download and Install Apache Spark to a Local Directory + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +install.spark {SparkR}R Documentation + +Download and Install Apache Spark to a Local Directory + +Description + +install.spark downloads and installs Spark to a local directory if +it is not found. The Spark version we use is the same as the SparkR version. +Users can specify a desired Hadoop version, the remote mirror site, and +the directory where the package is installed locally. + + + +Usage + + +install.spark(hadoopVersion = "2.7", mirrorUrl = NULL, localDir = NULL, + overwrite = FALSE) + + + +Arguments + + +hadoopVersion + +Version of Hadoop to install. Default is "2.7". It can take other +version number in the format of x.y where x and y are integer. +If hadoopVersion = "without", Hadoop free build is installed. +See +http://spark.apache.org/docs/latest/hadoop-provided.html;> +Hadoop Free Build for more information. +Other patched version names can also be used, e.g. "cdh4" + +mirrorUrl + +base URL of the repositories to use. The directory layout should follow +http://www.apache.org/dyn/closer.lua/spark/;>Apache mirrors. + +localDir + +a local directory where Spark is installed. The directory contains +version-specific folders of Spark packages. Default is path to +the cache directory: + + + + Mac OS X: ~/Library/Caches/spark + + + Unix: $XDG_CACHE_HOME if defined, otherwise ~/.cache/spark + + + Windows: %LOCALAPPDATA%\spark\spark\Cache. + + + +overwrite + +If TRUE, download and overwrite the existing tar file in localDir +and force re-install Spark (in case the local directory or file is corrupted) + + + + +Details + +The full url of remote file is inferred from mirrorUrl and hadoopVersion. +mirrorUrl specifies the remote path to a Spark folder. It is followed by a subfolder +named after the Spark version (that corresponds to SparkR), and then the tar filename. +The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz. +For example, the full path for a Spark 2.0.0 package for Hadoop 2.7 from +http://apache.osuosl.org has path: +http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz. +For hadoopVersion = "without", [Hadoop version] in the filename is then +without-hadoop. 
+ + + +Value + +install.spark returns the local directory where Spark is found or installed + + + +Note + +install.spark since 2.1.0 + + + +See Also + +See available Hadoop versions: +http://spark.apache.org/downloads.html;>Apache Spark + + + +Examples + +## Not run: +##D install.spark() +## End(Not run) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/instr.html -- diff --git a/site/docs/2.0.2/api/R/instr.html b/site/docs/2.0.2/api/R/instr.html new file mode 100644 index 000..c0483ad --- /dev/null +++ b/site/docs/2.0.2/api/R/instr.html @@ -0,0 +1,126 @@ + +R: instr + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +instr {SparkR}R Documentation + +instr + +Description + +Locate the position of the first occurrence of substr column in the given string. +Returns null if either of the arguments are null. + + + +Usage + + +## S4 method for signature 'Column,character' +instr(y, x) + +instr(y, x) + + + +Arguments + + +y + +column to check + +x + +substring to check + + + + +Details + +NOTE: The position is not zero based, but 1 based index, returns 0 if substr +could not be found in str. + + + +Note + +instr since 1.5.0 + + + +See Also + +Other string_funcs: ascii, +ascii, ascii,Column-method; +base64, base64, +base64,Column-method; +concat_ws, concat_ws, +concat_ws,character,Column-method; +concat, concat, +concat,Column-method; decode, +decode, +decode,Column,character-method; +encode, encode, +encode,Column,character-method; +format_number, format_number, +format_number,Column,numeric-method; +format_string, format_string, +format_string,character,Column-method; +initcap, initcap, +initcap,Column-method; +length,
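The Details section above spells out how the download URL is assembled from mirrorUrl, the Spark version, and the Hadoop version. A small sketch of that naming scheme (the helper name and example values are illustrative, not part of SparkR):

```
// Illustrative only: reproduce the documented layout
// <mirrorUrl>/spark-<sparkVersion>/spark-<sparkVersion>-bin-<hadoopSuffix>.tgz
def packageUrl(mirrorUrl: String, sparkVersion: String, hadoopVersion: String): String = {
  val suffix = if (hadoopVersion == "without") "without-hadoop" else s"hadoop$hadoopVersion"
  s"$mirrorUrl/spark-$sparkVersion/spark-$sparkVersion-bin-$suffix.tgz"
}

// e.g. packageUrl("http://apache.osuosl.org/spark", "2.0.0", "2.7")
//   == "http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz"
```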
[09/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/JobExecutionStatus.html -- diff --git a/site/docs/2.0.2/api/java/org/apache/spark/JobExecutionStatus.html b/site/docs/2.0.2/api/java/org/apache/spark/JobExecutionStatus.html new file mode 100644 index 000..cb8b512 --- /dev/null +++ b/site/docs/2.0.2/api/java/org/apache/spark/JobExecutionStatus.html @@ -0,0 +1,354 @@ +JobExecutionStatus (Spark 2.0.2 JavaDoc)

org.apache.spark - Enum JobExecutionStatus

All Implemented Interfaces: java.io.Serializable, Comparable<JobExecutionStatus>

public enum JobExecutionStatus extends Enum<JobExecutionStatus>

Enum Constants: RUNNING, SUCCEEDED, FAILED, UNKNOWN

Method Summary
  static JobExecutionStatus fromString(String str)
  static JobExecutionStatus valueOf(String name)
    Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type (extraneous whitespace characters are not permitted). Throws IllegalArgumentException if this enum type has no constant with the specified name, and NullPointerException if the argument is null.
  static JobExecutionStatus[] values()
    Returns an array containing the constants of this enum type, in the order they are declared. This method may be used to iterate over the constants as follows:

    for (JobExecutionStatus c : JobExecutionStatus.values())
        System.out.println(c);

Methods inherited from class Enum: compareTo, equals, getDeclaringClass, hashCode, name, ordinal, toString, valueOf
Methods inherited from class Object: getClass, notify, notifyAll, wait

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/JobSubmitter.html -- diff --git a/site/docs/2.0.2/api/java/org/apache/spark/JobSubmitter.html b/site/docs/2.0.2/api/java/org/apache/spark/JobSubmitter.html new file mode 100644 index 000..5681e58 --- /dev/null +++ b/site/docs/2.0.2/api/java/org/apache/spark/JobSubmitter.html @@ -0,0 +1,221 @@ +JobSubmitter (Spark 2.0.2 JavaDoc)
[19/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/constant-values.html -- diff --git a/site/docs/2.0.2/api/java/constant-values.html b/site/docs/2.0.2/api/java/constant-values.html new file mode 100644 index 000..ee81714 --- /dev/null +++ b/site/docs/2.0.2/api/java/constant-values.html @@ -0,0 +1,233 @@ +Constant Field Values (Spark 2.0.2 JavaDoc)

Constant Field Values - Contents: org.apache.*

org.apache.spark.launcher.SparkLauncher (all fields are public static final String):
  CHILD_CONNECTION_TIMEOUT    = "spark.launcher.childConectionTimeout"
  CHILD_PROCESS_LOGGER_NAME   = "spark.launcher.childProcLoggerName"
  DEPLOY_MODE                 = "spark.submit.deployMode"
  DRIVER_EXTRA_CLASSPATH      = "spark.driver.extraClassPath"
  DRIVER_EXTRA_JAVA_OPTIONS   = "spark.driver.extraJavaOptions"
  DRIVER_EXTRA_LIBRARY_PATH   = "spark.driver.extraLibraryPath"
  DRIVER_MEMORY               = "spark.driver.memory"
  EXECUTOR_CORES              = "spark.executor.cores"
  EXECUTOR_EXTRA_CLASSPATH    = "spark.executor.extraClassPath"
  EXECUTOR_EXTRA_JAVA_OPTIONS = "spark.executor.extraJavaOptions"
  EXECUTOR_EXTRA_LIBRARY_PATH = "spark.executor.extraLibraryPath"
  EXECUTOR_MEMORY             = "spark.executor.memory"
  NO_RESOURCE                 = "spark-internal"
  SPARK_MASTER                = "spark.master"

http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/deprecated-list.html -- diff --git a/site/docs/2.0.2/api/java/deprecated-list.html b/site/docs/2.0.2/api/java/deprecated-list.html new file mode 100644 index 000..3f4dacc --- /dev/null +++ b/site/docs/2.0.2/api/java/deprecated-list.html @@ -0,0 +1,577 @@ +Deprecated List (Spark 2.0.2 JavaDoc)

Deprecated API - Contents: Deprecated Interfaces, Deprecated Classes, Deprecated Methods, Deprecated Constructors

Deprecated Interfaces
  org.apache.spark.AccumulableParam - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.AccumulatorParam - use AccumulatorV2. Since 2.0.0.

Deprecated Classes
  org.apache.spark.Accumulable - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.Accumulator - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.AccumulatorParam.DoubleAccumulatorParam$ - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.AccumulatorParam.FloatAccumulatorParam$ - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.AccumulatorParam.IntAccumulatorParam$ - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.AccumulatorParam.LongAccumulatorParam$ - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.AccumulatorParam.StringAccumulatorParam$ - use AccumulatorV2. Since 2.0.0.
  org.apache.spark.sql.hive.HiveContext - Use
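The SparkLauncher constants listed above are the configuration keys that the launcher API accepts. A minimal sketch of supplying them programmatically (resource path, main class, and master are placeholder values):

```
import org.apache.spark.launcher.SparkLauncher

// Illustrative only: app resource, main class and master are placeholders.
val handle = new SparkLauncher()
  .setAppResource("/path/to/app.jar")          // placeholder path
  .setMainClass("com.example.Main")            // placeholder class
  .setMaster("local[*]")
  .setConf(SparkLauncher.DRIVER_MEMORY, "2g")  // "spark.driver.memory"
  .setConf(SparkLauncher.EXECUTOR_CORES, "2")  // "spark.executor.cores"
  .startApplication()                          // returns a SparkAppHandle for monitoring
```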
[12/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/java/org/apache/spark/ComplexFutureAction.html -- diff --git a/site/docs/2.0.2/api/java/org/apache/spark/ComplexFutureAction.html b/site/docs/2.0.2/api/java/org/apache/spark/ComplexFutureAction.html new file mode 100644 index 000..598387d --- /dev/null +++ b/site/docs/2.0.2/api/java/org/apache/spark/ComplexFutureAction.html @@ -0,0 +1,485 @@ +ComplexFutureAction (Spark 2.0.2 JavaDoc)

org.apache.spark - Class ComplexFutureAction<T>

All Implemented Interfaces: FutureAction<T>, scala.concurrent.Awaitable<T>, scala.concurrent.Future<T>

public class ComplexFutureAction<T> extends Object implements FutureAction<T>
A FutureAction for actions that could trigger multiple Spark jobs. Examples include take, takeSample. Cancellation works by setting the cancelled flag to true and cancelling any pending jobs.

Nested classes/interfaces inherited from interface scala.concurrent.Future: scala.concurrent.Future.InternalCallbackExecutor$

Constructor Summary
  ComplexFutureAction(scala.Function1<JobSubmitter, scala.concurrent.Future<T>> run)

Method Summary
  void cancel() - Cancels the execution of this action. (Specified by cancel in interface FutureAction<T>.)
  boolean isCancelled() - Returns whether the action has been cancelled.
  boolean isCompleted() - Returns whether the action has already been completed with a value or an exception.
  scala.collection.Seq<Object> jobIds() - Returns the job IDs run by the underlying async operation.
  <U> void onComplete(scala.Function1<scala.util.Try<T>, U> func, scala.concurrent.ExecutionContext executor) - When this action is completed, either through an exception or a value, applies the provided function.
  ComplexFutureAction<T> ready(scala.concurrent.duration.Duration atMost, scala.concurrent.CanAwait permit) - Blocks until this action completes; atMost may be negative (no waiting is done), Duration.Inf for unbounded waiting, or a finite positive duration; returns this FutureAction; throws InterruptedException and java.util.concurrent.TimeoutException.
  T result(scala.concurrent.duration.Duration atMost, scala.concurrent.CanAwait permit) - Awaits and returns the result (of type T) of this action.
  scala.Option<scala.util.Try<T>> value() - The value of this Future.

Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait
Methods inherited from interface org.apache.spark.FutureAction: get
Methods inherited from interface scala.concurrent.Future: andThen, collect, failed, fallbackTo, filter, flatMap, foreach, map, mapTo, onFailure, onSuccess, recover, recoverWith, transform, withFilter, zip
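FutureAction, which ComplexFutureAction implements, is what the asynchronous RDD actions return. A minimal sketch of consuming one (illustrative; the local master and toy RDD are assumptions for the example):

```
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.{Success, Failure}
import scala.concurrent.ExecutionContext.Implicits.global

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("future-action-sketch"))

// countAsync returns a FutureAction, the interface ComplexFutureAction implements
// for multi-job actions such as take and takeSample.
val future = sc.parallelize(1 to 1000).countAsync()

future.onComplete {
  case Success(n) => println(s"counted $n records, jobs: ${future.jobIds.mkString(",")}")
  case Failure(e) => println(s"count failed: $e")
}

// A FutureAction can also be cancelled, which cancels the underlying Spark jobs:
// future.cancel()
```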
[30/51] [partial] spark-website git commit: Add docs for 2.0.2.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/sd.html -- diff --git a/site/docs/2.0.2/api/R/sd.html b/site/docs/2.0.2/api/R/sd.html new file mode 100644 index 000..38f67ec --- /dev/null +++ b/site/docs/2.0.2/api/R/sd.html @@ -0,0 +1,121 @@ + +R: sd + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +sd {SparkR}R Documentation + +sd + +Description + +Aggregate function: alias for stddev_samp + + + +Usage + + +## S4 method for signature 'Column' +sd(x) + +## S4 method for signature 'Column' +stddev(x) + +sd(x, na.rm = FALSE) + +stddev(x) + + + +Arguments + + +x + +Column to compute on. + +na.rm + +currently not used. + + + + +Note + +sd since 1.6.0 + +stddev since 1.6.0 + + + +See Also + +stddev_pop, stddev_samp + +Other agg_funcs: agg, agg, +agg, agg,GroupedData-method, +agg,SparkDataFrame-method, +summarize, summarize, +summarize, +summarize,GroupedData-method, +summarize,SparkDataFrame-method; +avg, avg, +avg,Column-method; +countDistinct, countDistinct, +countDistinct,Column-method, +n_distinct, n_distinct, +n_distinct,Column-method; +count, count, +count,Column-method, +count,GroupedData-method, n, +n, n,Column-method; +first, first, +first, +first,SparkDataFrame-method, +first,characterOrColumn-method; +kurtosis, kurtosis, +kurtosis,Column-method; last, +last, +last,characterOrColumn-method; +max, max,Column-method; +mean, mean,Column-method; +min, min,Column-method; +skewness, skewness, +skewness,Column-method; +stddev_pop, stddev_pop, +stddev_pop,Column-method; +stddev_samp, stddev_samp, +stddev_samp,Column-method; +sumDistinct, sumDistinct, +sumDistinct,Column-method; +sum, sum,Column-method; +var_pop, var_pop, +var_pop,Column-method; +var_samp, var_samp, +var_samp,Column-method; var, +var, var,Column-method, +variance, variance, +variance,Column-method + + + +Examples + +## Not run: +##D stddev(df$c) +##D select(df, stddev(df$age)) +##D agg(df, sd(df$age)) +## End(Not run) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/second.html -- diff --git a/site/docs/2.0.2/api/R/second.html b/site/docs/2.0.2/api/R/second.html new file mode 100644 index 000..92dc854 --- /dev/null +++ b/site/docs/2.0.2/api/R/second.html @@ -0,0 +1,112 @@ + +R: second + + + +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css;> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"> +https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"> +hljs.initHighlightingOnLoad(); + + +second {SparkR}R Documentation + +second + +Description + +Extracts the seconds as an integer from a given date/timestamp/string. + + + +Usage + + +## S4 method for signature 'Column' +second(x) + +second(x) + + + +Arguments + + +x + +Column to compute on. 
+ + + + +Note + +second since 1.5.0 + + + +See Also + +Other datetime_funcs: add_months, +add_months, +add_months,Column,numeric-method; +date_add, date_add, +date_add,Column,numeric-method; +date_format, date_format, +date_format,Column,character-method; +date_sub, date_sub, +date_sub,Column,numeric-method; +datediff, datediff, +datediff,Column-method; +dayofmonth, dayofmonth, +dayofmonth,Column-method; +dayofyear, dayofyear, +dayofyear,Column-method; +from_unixtime, from_unixtime, +from_unixtime,Column-method; +from_utc_timestamp, +from_utc_timestamp, +from_utc_timestamp,Column,character-method; +hour, hour, +hour,Column-method; last_day, +last_day, +last_day,Column-method; +minute, minute, +minute,Column-method; +months_between, +months_between, +months_between,Column-method; +month, month, +month,Column-method; +next_day, next_day, +next_day,Column,character-method; +quarter, quarter, +quarter,Column-method; +to_date, to_date, +to_date,Column-method; +to_utc_timestamp, +to_utc_timestamp, +to_utc_timestamp,Column,character-method; +unix_timestamp, +unix_timestamp, +unix_timestamp, +unix_timestamp, +unix_timestamp,Column,character-method, +unix_timestamp,Column,missing-method, +unix_timestamp,missing,missing-method; +weekofyear, weekofyear, +weekofyear,Column-method; +window, window, +window,Column-method; year, +year, year,Column-method + + + +Examples + +## Not run: second(df$c) + + + +[Package SparkR version 2.0.2 Index] + http://git-wip-us.apache.org/repos/asf/spark-website/blob/0bd36316/site/docs/2.0.2/api/R/select.html -- diff --git a/site/docs/2.0.2/api/R/select.html b/site/docs/2.0.2/api/R/select.html new file mode 100644 index
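Both helpers also exist in the Scala DataFrame functions API. A minimal sketch, assuming a DataFrame df with a numeric age column and a timestamp column ts (both assumptions made for illustration):

```
import org.apache.spark.sql.functions.{stddev, stddev_pop, second, col}

// `sd` / `stddev` in SparkR alias stddev_samp, as the help page notes;
// stddev_pop is the population variant.
val stats = df.agg(stddev(col("age")).as("sd_age"), stddev_pop(col("age")).as("sd_pop_age"))

// Extract the seconds component of a timestamp, mirroring second(df$c) in SparkR.
val withSeconds = df.select(second(col("ts")).as("ts_second"))
```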
spark git commit: [SPARK-18387][SQL] Add serialization to checkEvaluation.
Repository: spark Updated Branches: refs/heads/branch-2.1 465e4b40b -> 87820da78 [SPARK-18387][SQL] Add serialization to checkEvaluation. ## What changes were proposed in this pull request? This removes the serialization test from RegexpExpressionsSuite and replaces it by serializing all expressions in checkEvaluation. This also fixes math constant expressions by making LeafMathExpression Serializable and fixes NumberFormat values that are null or invalid after serialization. ## How was this patch tested? This patch is to tests. Author: Ryan BlueCloses #15847 from rdblue/SPARK-18387-fix-serializable-expressions. (cherry picked from commit 6e95325fc3726d260054bd6e7c0717b3c139917e) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/87820da7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/87820da7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/87820da7 Branch: refs/heads/branch-2.1 Commit: 87820da782fd2d08078227a2ce5c363c3e1cb0f0 Parents: 465e4b4 Author: Ryan Blue Authored: Fri Nov 11 13:52:10 2016 -0800 Committer: Reynold Xin Committed: Fri Nov 11 13:52:18 2016 -0800 -- .../catalyst/expressions/mathExpressions.scala | 2 +- .../expressions/stringExpressions.scala | 44 +++- .../expressions/ExpressionEvalHelper.scala | 15 --- .../expressions/RegexpExpressionsSuite.scala| 16 +-- 4 files changed, 36 insertions(+), 41 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/87820da7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index a60494a..65273a7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -36,7 +36,7 @@ import org.apache.spark.unsafe.types.UTF8String * @param name The short name of the function */ abstract class LeafMathExpression(c: Double, name: String) - extends LeafExpression with CodegenFallback { + extends LeafExpression with CodegenFallback with Serializable { override def dataType: DataType = DoubleType override def foldable: Boolean = true http://git-wip-us.apache.org/repos/asf/spark/blob/87820da7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala index 5f533fe..e74ef9a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala @@ -1431,18 +1431,20 @@ case class FormatNumber(x: Expression, d: Expression) // Associated with the pattern, for the last d value, and we will update the // pattern (DecimalFormat) once the new coming d value differ with the last one. + // This is an Option to distinguish between 0 (numberFormat is valid) and uninitialized after + // serialization (numberFormat has not been updated for dValue = 0). 
@transient - private var lastDValue: Int = -100 + private var lastDValue: Option[Int] = None // A cached DecimalFormat, for performance concern, we will change it // only if the d value changed. @transient - private val pattern: StringBuffer = new StringBuffer() + private lazy val pattern: StringBuffer = new StringBuffer() // SPARK-13515: US Locale configures the DecimalFormat object to use a dot ('.') // as a decimal separator. @transient - private val numberFormat = new DecimalFormat("", new DecimalFormatSymbols(Locale.US)) + private lazy val numberFormat = new DecimalFormat("", new DecimalFormatSymbols(Locale.US)) override protected def nullSafeEval(xObject: Any, dObject: Any): Any = { val dValue = dObject.asInstanceOf[Int] @@ -1450,24 +1452,28 @@ case class FormatNumber(x: Expression, d: Expression) return null } -if (dValue != lastDValue) { - // construct a new DecimalFormat only if a new dValue - pattern.delete(0, pattern.length) - pattern.append("#,###,###,###,###,###,##0") - - // decimal place - if (dValue > 0) { -
spark git commit: [SPARK-18387][SQL] Add serialization to checkEvaluation.
Repository: spark Updated Branches: refs/heads/branch-2.0 6e7310590 -> 99575e88f [SPARK-18387][SQL] Add serialization to checkEvaluation. ## What changes were proposed in this pull request? This removes the serialization test from RegexpExpressionsSuite and replaces it by serializing all expressions in checkEvaluation. This also fixes math constant expressions by making LeafMathExpression Serializable and fixes NumberFormat values that are null or invalid after serialization. ## How was this patch tested? This patch is to tests. Author: Ryan BlueCloses #15847 from rdblue/SPARK-18387-fix-serializable-expressions. (cherry picked from commit 6e95325fc3726d260054bd6e7c0717b3c139917e) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/99575e88 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/99575e88 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/99575e88 Branch: refs/heads/branch-2.0 Commit: 99575e88fd711c3fc25e8e6f00bbc8d1491feed6 Parents: 6e73105 Author: Ryan Blue Authored: Fri Nov 11 13:52:10 2016 -0800 Committer: Reynold Xin Committed: Fri Nov 11 13:52:28 2016 -0800 -- .../catalyst/expressions/mathExpressions.scala | 2 +- .../expressions/stringExpressions.scala | 44 +++- .../expressions/ExpressionEvalHelper.scala | 15 --- .../expressions/RegexpExpressionsSuite.scala| 16 +-- 4 files changed, 36 insertions(+), 41 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/99575e88/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index 5152265..591e1e5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -36,7 +36,7 @@ import org.apache.spark.unsafe.types.UTF8String * @param name The short name of the function */ abstract class LeafMathExpression(c: Double, name: String) - extends LeafExpression with CodegenFallback { + extends LeafExpression with CodegenFallback with Serializable { override def dataType: DataType = DoubleType override def foldable: Boolean = true http://git-wip-us.apache.org/repos/asf/spark/blob/99575e88/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala index 61549c9..004c74d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala @@ -1236,18 +1236,20 @@ case class FormatNumber(x: Expression, d: Expression) // Associated with the pattern, for the last d value, and we will update the // pattern (DecimalFormat) once the new coming d value differ with the last one. + // This is an Option to distinguish between 0 (numberFormat is valid) and uninitialized after + // serialization (numberFormat has not been updated for dValue = 0). 
@transient - private var lastDValue: Int = -100 + private var lastDValue: Option[Int] = None // A cached DecimalFormat, for performance concern, we will change it // only if the d value changed. @transient - private val pattern: StringBuffer = new StringBuffer() + private lazy val pattern: StringBuffer = new StringBuffer() // SPARK-13515: US Locale configures the DecimalFormat object to use a dot ('.') // as a decimal separator. @transient - private val numberFormat = new DecimalFormat("", new DecimalFormatSymbols(Locale.US)) + private lazy val numberFormat = new DecimalFormat("", new DecimalFormatSymbols(Locale.US)) override protected def nullSafeEval(xObject: Any, dObject: Any): Any = { val dValue = dObject.asInstanceOf[Int] @@ -1255,24 +1257,28 @@ case class FormatNumber(x: Expression, d: Expression) return null } -if (dValue != lastDValue) { - // construct a new DecimalFormat only if a new dValue - pattern.delete(0, pattern.length) - pattern.append("#,###,###,###,###,###,##0") - - // decimal place - if (dValue > 0) { -
spark git commit: [SPARK-18387][SQL] Add serialization to checkEvaluation.
Repository: spark Updated Branches: refs/heads/master d42bb7cc4 -> 6e95325fc [SPARK-18387][SQL] Add serialization to checkEvaluation. ## What changes were proposed in this pull request? This removes the serialization test from RegexpExpressionsSuite and replaces it by serializing all expressions in checkEvaluation. This also fixes math constant expressions by making LeafMathExpression Serializable and fixes NumberFormat values that are null or invalid after serialization. ## How was this patch tested? This patch is to tests. Author: Ryan BlueCloses #15847 from rdblue/SPARK-18387-fix-serializable-expressions. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6e95325f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6e95325f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6e95325f Branch: refs/heads/master Commit: 6e95325fc3726d260054bd6e7c0717b3c139917e Parents: d42bb7c Author: Ryan Blue Authored: Fri Nov 11 13:52:10 2016 -0800 Committer: Reynold Xin Committed: Fri Nov 11 13:52:10 2016 -0800 -- .../catalyst/expressions/mathExpressions.scala | 2 +- .../expressions/stringExpressions.scala | 44 +++- .../expressions/ExpressionEvalHelper.scala | 15 --- .../expressions/RegexpExpressionsSuite.scala| 16 +-- 4 files changed, 36 insertions(+), 41 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6e95325f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index a60494a..65273a7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -36,7 +36,7 @@ import org.apache.spark.unsafe.types.UTF8String * @param name The short name of the function */ abstract class LeafMathExpression(c: Double, name: String) - extends LeafExpression with CodegenFallback { + extends LeafExpression with CodegenFallback with Serializable { override def dataType: DataType = DoubleType override def foldable: Boolean = true http://git-wip-us.apache.org/repos/asf/spark/blob/6e95325f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala index 5f533fe..e74ef9a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala @@ -1431,18 +1431,20 @@ case class FormatNumber(x: Expression, d: Expression) // Associated with the pattern, for the last d value, and we will update the // pattern (DecimalFormat) once the new coming d value differ with the last one. + // This is an Option to distinguish between 0 (numberFormat is valid) and uninitialized after + // serialization (numberFormat has not been updated for dValue = 0). @transient - private var lastDValue: Int = -100 + private var lastDValue: Option[Int] = None // A cached DecimalFormat, for performance concern, we will change it // only if the d value changed. 
@transient - private val pattern: StringBuffer = new StringBuffer() + private lazy val pattern: StringBuffer = new StringBuffer() // SPARK-13515: US Locale configures the DecimalFormat object to use a dot ('.') // as a decimal separator. @transient - private val numberFormat = new DecimalFormat("", new DecimalFormatSymbols(Locale.US)) + private lazy val numberFormat = new DecimalFormat("", new DecimalFormatSymbols(Locale.US)) override protected def nullSafeEval(xObject: Any, dObject: Any): Any = { val dValue = dObject.asInstanceOf[Int] @@ -1450,24 +1452,28 @@ case class FormatNumber(x: Expression, d: Expression) return null } -if (dValue != lastDValue) { - // construct a new DecimalFormat only if a new dValue - pattern.delete(0, pattern.length) - pattern.append("#,###,###,###,###,###,##0") - - // decimal place - if (dValue > 0) { -pattern.append(".") - -var i = 0 -while (i < dValue) { - i += 1 - pattern.append("0") +
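The heart of the fix above is a reusable pattern: @transient state is rebuilt lazily after Java serialization, and the cached d value becomes an Option so that 0 cannot be mistaken for "never initialized". A simplified, self-contained sketch of that pattern (invented class and method names, not the actual FormatNumber code):

```
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.util.Locale

// Simplified illustration of the fix: after deserialization, @transient fields come
// back null, so the formatter is a lazy val that is rebuilt on first use, and the
// cached precision is an Option so 0 is not confused with "not initialized yet".
class CachedFormatter extends Serializable {
  @transient private var lastDValue: Option[Int] = None
  @transient private lazy val numberFormat =
    new DecimalFormat("", new DecimalFormatSymbols(Locale.US))

  def format(value: Double, dValue: Int): String = {
    lastDValue match {
      case Some(last) if last == dValue =>
        // The formatter is already configured for this precision.
      case _ =>
        // Also reached right after deserialization, when lastDValue is null.
        val p = new StringBuilder("#,###,###,###,###,###,##0")
        if (dValue > 0) p.append(".").append("0" * dValue)
        numberFormat.applyLocalizedPattern(p.toString)
        lastDValue = Some(dValue)
    }
    numberFormat.format(value)
  }
}
```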
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.0.2-rc2 [deleted] a6abe1ee2
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.0.2-rc3 [deleted] 584354eaa
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.0.2 [created] 584354eaa
spark git commit: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE for Datasource tables
Repository: spark Updated Branches: refs/heads/branch-2.1 c602894f2 -> 064d4315f [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE for Datasource tables ## What changes were proposed in this pull request? As of current 2.1, INSERT OVERWRITE with dynamic partitions against a Datasource table will overwrite the entire table instead of only the partitions matching the static keys, as in Hive. It also doesn't respect custom partition locations. This PR adds support for all these operations to Datasource tables managed by the Hive metastore. It is implemented as follows - During planning time, the full set of partitions affected by an INSERT or OVERWRITE command is read from the Hive metastore. - The planner identifies any partitions with custom locations and includes this in the write task metadata. - FileFormatWriter tasks refer to this custom locations map when determining where to write for dynamic partition output. - When the write job finishes, the set of written partitions is compared against the initial set of matched partitions, and the Hive metastore is updated to reflect the newly added / removed partitions. It was necessary to introduce a method for staging files with absolute output paths to `FileCommitProtocol`. These files are not handled by the Hadoop output committer but are moved to their final locations when the job commits. The overwrite behavior of legacy Datasource tables is also changed: no longer will the entire table be overwritten if a partial partition spec is present. cc cloud-fan yhuai ## How was this patch tested? Unit tests, existing tests. Author: Eric LiangAuthor: Wenchen Fan Closes #15814 from ericl/sc-5027. (cherry picked from commit a3356343cbf58b930326f45721fb4ecade6f8029) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/064d4315 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/064d4315 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/064d4315 Branch: refs/heads/branch-2.1 Commit: 064d4315f246450043a52882fcf59e95d79701e8 Parents: c602894 Author: Eric Liang Authored: Thu Nov 10 17:00:43 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 10 17:01:08 2016 -0800 -- .../spark/internal/io/FileCommitProtocol.scala | 15 ++ .../io/HadoopMapReduceCommitProtocol.scala | 63 +++- .../spark/sql/catalyst/parser/AstBuilder.scala | 12 +- .../plans/logical/basicLogicalOperators.scala | 10 +- .../sql/catalyst/parser/PlanParserSuite.scala | 4 +- .../sql/execution/datasources/DataSource.scala | 20 +-- .../datasources/DataSourceStrategy.scala| 94 +++ .../datasources/FileFormatWriter.scala | 26 ++- .../InsertIntoHadoopFsRelationCommand.scala | 61 ++- .../datasources/PartitioningUtils.scala | 10 ++ .../execution/streaming/FileStreamSink.scala| 2 +- .../streaming/ManifestFileCommitProtocol.scala | 6 + .../PartitionProviderCompatibilitySuite.scala | 161 ++- 13 files changed, 411 insertions(+), 73 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/064d4315/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala b/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala index fb80205..afd2250 100644 --- a/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala +++ b/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala @@ -82,10 +82,25 @@ abstract class FileCommitProtocol { * * The "dir" 
parameter specifies 2, and "ext" parameter specifies both 4 and 5, and the rest * are left to the commit protocol implementation to decide. + * + * Important: it is the caller's responsibility to add uniquely identifying content to "ext" + * if a task is going to write out multiple files to the same dir. The file commit protocol only + * guarantees that files written by different tasks will not conflict. */ def newTaskTempFile(taskContext: TaskAttemptContext, dir: Option[String], ext: String): String /** + * Similar to newTaskTempFile(), but allows files to committed to an absolute output location. + * Depending on the implementation, there may be weaker guarantees around adding files this way. + * + * Important: it is the caller's responsibility to add uniquely identifying content to "ext" + * if a task is going to write out multiple files to the same dir. The file commit protocol only + * guarantees that files written by
spark git commit: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE for Datasource tables
Repository: spark Updated Branches: refs/heads/master e0deee1f7 -> a3356343c [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE for Datasource tables ## What changes were proposed in this pull request? As of current 2.1, INSERT OVERWRITE with dynamic partitions against a Datasource table will overwrite the entire table instead of only the partitions matching the static keys, as in Hive. It also doesn't respect custom partition locations. This PR adds support for all these operations to Datasource tables managed by the Hive metastore. It is implemented as follows - During planning time, the full set of partitions affected by an INSERT or OVERWRITE command is read from the Hive metastore. - The planner identifies any partitions with custom locations and includes this in the write task metadata. - FileFormatWriter tasks refer to this custom locations map when determining where to write for dynamic partition output. - When the write job finishes, the set of written partitions is compared against the initial set of matched partitions, and the Hive metastore is updated to reflect the newly added / removed partitions. It was necessary to introduce a method for staging files with absolute output paths to `FileCommitProtocol`. These files are not handled by the Hadoop output committer but are moved to their final locations when the job commits. The overwrite behavior of legacy Datasource tables is also changed: no longer will the entire table be overwritten if a partial partition spec is present. cc cloud-fan yhuai ## How was this patch tested? Unit tests, existing tests. Author: Eric LiangAuthor: Wenchen Fan Closes #15814 from ericl/sc-5027. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a3356343 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a3356343 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a3356343 Branch: refs/heads/master Commit: a3356343cbf58b930326f45721fb4ecade6f8029 Parents: e0deee1 Author: Eric Liang Authored: Thu Nov 10 17:00:43 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 10 17:00:43 2016 -0800 -- .../spark/internal/io/FileCommitProtocol.scala | 15 ++ .../io/HadoopMapReduceCommitProtocol.scala | 63 +++- .../spark/sql/catalyst/parser/AstBuilder.scala | 12 +- .../plans/logical/basicLogicalOperators.scala | 10 +- .../sql/catalyst/parser/PlanParserSuite.scala | 4 +- .../sql/execution/datasources/DataSource.scala | 20 +-- .../datasources/DataSourceStrategy.scala| 94 +++ .../datasources/FileFormatWriter.scala | 26 ++- .../InsertIntoHadoopFsRelationCommand.scala | 61 ++- .../datasources/PartitioningUtils.scala | 10 ++ .../execution/streaming/FileStreamSink.scala| 2 +- .../streaming/ManifestFileCommitProtocol.scala | 6 + .../PartitionProviderCompatibilitySuite.scala | 161 ++- 13 files changed, 411 insertions(+), 73 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a3356343/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala b/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala index fb80205..afd2250 100644 --- a/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala +++ b/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala @@ -82,10 +82,25 @@ abstract class FileCommitProtocol { * * The "dir" parameter specifies 2, and "ext" parameter specifies both 4 and 5, and the rest * are left to the commit 
protocol implementation to decide. + * + * Important: it is the caller's responsibility to add uniquely identifying content to "ext" + * if a task is going to write out multiple files to the same dir. The file commit protocol only + * guarantees that files written by different tasks will not conflict. */ def newTaskTempFile(taskContext: TaskAttemptContext, dir: Option[String], ext: String): String /** + * Similar to newTaskTempFile(), but allows files to committed to an absolute output location. + * Depending on the implementation, there may be weaker guarantees around adding files this way. + * + * Important: it is the caller's responsibility to add uniquely identifying content to "ext" + * if a task is going to write out multiple files to the same dir. The file commit protocol only + * guarantees that files written by different tasks will not conflict. + */ + def newTaskTempFileAbsPath( + taskContext: TaskAttemptContext, absoluteDir:
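The newTaskTempFile / newTaskTempFileAbsPath contract above puts the burden of unique file names on the caller. A sketch of a caller honoring that contract (illustrative; the absolute-path variant's exact signature is truncated in the excerpt above and is assumed here to take the absolute directory and extension as strings):

```
import java.util.UUID
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.spark.internal.io.FileCommitProtocol

// Illustrative only: shows the caller-side contract described above.
// `committer` and `taskContext` are assumed to be provided by the write job.
def newOutputFile(
    committer: FileCommitProtocol,
    taskContext: TaskAttemptContext,
    partitionDir: Option[String],
    customLocation: Option[String]): String = {
  // The caller must make "ext" unique when writing several files to the same dir.
  val ext = s"-${UUID.randomUUID()}.parquet"

  customLocation match {
    // Partitions with custom locations go through the absolute-path variant.
    case Some(absDir) => committer.newTaskTempFileAbsPath(taskContext, absDir, ext)
    case None         => committer.newTaskTempFile(taskContext, partitionDir, ext)
  }
}
```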
spark git commit: [SPARK-18403][SQL] Temporarily disable flaky ObjectHashAggregateSuite
Repository: spark Updated Branches: refs/heads/master 2f7461f31 -> e0deee1f7 [SPARK-18403][SQL] Temporarily disable flaky ObjectHashAggregateSuite ## What changes were proposed in this pull request? Randomized tests in `ObjectHashAggregateSuite` is being flaky and breaks PR builds. This PR disables them temporarily to bring back the PR build. ## How was this patch tested? N/A Author: Cheng LianCloses #15845 from liancheng/ignore-flaky-object-hash-agg-suite. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e0deee1f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e0deee1f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e0deee1f Branch: refs/heads/master Commit: e0deee1f7df31177cfc14bbb296f0baa372f473d Parents: 2f7461f Author: Cheng Lian Authored: Thu Nov 10 13:44:54 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 10 13:44:54 2016 -0800 -- .../spark/sql/hive/execution/ObjectHashAggregateSuite.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e0deee1f/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala -- diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala index 93fc5e8..b7f91d8 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala @@ -326,7 +326,8 @@ class ObjectHashAggregateSuite // Currently Spark SQL doesn't support evaluating distinct aggregate function together // with aggregate functions without partial aggregation support. if (!(aggs.contains(withoutPartial) && aggs.contains(withDistinct))) { - test( + // TODO Re-enables them after fixing SPARK-18403 + ignore( s"randomized aggregation test - " + s"${names.mkString("[", ", ", "]")} - " + s"${if (withGroupingKeys) "with" else "without"} grouping keys - " + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
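The change swaps ScalaTest's test(...) registration for ignore(...), which keeps the body compiling but skips execution. A minimal standalone illustration (suite and test names are invented):

```
import org.scalatest.FunSuite

// Illustrative only: ignore() registers the test so it still compiles,
// but the runner reports it as ignored instead of executing it.
class ExampleSuite extends FunSuite {
  test("stable test") {
    assert(1 + 1 === 2)
  }

  // TODO re-enable once the flakiness is fixed (mirrors the SPARK-18403 workaround).
  ignore("flaky randomized test") {
    assert(scala.util.Random.nextInt(2) >= 0)
  }
}
```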
spark git commit: [SPARK-17990][SPARK-18302][SQL] correct several partition related behaviours of ExternalCatalog
Repository: spark Updated Branches: refs/heads/branch-2.1 be3933ddf -> c602894f2 [SPARK-17990][SPARK-18302][SQL] correct several partition related behaviours of ExternalCatalog ## What changes were proposed in this pull request? This PR corrects several partition related behaviors of `ExternalCatalog`: 1. default partition location should not always lower case the partition column names in path string(fix `HiveExternalCatalog`) 2. rename partition should not always lower case the partition column names in updated partition path string(fix `HiveExternalCatalog`) 3. rename partition should update the partition location only for managed table(fix `InMemoryCatalog`) 4. create partition with existing directory should be fine(fix `InMemoryCatalog`) 5. create partition with non-existing directory should create that directory(fix `InMemoryCatalog`) 6. drop partition from external table should not delete the directory(fix `InMemoryCatalog`) ## How was this patch tested? new tests in `ExternalCatalogSuite` Author: Wenchen FanCloses #15797 from cloud-fan/partition. (cherry picked from commit 2f7461f31331cfc37f6cfa3586b7bbefb3af5547) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c602894f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c602894f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c602894f Branch: refs/heads/branch-2.1 Commit: c602894f25bf9e61b759815674008471858cc71e Parents: be3933d Author: Wenchen Fan Authored: Thu Nov 10 13:42:48 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 10 13:42:54 2016 -0800 -- .../catalyst/catalog/ExternalCatalogUtils.scala | 121 +++ .../sql/catalyst/catalog/InMemoryCatalog.scala | 92 ++-- .../spark/sql/catalyst/catalog/interface.scala | 11 ++ .../catalyst/catalog/ExternalCatalogSuite.scala | 150 +++ .../catalyst/catalog/SessionCatalogSuite.scala | 24 ++- .../spark/sql/execution/command/ddl.scala | 8 +- .../spark/sql/execution/command/tables.scala| 3 +- .../datasources/CatalogFileIndex.scala | 2 +- .../datasources/DataSourceStrategy.scala| 2 +- .../datasources/FileFormatWriter.scala | 6 +- .../PartitioningAwareFileIndex.scala| 2 - .../datasources/PartitioningUtils.scala | 94 +--- .../spark/sql/execution/command/DDLSuite.scala | 8 +- .../ParquetPartitionDiscoverySuite.scala| 21 +-- .../spark/sql/hive/HiveExternalCatalog.scala| 51 ++- .../spark/sql/hive/HiveSparkSubmitSuite.scala | 4 +- .../spark/sql/hive/MultiDatabaseSuite.scala | 2 +- .../spark/sql/hive/execution/HiveDDLSuite.scala | 2 +- .../sql/hive/execution/SQLQuerySuite.scala | 2 +- 19 files changed, 397 insertions(+), 208 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c602894f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala new file mode 100644 index 000..b1442ee --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.catalog + +import org.apache.hadoop.fs.Path +import org.apache.hadoop.util.Shell + +import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec + +object ExternalCatalogUtils { + // This duplicates default value of Hive `ConfVars.DEFAULTPARTITIONNAME`, since catalyst doesn't + // depend on Hive. + val DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__" + +
spark git commit: [SPARK-17990][SPARK-18302][SQL] correct several partition related behaviours of ExternalCatalog
Repository: spark Updated Branches: refs/heads/master b533fa2b2 -> 2f7461f31 [SPARK-17990][SPARK-18302][SQL] correct several partition related behaviours of ExternalCatalog ## What changes were proposed in this pull request? This PR corrects several partition related behaviors of `ExternalCatalog`: 1. default partition location should not always lower case the partition column names in path string(fix `HiveExternalCatalog`) 2. rename partition should not always lower case the partition column names in updated partition path string(fix `HiveExternalCatalog`) 3. rename partition should update the partition location only for managed table(fix `InMemoryCatalog`) 4. create partition with existing directory should be fine(fix `InMemoryCatalog`) 5. create partition with non-existing directory should create that directory(fix `InMemoryCatalog`) 6. drop partition from external table should not delete the directory(fix `InMemoryCatalog`) ## How was this patch tested? new tests in `ExternalCatalogSuite` Author: Wenchen FanCloses #15797 from cloud-fan/partition. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2f7461f3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2f7461f3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2f7461f3 Branch: refs/heads/master Commit: 2f7461f31331cfc37f6cfa3586b7bbefb3af5547 Parents: b533fa2 Author: Wenchen Fan Authored: Thu Nov 10 13:42:48 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 10 13:42:48 2016 -0800 -- .../catalyst/catalog/ExternalCatalogUtils.scala | 121 +++ .../sql/catalyst/catalog/InMemoryCatalog.scala | 92 ++-- .../spark/sql/catalyst/catalog/interface.scala | 11 ++ .../catalyst/catalog/ExternalCatalogSuite.scala | 150 +++ .../catalyst/catalog/SessionCatalogSuite.scala | 24 ++- .../spark/sql/execution/command/ddl.scala | 8 +- .../spark/sql/execution/command/tables.scala| 3 +- .../datasources/CatalogFileIndex.scala | 2 +- .../datasources/DataSourceStrategy.scala| 2 +- .../datasources/FileFormatWriter.scala | 6 +- .../PartitioningAwareFileIndex.scala| 2 - .../datasources/PartitioningUtils.scala | 94 +--- .../spark/sql/execution/command/DDLSuite.scala | 8 +- .../ParquetPartitionDiscoverySuite.scala| 21 +-- .../spark/sql/hive/HiveExternalCatalog.scala| 51 ++- .../spark/sql/hive/HiveSparkSubmitSuite.scala | 4 +- .../spark/sql/hive/MultiDatabaseSuite.scala | 2 +- .../spark/sql/hive/execution/HiveDDLSuite.scala | 2 +- .../sql/hive/execution/SQLQuerySuite.scala | 2 +- 19 files changed, 397 insertions(+), 208 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2f7461f3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala new file mode 100644 index 000..b1442ee --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.catalog + +import org.apache.hadoop.fs.Path +import org.apache.hadoop.util.Shell + +import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec + +object ExternalCatalogUtils { + // This duplicates default value of Hive `ConfVars.DEFAULTPARTITIONNAME`, since catalyst doesn't + // depend on Hive. + val DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__" + + // + // The following string escaping code is mainly copied
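To make behaviour points 1 and 2 of this PR concrete, here is a minimal, self-contained sketch of Hive-style partition path building. It is not the actual `ExternalCatalogUtils` code, and the escaped character set is only an illustrative subset: the point is that the partition column name keeps the case the user gave it, while unsafe characters in values are percent-escaped.

```
// A simplified stand-in for Hive-style partition path escaping; names and the
// escape set are illustrative, not copied from ExternalCatalogUtils.
object PartitionPathSketch {
  private val unsafe: Set[Char] = Set(' ', '"', '#', '%', '*', '/', ':', '=', '?', '\\')

  def escapePathName(s: String): String =
    s.map(c => if (c < ' ' || unsafe(c)) f"%%${c.toInt}%02X" else c.toString).mkString

  // Builds "col1=v1/col2=v2", preserving the case of the column names.
  def partitionPath(spec: Seq[(String, String)]): String =
    spec.map { case (col, value) => s"${escapePathName(col)}=${escapePathName(value)}" }
      .mkString("/")

  def main(args: Array[String]): Unit = {
    // "partCol" stays "partCol" (not "partcol"), and "/" in the value is escaped.
    println(partitionPath(Seq("partCol" -> "2016-11-10", "region" -> "US/east")))
    // prints: partCol=2016-11-10/region=US%2Feast
  }
}
```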
spark git commit: [SPARK-17993][SQL] Fix Parquet log output redirection
Repository: spark Updated Branches: refs/heads/branch-2.1 62236b9eb -> be3933ddf [SPARK-17993][SQL] Fix Parquet log output redirection (Link to Jira issue: https://issues.apache.org/jira/browse/SPARK-17993) ## What changes were proposed in this pull request? PR #14690 broke parquet log output redirection for converted partitioned Hive tables. For example, when querying parquet files written by Parquet-mr 1.6.0 Spark prints a torrent of (harmless) warning messages from the Parquet reader: ``` Oct 18, 2016 7:42:18 PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr version 1.6.0 org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr version 1.6.0 using format: (.+) version ((.*) )?\(build ?(.*)\) at org.apache.parquet.VersionParser.parse(VersionParser.java:112) at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60) at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263) at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:583) at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:513) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:225) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137) at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:102) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:162) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:102) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:372) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) ``` This only happens during execution, not planning, and 
it doesn't matter what log level the `SparkContext` is set to. That's because Parquet (versions < 1.9) doesn't use slf4j for logging. Note, you can tell that log redirection is not working here because the log message format does not conform to the default Spark log message format. This is a regression I noted as something we needed to fix as a follow up. It appears that the problem arose because we removed the call to `inferSchema` during Hive table conversion. That call is what triggered the output redirection. ## How was this patch tested? I tested this manually in four ways: 1. Executing `spark.sqlContext.range(10).selectExpr("id as a").write.mode("overwrite").parquet("test")`. 2. Executing `spark.read.format("parquet").load(legacyParquetFile).show` for a Parquet file `legacyParquetFile` written using Parquet-mr 1.6.0. 3. Executing `select * from legacy_parquet_table limit 1` for some unpartitioned Parquet-based Hive table written using Parquet-mr 1.6.0. 4. Executing `select * from legacy_partitioned_parquet_table where partcol=x limit 1` for some partitioned Parquet-based Hive table written using Parquet-mr 1.6.0. I ran each test with a new instance of `spark-shell` or `spark-sql`. Incidentally, I found that test case 3 was not a
spark git commit: [SPARK-17993][SQL] Fix Parquet log output redirection
Repository: spark Updated Branches: refs/heads/master 16eaad9da -> b533fa2b2 [SPARK-17993][SQL] Fix Parquet log output redirection (Link to Jira issue: https://issues.apache.org/jira/browse/SPARK-17993) ## What changes were proposed in this pull request? PR #14690 broke parquet log output redirection for converted partitioned Hive tables. For example, when querying parquet files written by Parquet-mr 1.6.0 Spark prints a torrent of (harmless) warning messages from the Parquet reader: ``` Oct 18, 2016 7:42:18 PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr version 1.6.0 org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr version 1.6.0 using format: (.+) version ((.*) )?\(build ?(.*)\) at org.apache.parquet.VersionParser.parse(VersionParser.java:112) at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60) at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263) at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:583) at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:513) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:225) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137) at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:102) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:162) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:102) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:372) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) ``` This only happens during execution, not planning, and it 
doesn't matter what log level the `SparkContext` is set to. That's because Parquet (versions < 1.9) doesn't use slf4j for logging. Note, you can tell that log redirection is not working here because the log message format does not conform to the default Spark log message format. This is a regression I noted as something we needed to fix as a follow up. It appears that the problem arose because we removed the call to `inferSchema` during Hive table conversion. That call is what triggered the output redirection. ## How was this patch tested? I tested this manually in four ways: 1. Executing `spark.sqlContext.range(10).selectExpr("id as a").write.mode("overwrite").parquet("test")`. 2. Executing `spark.read.format("parquet").load(legacyParquetFile).show` for a Parquet file `legacyParquetFile` written using Parquet-mr 1.6.0. 3. Executing `select * from legacy_parquet_table limit 1` for some unpartitioned Parquet-based Hive table written using Parquet-mr 1.6.0. 4. Executing `select * from legacy_partitioned_parquet_table where partcol=x limit 1` for some partitioned Parquet-based Hive table written using Parquet-mr 1.6.0. I ran each test with a new instance of `spark-shell` or `spark-sql`. Incidentally, I found that test case 3 was not a
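For context, the general technique behind redirecting Parquet's `java.util.logging` output is to install a handler that forwards JUL records to slf4j. The sketch below uses the standard jul-to-slf4j bridge rather than Spark's own redirection code, so treat it as an illustration of the idea, not the patch itself.

```
// Illustrative only: route java.util.logging (used by Parquet < 1.9) through
// slf4j so messages obey the log4j configuration. This is not the code path
// Spark uses internally; it shows the underlying technique.
import org.slf4j.bridge.SLF4JBridgeHandler

object JulRedirectSketch {
  def install(): Unit = {
    SLF4JBridgeHandler.removeHandlersForRootLogger() // drop JUL's default console handler
    SLF4JBridgeHandler.install()                     // forward JUL records to slf4j
  }
}
```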
spark git commit: [SPARK-18262][BUILD][SQL] JSON.org license is now CatX
Repository: spark Updated Branches: refs/heads/branch-2.1 b54d71b6f -> 62236b9eb [SPARK-18262][BUILD][SQL] JSON.org license is now CatX ## What changes were proposed in this pull request? Try excluding org.json:json from hive-exec dep as it's Cat X now. It may be the case that it's not used by the part of Hive Spark uses anyway. ## How was this patch tested? Existing tests Author: Sean OwenCloses #15798 from srowen/SPARK-18262. (cherry picked from commit 16eaad9daed0b633e6a714b5704509aa7107d6e5) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/62236b9e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/62236b9e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/62236b9e Branch: refs/heads/branch-2.1 Commit: 62236b9eb951f171d96e9d7f5f12d641a2da9a26 Parents: b54d71b Author: Sean Owen Authored: Thu Nov 10 10:20:03 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 10 10:20:11 2016 -0800 -- NOTICE | 3 --- dev/deps/spark-deps-hadoop-2.2 | 1 - dev/deps/spark-deps-hadoop-2.3 | 1 - dev/deps/spark-deps-hadoop-2.4 | 1 - dev/deps/spark-deps-hadoop-2.6 | 1 - dev/deps/spark-deps-hadoop-2.7 | 1 - pom.xml| 5 + 7 files changed, 5 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/62236b9e/NOTICE -- diff --git a/NOTICE b/NOTICE index 69b513e..f4b64b5 100644 --- a/NOTICE +++ b/NOTICE @@ -421,9 +421,6 @@ Copyright (c) 2011, Terrence Parr. This product includes/uses ASM (http://asm.ow2.org/), Copyright (c) 2000-2007 INRIA, France Telecom. -This product includes/uses org.json (http://www.json.org/java/index.html), -Copyright (c) 2002 JSON.org - This product includes/uses JLine (http://jline.sourceforge.net/), Copyright (c) 2002-2006, Marc Prud'hommeaux . 
http://git-wip-us.apache.org/repos/asf/spark/blob/62236b9e/dev/deps/spark-deps-hadoop-2.2 -- diff --git a/dev/deps/spark-deps-hadoop-2.2 b/dev/deps/spark-deps-hadoop-2.2 index 99279a4..6e749ac 100644 --- a/dev/deps/spark-deps-hadoop-2.2 +++ b/dev/deps/spark-deps-hadoop-2.2 @@ -103,7 +103,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/62236b9e/dev/deps/spark-deps-hadoop-2.3 -- diff --git a/dev/deps/spark-deps-hadoop-2.3 b/dev/deps/spark-deps-hadoop-2.3 index f094b4a..515995a 100644 --- a/dev/deps/spark-deps-hadoop-2.3 +++ b/dev/deps/spark-deps-hadoop-2.3 @@ -108,7 +108,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/62236b9e/dev/deps/spark-deps-hadoop-2.4 -- diff --git a/dev/deps/spark-deps-hadoop-2.4 b/dev/deps/spark-deps-hadoop-2.4 index 7f0ef98..d2139fd 100644 --- a/dev/deps/spark-deps-hadoop-2.4 +++ b/dev/deps/spark-deps-hadoop-2.4 @@ -108,7 +108,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/62236b9e/dev/deps/spark-deps-hadoop-2.6 -- diff --git a/dev/deps/spark-deps-hadoop-2.6 b/dev/deps/spark-deps-hadoop-2.6 index 4a27bf3..b5cecf7 100644 --- a/dev/deps/spark-deps-hadoop-2.6 +++ b/dev/deps/spark-deps-hadoop-2.6 @@ -116,7 +116,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/62236b9e/dev/deps/spark-deps-hadoop-2.7 -- diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7 index 151670a..a5e03a7 100644 --- a/dev/deps/spark-deps-hadoop-2.7 +++ b/dev/deps/spark-deps-hadoop-2.7 @@ -116,7 +116,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/62236b9e/pom.xml
spark git commit: [SPARK-18262][BUILD][SQL] JSON.org license is now CatX
Repository: spark Updated Branches: refs/heads/master 22a9d064e -> 16eaad9da [SPARK-18262][BUILD][SQL] JSON.org license is now CatX ## What changes were proposed in this pull request? Try excluding org.json:json from hive-exec dep as it's Cat X now. It may be the case that it's not used by the part of Hive Spark uses anyway. ## How was this patch tested? Existing tests Author: Sean OwenCloses #15798 from srowen/SPARK-18262. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/16eaad9d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/16eaad9d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/16eaad9d Branch: refs/heads/master Commit: 16eaad9daed0b633e6a714b5704509aa7107d6e5 Parents: 22a9d06 Author: Sean Owen Authored: Thu Nov 10 10:20:03 2016 -0800 Committer: Reynold Xin Committed: Thu Nov 10 10:20:03 2016 -0800 -- NOTICE | 3 --- dev/deps/spark-deps-hadoop-2.2 | 1 - dev/deps/spark-deps-hadoop-2.3 | 1 - dev/deps/spark-deps-hadoop-2.4 | 1 - dev/deps/spark-deps-hadoop-2.6 | 1 - dev/deps/spark-deps-hadoop-2.7 | 1 - pom.xml| 5 + 7 files changed, 5 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/16eaad9d/NOTICE -- diff --git a/NOTICE b/NOTICE index 69b513e..f4b64b5 100644 --- a/NOTICE +++ b/NOTICE @@ -421,9 +421,6 @@ Copyright (c) 2011, Terrence Parr. This product includes/uses ASM (http://asm.ow2.org/), Copyright (c) 2000-2007 INRIA, France Telecom. -This product includes/uses org.json (http://www.json.org/java/index.html), -Copyright (c) 2002 JSON.org - This product includes/uses JLine (http://jline.sourceforge.net/), Copyright (c) 2002-2006, Marc Prud'hommeaux . http://git-wip-us.apache.org/repos/asf/spark/blob/16eaad9d/dev/deps/spark-deps-hadoop-2.2 -- diff --git a/dev/deps/spark-deps-hadoop-2.2 b/dev/deps/spark-deps-hadoop-2.2 index 99279a4..6e749ac 100644 --- a/dev/deps/spark-deps-hadoop-2.2 +++ b/dev/deps/spark-deps-hadoop-2.2 @@ -103,7 +103,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/16eaad9d/dev/deps/spark-deps-hadoop-2.3 -- diff --git a/dev/deps/spark-deps-hadoop-2.3 b/dev/deps/spark-deps-hadoop-2.3 index f094b4a..515995a 100644 --- a/dev/deps/spark-deps-hadoop-2.3 +++ b/dev/deps/spark-deps-hadoop-2.3 @@ -108,7 +108,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/16eaad9d/dev/deps/spark-deps-hadoop-2.4 -- diff --git a/dev/deps/spark-deps-hadoop-2.4 b/dev/deps/spark-deps-hadoop-2.4 index 7f0ef98..d2139fd 100644 --- a/dev/deps/spark-deps-hadoop-2.4 +++ b/dev/deps/spark-deps-hadoop-2.4 @@ -108,7 +108,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/16eaad9d/dev/deps/spark-deps-hadoop-2.6 -- diff --git a/dev/deps/spark-deps-hadoop-2.6 b/dev/deps/spark-deps-hadoop-2.6 index 4a27bf3..b5cecf7 100644 --- a/dev/deps/spark-deps-hadoop-2.6 +++ b/dev/deps/spark-deps-hadoop-2.6 @@ -116,7 +116,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar 
json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/16eaad9d/dev/deps/spark-deps-hadoop-2.7 -- diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7 index 151670a..a5e03a7 100644 --- a/dev/deps/spark-deps-hadoop-2.7 +++ b/dev/deps/spark-deps-hadoop-2.7 @@ -116,7 +116,6 @@ jline-2.12.1.jar joda-time-2.9.3.jar jodd-core-3.5.2.jar jpam-1.1.jar -json-20090211.jar json4s-ast_2.11-3.2.11.jar json4s-core_2.11-3.2.11.jar json4s-jackson_2.11-3.2.11.jar http://git-wip-us.apache.org/repos/asf/spark/blob/16eaad9d/pom.xml -- diff --git a/pom.xml b/pom.xml index 04d2eaa..8aa0a6c 100644 ---
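For downstream builds that pull in hive-exec directly, the same idea can be expressed in an sbt build definition. This is a hedged sketch, not Spark's own change (which edits the Maven pom.xml shown above), and the hive-exec version string is illustrative.

```
// build.sbt sketch: keep the Category-X org.json:json artifact off the
// classpath by excluding it from the hive-exec dependency.
libraryDependencies += "org.apache.hive" % "hive-exec" % "1.2.1" exclude("org.json", "json")
```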
spark git commit: [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputFormat` is `Configurable`.
Repository: spark Updated Branches: refs/heads/master d8b81f778 -> 64fbdf1aa [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputFormat` is `Configurable`. ## What changes were proposed in this pull request? We should call `setConf` if `OutputFormat` is `Configurable`, this should be done before we create `OutputCommitter` and `RecordWriter`. This is follow up of #15769, see discussion [here](https://github.com/apache/spark/pull/15769/files#r87064229) ## How was this patch tested? Add test of this case in `PairRDDFunctionsSuite`. Author: jiangxingboCloses #15823 from jiangxb1987/config-format. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/64fbdf1a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/64fbdf1a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/64fbdf1a Branch: refs/heads/master Commit: 64fbdf1aa90b66269daec29f62dc9431c1173bab Parents: d8b81f7 Author: jiangxingbo Authored: Wed Nov 9 13:14:26 2016 -0800 Committer: Reynold Xin Committed: Wed Nov 9 13:14:26 2016 -0800 -- .../internal/io/HadoopMapReduceCommitProtocol.scala | 9 - .../internal/io/SparkHadoopMapReduceWriter.scala | 9 +++-- .../org/apache/spark/rdd/PairRDDFunctionsSuite.scala | 15 +++ 3 files changed, 30 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/64fbdf1a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala b/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala index d643a32..6b0bcb8 100644 --- a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala +++ b/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala @@ -19,6 +19,7 @@ package org.apache.spark.internal.io import java.util.Date +import org.apache.hadoop.conf.Configurable import org.apache.hadoop.fs.Path import org.apache.hadoop.mapreduce._ import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter @@ -42,7 +43,13 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String) @transient private var committer: OutputCommitter = _ protected def setupCommitter(context: TaskAttemptContext): OutputCommitter = { -context.getOutputFormatClass.newInstance().getOutputCommitter(context) +val format = context.getOutputFormatClass.newInstance() +// If OutputFormat is Configurable, we should set conf to it. 
+format match { + case c: Configurable => c.setConf(context.getConfiguration) + case _ => () +} +format.getOutputCommitter(context) } override def newTaskTempFile( http://git-wip-us.apache.org/repos/asf/spark/blob/64fbdf1a/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala b/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala index a405c44..7964392 100644 --- a/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala +++ b/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala @@ -23,7 +23,7 @@ import java.util.{Date, Locale} import scala.reflect.ClassTag import scala.util.DynamicVariable -import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.conf.{Configurable, Configuration} import org.apache.hadoop.fs.Path import org.apache.hadoop.mapred.{JobConf, JobID} import org.apache.hadoop.mapreduce._ @@ -140,7 +140,12 @@ object SparkHadoopMapReduceWriter extends Logging { SparkHadoopWriterUtils.initHadoopOutputMetrics(context) // Initiate the writer. -val taskFormat = outputFormat.newInstance +val taskFormat = outputFormat.newInstance() +// If OutputFormat is Configurable, we should set conf to it. +taskFormat match { + case c: Configurable => c.setConf(hadoopConf) + case _ => () +} val writer = taskFormat.getRecordWriter(taskContext) .asInstanceOf[RecordWriter[K, V]] require(writer != null, "Unable to obtain RecordWriter") http://git-wip-us.apache.org/repos/asf/spark/blob/64fbdf1a/core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala b/core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala index
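The scenario this follow-up guards against can be illustrated with a toy `OutputFormat` that also implements `Configurable`: unless `setConf` is called before `getOutputCommitter`/`getRecordWriter`, its configuration is still null. This is only a sketch in the spirit of the test added to `PairRDDFunctionsSuite`, not that test itself; the class name is hypothetical.

```
// Hypothetical OutputFormat used only to illustrate why setConf must be called
// before the committer/writer is created.
import org.apache.hadoop.conf.{Configurable, Configuration}
import org.apache.hadoop.mapreduce.{RecordWriter, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

class ConfigurableTextOutputFormat[K, V] extends TextOutputFormat[K, V] with Configurable {
  private var conf: Configuration = _

  override def setConf(c: Configuration): Unit = { conf = c }
  override def getConf(): Configuration = conf

  override def getRecordWriter(context: TaskAttemptContext): RecordWriter[K, V] = {
    // Before this patch, the commit-protocol path instantiated the format but
    // never called setConf, so `conf` would still be null at this point.
    require(conf != null, "setConf must be called before getRecordWriter")
    super.getRecordWriter(context)
  }
}
```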
spark git commit: [SPARK-18370][SQL] Add table information to InsertIntoHadoopFsRelationCommand
Repository: spark Updated Branches: refs/heads/branch-2.1 80f58510a -> 4424c901e [SPARK-18370][SQL] Add table information to InsertIntoHadoopFsRelationCommand ## What changes were proposed in this pull request? `InsertIntoHadoopFsRelationCommand` does not keep track if it inserts into a table and what table it inserts to. This can make debugging these statements problematic. This PR adds table information the `InsertIntoHadoopFsRelationCommand`. Explaining this SQL command `insert into prq select * from range(0, 10)` now yields the following executed plan: ``` == Physical Plan == ExecutedCommand +- InsertIntoHadoopFsRelationCommand file:/dev/assembly/spark-warehouse/prq, ParquetFormat, , Map(serialization.format -> 1, path -> file:/dev/assembly/spark-warehouse/prq), Append, CatalogTable( Table: `default`.`prq` Owner: hvanhovell Created: Wed Nov 09 17:42:30 CET 2016 Last Access: Thu Jan 01 01:00:00 CET 1970 Type: MANAGED Schema: [StructField(id,LongType,true)] Provider: parquet Properties: [transient_lastDdlTime=1478709750] Storage(Location: file:/dev/assembly/spark-warehouse/prq, InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, Serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Properties: [serialization.format=1])) +- Project [id#7L] +- Range (0, 10, step=1, splits=None) ``` ## How was this patch tested? Added extra checks to the `ParquetMetastoreSuite` Author: Herman van HovellCloses #15832 from hvanhovell/SPARK-18370. (cherry picked from commit d8b81f778af8c3d7112ad37f691c49215b392836) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4424c901 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4424c901 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4424c901 Branch: refs/heads/branch-2.1 Commit: 4424c901e82ed4992d5568cbc5a5f524b88dc5eb Parents: 80f5851 Author: Herman van Hovell Authored: Wed Nov 9 12:26:09 2016 -0800 Committer: Reynold Xin Committed: Wed Nov 9 12:26:17 2016 -0800 -- .../apache/spark/sql/execution/datasources/DataSource.scala| 3 ++- .../spark/sql/execution/datasources/DataSourceStrategy.scala | 5 +++-- .../datasources/InsertIntoHadoopFsRelationCommand.scala| 5 +++-- .../test/scala/org/apache/spark/sql/hive/parquetSuites.scala | 6 -- 4 files changed, 12 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4424c901/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala index 5266611..5d66394 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala @@ -424,7 +424,8 @@ case class DataSource( _ => Unit, // No existing table needs to be refreshed. options, data.logicalPlan, -mode) +mode, +catalogTable) sparkSession.sessionState.executePlan(plan).toRdd // Replace the schema with that of the DataFrame we just wrote out to avoid re-inferring it. 
copy(userSpecifiedSchema = Some(data.schema.asNullable)).resolveRelation() http://git-wip-us.apache.org/repos/asf/spark/blob/4424c901/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala index a548e88..2d43a6a 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala @@ -162,7 +162,7 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] { case i @ logical.InsertIntoTable( - l @ LogicalRelation(t: HadoopFsRelation, _, _), part, query, overwrite, false) + l @ LogicalRelation(t: HadoopFsRelation, _, table), part, query, overwrite, false) if query.resolved && t.schema.asNullable ==
spark git commit: [SPARK-18370][SQL] Add table information to InsertIntoHadoopFsRelationCommand
Repository: spark Updated Branches: refs/heads/master d4028de97 -> d8b81f778 [SPARK-18370][SQL] Add table information to InsertIntoHadoopFsRelationCommand ## What changes were proposed in this pull request? `InsertIntoHadoopFsRelationCommand` does not keep track if it inserts into a table and what table it inserts to. This can make debugging these statements problematic. This PR adds table information the `InsertIntoHadoopFsRelationCommand`. Explaining this SQL command `insert into prq select * from range(0, 10)` now yields the following executed plan: ``` == Physical Plan == ExecutedCommand +- InsertIntoHadoopFsRelationCommand file:/dev/assembly/spark-warehouse/prq, ParquetFormat, , Map(serialization.format -> 1, path -> file:/dev/assembly/spark-warehouse/prq), Append, CatalogTable( Table: `default`.`prq` Owner: hvanhovell Created: Wed Nov 09 17:42:30 CET 2016 Last Access: Thu Jan 01 01:00:00 CET 1970 Type: MANAGED Schema: [StructField(id,LongType,true)] Provider: parquet Properties: [transient_lastDdlTime=1478709750] Storage(Location: file:/dev/assembly/spark-warehouse/prq, InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, Serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Properties: [serialization.format=1])) +- Project [id#7L] +- Range (0, 10, step=1, splits=None) ``` ## How was this patch tested? Added extra checks to the `ParquetMetastoreSuite` Author: Herman van HovellCloses #15832 from hvanhovell/SPARK-18370. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d8b81f77 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d8b81f77 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d8b81f77 Branch: refs/heads/master Commit: d8b81f778af8c3d7112ad37f691c49215b392836 Parents: d4028de Author: Herman van Hovell Authored: Wed Nov 9 12:26:09 2016 -0800 Committer: Reynold Xin Committed: Wed Nov 9 12:26:09 2016 -0800 -- .../apache/spark/sql/execution/datasources/DataSource.scala| 3 ++- .../spark/sql/execution/datasources/DataSourceStrategy.scala | 5 +++-- .../datasources/InsertIntoHadoopFsRelationCommand.scala| 5 +++-- .../test/scala/org/apache/spark/sql/hive/parquetSuites.scala | 6 -- 4 files changed, 12 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d8b81f77/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala index 5266611..5d66394 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala @@ -424,7 +424,8 @@ case class DataSource( _ => Unit, // No existing table needs to be refreshed. options, data.logicalPlan, -mode) +mode, +catalogTable) sparkSession.sessionState.executePlan(plan).toRdd // Replace the schema with that of the DataFrame we just wrote out to avoid re-inferring it. 
copy(userSpecifiedSchema = Some(data.schema.asNullable)).resolveRelation() http://git-wip-us.apache.org/repos/asf/spark/blob/d8b81f77/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala index a548e88..2d43a6a 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala @@ -162,7 +162,7 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] { case i @ logical.InsertIntoTable( - l @ LogicalRelation(t: HadoopFsRelation, _, _), part, query, overwrite, false) + l @ LogicalRelation(t: HadoopFsRelation, _, table), part, query, overwrite, false) if query.resolved && t.schema.asNullable == query.schema.asNullable => // Sanity checks @@ -222,7 +222,8 @@ case class DataSourceAnalysis(conf: CatalystConf) extends
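A quick way to see the effect from a spark-shell (the table name is illustrative): write a Parquet datasource table, then explain an INSERT into it. The `InsertIntoHadoopFsRelationCommand` node now carries the target's `CatalogTable`.

```
// spark-shell sketch; "prq" is an illustrative table name.
spark.range(0, 10).selectExpr("id").write.format("parquet").saveAsTable("prq")

// The executed plan of the INSERT now includes the CatalogTable of the target,
// which is the extra debugging information added by this patch.
spark.sql("INSERT INTO prq SELECT * FROM range(0, 10)").explain()
```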
spark git commit: [SPARK-18368] Fix regexp_replace with task serialization.
Repository: spark Updated Branches: refs/heads/branch-2.0 0cceb1bfe -> bdddc661b [SPARK-18368] Fix regexp_replace with task serialization. ## What changes were proposed in this pull request? This makes the result value both transient and lazy, so that if the RegExpReplace object is initialized then serialized, `result: StringBuffer` will be correctly initialized. ## How was this patch tested? * Verified that this patch fixed the query that found the bug. * Added a test case that fails without the fix. Author: Ryan BlueCloses #15816 from rdblue/SPARK-18368-fix-regexp-replace. (cherry picked from commit b9192bb3ffc319ebee7dbd15c24656795e454749) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bdddc661 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bdddc661 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bdddc661 Branch: refs/heads/branch-2.0 Commit: bdddc661b71725dce35c6b2edd9ccb22e774e997 Parents: 0cceb1b Author: Ryan Blue Authored: Tue Nov 8 23:47:48 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 8 23:48:06 2016 -0800 -- .../sql/catalyst/expressions/regexpExpressions.scala | 2 +- .../catalyst/expressions/ExpressionEvalHelper.scala | 15 +-- 2 files changed, 10 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bdddc661/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala index d25da3f..f6a55cf 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala @@ -220,7 +220,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio @transient private var lastReplacement: String = _ @transient private var lastReplacementInUTF8: UTF8String = _ // result buffer write by Matcher - @transient private val result: StringBuffer = new StringBuffer + @transient private lazy val result: StringBuffer = new StringBuffer override def nullSafeEval(s: Any, p: Any, r: Any): Any = { if (!p.equals(lastRegex)) { http://git-wip-us.apache.org/repos/asf/spark/blob/bdddc661/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala index 668543a..186079f 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala @@ -21,7 +21,8 @@ import org.scalacheck.Gen import org.scalactic.TripleEqualsSupport.Spread import org.scalatest.prop.GeneratorDrivenPropertyChecks -import org.apache.spark.SparkFunSuite +import org.apache.spark.{SparkConf, SparkFunSuite} +import org.apache.spark.serializer.JavaSerializer import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow} import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.optimizer.SimpleTestOptimizer @@ -42,13 +43,15 @@ trait ExpressionEvalHelper extends 
GeneratorDrivenPropertyChecks { protected def checkEvaluation( expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow): Unit = { +val serializer = new JavaSerializer(new SparkConf()).newInstance +val expr: Expression = serializer.deserialize(serializer.serialize(expression)) val catalystValue = CatalystTypeConverters.convertToCatalyst(expected) -checkEvaluationWithoutCodegen(expression, catalystValue, inputRow) -checkEvaluationWithGeneratedMutableProjection(expression, catalystValue, inputRow) -if (GenerateUnsafeProjection.canSupport(expression.dataType)) { - checkEvalutionWithUnsafeProjection(expression, catalystValue, inputRow) +checkEvaluationWithoutCodegen(expr, catalystValue, inputRow) +checkEvaluationWithGeneratedMutableProjection(expr, catalystValue, inputRow) +if (GenerateUnsafeProjection.canSupport(expr.dataType)) { + checkEvalutionWithUnsafeProjection(expr, catalystValue, inputRow) } -
spark git commit: [SPARK-18368] Fix regexp_replace with task serialization.
Repository: spark Updated Branches: refs/heads/branch-2.1 0dc14f129 -> f67208369 [SPARK-18368] Fix regexp_replace with task serialization. ## What changes were proposed in this pull request? This makes the result value both transient and lazy, so that if the RegExpReplace object is initialized then serialized, `result: StringBuffer` will be correctly initialized. ## How was this patch tested? * Verified that this patch fixed the query that found the bug. * Added a test case that fails without the fix. Author: Ryan BlueCloses #15816 from rdblue/SPARK-18368-fix-regexp-replace. (cherry picked from commit b9192bb3ffc319ebee7dbd15c24656795e454749) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f6720836 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f6720836 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f6720836 Branch: refs/heads/branch-2.1 Commit: f672083693c2c4dfea6dc43c024993d4561b1e79 Parents: 0dc14f1 Author: Ryan Blue Authored: Tue Nov 8 23:47:48 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 8 23:47:56 2016 -0800 -- .../sql/catalyst/expressions/regexpExpressions.scala | 2 +- .../catalyst/expressions/ExpressionEvalHelper.scala | 15 +-- 2 files changed, 10 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f6720836/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala index 5648ad6..4896a62 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala @@ -230,7 +230,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio @transient private var lastReplacement: String = _ @transient private var lastReplacementInUTF8: UTF8String = _ // result buffer write by Matcher - @transient private val result: StringBuffer = new StringBuffer + @transient private lazy val result: StringBuffer = new StringBuffer override def nullSafeEval(s: Any, p: Any, r: Any): Any = { if (!p.equals(lastRegex)) { http://git-wip-us.apache.org/repos/asf/spark/blob/f6720836/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala index 9ceb709..f836504 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala @@ -22,7 +22,8 @@ import org.scalactic.TripleEqualsSupport.Spread import org.scalatest.exceptions.TestFailedException import org.scalatest.prop.GeneratorDrivenPropertyChecks -import org.apache.spark.SparkFunSuite +import org.apache.spark.{SparkConf, SparkFunSuite} +import org.apache.spark.serializer.JavaSerializer import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow} import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.optimizer.SimpleTestOptimizer @@ -43,13 +44,15 @@ trait 
ExpressionEvalHelper extends GeneratorDrivenPropertyChecks { protected def checkEvaluation( expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow): Unit = { +val serializer = new JavaSerializer(new SparkConf()).newInstance +val expr: Expression = serializer.deserialize(serializer.serialize(expression)) val catalystValue = CatalystTypeConverters.convertToCatalyst(expected) -checkEvaluationWithoutCodegen(expression, catalystValue, inputRow) -checkEvaluationWithGeneratedMutableProjection(expression, catalystValue, inputRow) -if (GenerateUnsafeProjection.canSupport(expression.dataType)) { - checkEvalutionWithUnsafeProjection(expression, catalystValue, inputRow) +checkEvaluationWithoutCodegen(expr, catalystValue, inputRow) +checkEvaluationWithGeneratedMutableProjection(expr, catalystValue, inputRow) +if (GenerateUnsafeProjection.canSupport(expr.dataType)) { + checkEvalutionWithUnsafeProjection(expr, catalystValue, inputRow) } -
spark git commit: [SPARK-18368] Fix regexp_replace with task serialization.
Repository: spark Updated Branches: refs/heads/master 4afa39e22 -> b9192bb3f [SPARK-18368] Fix regexp_replace with task serialization. ## What changes were proposed in this pull request? This makes the result value both transient and lazy, so that if the RegExpReplace object is initialized then serialized, `result: StringBuffer` will be correctly initialized. ## How was this patch tested? * Verified that this patch fixed the query that found the bug. * Added a test case that fails without the fix. Author: Ryan BlueCloses #15816 from rdblue/SPARK-18368-fix-regexp-replace. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b9192bb3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b9192bb3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b9192bb3 Branch: refs/heads/master Commit: b9192bb3ffc319ebee7dbd15c24656795e454749 Parents: 4afa39e Author: Ryan Blue Authored: Tue Nov 8 23:47:48 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 8 23:47:48 2016 -0800 -- .../sql/catalyst/expressions/regexpExpressions.scala | 2 +- .../catalyst/expressions/ExpressionEvalHelper.scala | 15 +-- 2 files changed, 10 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b9192bb3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala index 5648ad6..4896a62 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala @@ -230,7 +230,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio @transient private var lastReplacement: String = _ @transient private var lastReplacementInUTF8: UTF8String = _ // result buffer write by Matcher - @transient private val result: StringBuffer = new StringBuffer + @transient private lazy val result: StringBuffer = new StringBuffer override def nullSafeEval(s: Any, p: Any, r: Any): Any = { if (!p.equals(lastRegex)) { http://git-wip-us.apache.org/repos/asf/spark/blob/b9192bb3/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala index 9ceb709..f836504 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala @@ -22,7 +22,8 @@ import org.scalactic.TripleEqualsSupport.Spread import org.scalatest.exceptions.TestFailedException import org.scalatest.prop.GeneratorDrivenPropertyChecks -import org.apache.spark.SparkFunSuite +import org.apache.spark.{SparkConf, SparkFunSuite} +import org.apache.spark.serializer.JavaSerializer import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow} import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.optimizer.SimpleTestOptimizer @@ -43,13 +44,15 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks { protected def checkEvaluation( expression: => Expression, 
expected: Any, inputRow: InternalRow = EmptyRow): Unit = { +val serializer = new JavaSerializer(new SparkConf()).newInstance +val expr: Expression = serializer.deserialize(serializer.serialize(expression)) val catalystValue = CatalystTypeConverters.convertToCatalyst(expected) -checkEvaluationWithoutCodegen(expression, catalystValue, inputRow) -checkEvaluationWithGeneratedMutableProjection(expression, catalystValue, inputRow) -if (GenerateUnsafeProjection.canSupport(expression.dataType)) { - checkEvalutionWithUnsafeProjection(expression, catalystValue, inputRow) +checkEvaluationWithoutCodegen(expr, catalystValue, inputRow) +checkEvaluationWithGeneratedMutableProjection(expr, catalystValue, inputRow) +if (GenerateUnsafeProjection.canSupport(expr.dataType)) { + checkEvalutionWithUnsafeProjection(expr, catalystValue, inputRow) } -checkEvaluationWithOptimization(expression, catalystValue, inputRow) +checkEvaluationWithOptimization(expr, catalystValue,
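The essence of the fix can be reproduced outside Spark: after a Java serialization round trip, a plain `@transient val` comes back as null, while a `@transient lazy val` is re-created on first use. The classes below are purely illustrative, not Spark code.

```
import java.io._

// Illustrative classes: the only difference is val vs lazy val.
case class WithVal(pattern: String) {
  @transient private val buffer: StringBuffer = new StringBuffer
  def bufferLength: Int = buffer.length  // NPE after deserialization: buffer is null
}

case class WithLazyVal(pattern: String) {
  @transient private lazy val buffer: StringBuffer = new StringBuffer
  def bufferLength: Int = buffer.length  // safe: buffer is re-initialized lazily
}

object TransientLazyValDemo {
  private def roundTrip[T](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    new ObjectOutputStream(bytes).writeObject(value)
    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
      .readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    println(roundTrip(WithLazyVal("a*b")).bufferLength)   // prints 0
    try roundTrip(WithVal("a*b")).bufferLength
    catch { case _: NullPointerException => println("transient val was lost") }
  }
}
```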
spark git commit: [SPARK-18191][CORE] Port RDD API to use commit protocol
Repository: spark Updated Branches: refs/heads/master 73feaa30e -> 9c419698f [SPARK-18191][CORE] Port RDD API to use commit protocol ## What changes were proposed in this pull request? This PR port RDD API to use commit protocol, the changes made here: 1. Add new internal helper class that saves an RDD using a Hadoop OutputFormat named `SparkNewHadoopWriter`, it's similar with `SparkHadoopWriter` but uses commit protocol. This class supports the newer `mapreduce` API, instead of the old `mapred` API which is supported by `SparkHadoopWriter`; 2. Rewrite `PairRDDFunctions.saveAsNewAPIHadoopDataset` function, so it uses commit protocol now. ## How was this patch tested? Exsiting test cases. Author: jiangxingboCloses #15769 from jiangxb1987/rdd-commit. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9c419698 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9c419698 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9c419698 Branch: refs/heads/master Commit: 9c419698fe110a805570031cac3387a51957d9d1 Parents: 73feaa3 Author: jiangxingbo Authored: Tue Nov 8 09:41:01 2016 -0800 Committer: Reynold Xin Committed: Tue Nov 8 09:41:01 2016 -0800 -- .../org/apache/spark/SparkHadoopWriter.scala| 25 +- .../io/HadoopMapReduceCommitProtocol.scala | 6 +- .../io/SparkHadoopMapReduceWriter.scala | 249 +++ .../org/apache/spark/rdd/PairRDDFunctions.scala | 139 +-- .../spark/rdd/PairRDDFunctionsSuite.scala | 20 +- .../datasources/FileFormatWriter.scala | 4 +- .../spark/sql/hive/hiveWriterContainers.scala | 3 +- .../spark/streaming/dstream/DStream.scala | 5 +- .../streaming/scheduler/JobScheduler.scala | 5 +- 9 files changed, 280 insertions(+), 176 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9c419698/core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala -- diff --git a/core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala b/core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala index 7f75a39..46e22b2 100644 --- a/core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala +++ b/core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala @@ -23,11 +23,11 @@ import java.text.SimpleDateFormat import java.util.{Date, Locale} import org.apache.hadoop.fs.FileSystem -import org.apache.hadoop.fs.Path import org.apache.hadoop.mapred._ import org.apache.hadoop.mapreduce.TaskType import org.apache.spark.internal.Logging +import org.apache.spark.internal.io.SparkHadoopWriterUtils import org.apache.spark.mapred.SparkHadoopMapRedUtil import org.apache.spark.rdd.HadoopRDD import org.apache.spark.util.SerializableJobConf @@ -153,29 +153,8 @@ class SparkHadoopWriter(jobConf: JobConf) extends Logging with Serializable { splitID = splitid attemptID = attemptid -jID = new SerializableWritable[JobID](SparkHadoopWriter.createJobID(now, jobid)) +jID = new SerializableWritable[JobID](SparkHadoopWriterUtils.createJobID(now, jobid)) taID = new SerializableWritable[TaskAttemptID]( new TaskAttemptID(new TaskID(jID.value, TaskType.MAP, splitID), attemptID)) } } - -private[spark] -object SparkHadoopWriter { - def createJobID(time: Date, id: Int): JobID = { -val formatter = new SimpleDateFormat("MMddHHmmss", Locale.US) -val jobtrackerID = formatter.format(time) -new JobID(jobtrackerID, id) - } - - def createPathFromString(path: String, conf: JobConf): Path = { -if (path == null) { - throw new IllegalArgumentException("Output path is null") -} -val outputPath = new Path(path) -val fs = outputPath.getFileSystem(conf) -if 
(fs == null) { - throw new IllegalArgumentException("Incorrectly formatted output path") -} -outputPath.makeQualified(fs.getUri, fs.getWorkingDirectory) - } -} http://git-wip-us.apache.org/repos/asf/spark/blob/9c419698/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala b/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala index 66ccb6d..d643a32 100644 --- a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala +++ b/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala @@ -24,7 +24,6 @@ import org.apache.hadoop.mapreduce._ import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter import
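For orientation, the user-facing entry point whose implementation this PR moves onto the commit protocol is the newer `mapreduce`-API save path. A hedged spark-shell sketch (the output path is illustrative):

```
// spark-shell sketch: saveAsNewAPIHadoopFile delegates to
// saveAsNewAPIHadoopDataset, which after this PR runs through the commit
// protocol instead of the old writer code.
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

val counts = sc.parallelize(Seq(("a", 1), ("b", 2)))
  .map { case (k, v) => (new Text(k), new IntWritable(v)) }

counts.saveAsNewAPIHadoopFile[TextOutputFormat[Text, IntWritable]]("/tmp/counts-out")
```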
[1/3] spark-website git commit: Add 1.6.3 release.
Repository: spark-website Updated Branches: refs/heads/asf-site 24d32b75d -> b9aa4c3ee http://git-wip-us.apache.org/repos/asf/spark-website/blob/b9aa4c3e/site/releases/spark-release-1-2-1.html -- diff --git a/site/releases/spark-release-1-2-1.html b/site/releases/spark-release-1-2-1.html index 5581c54..c9efc6a 100644 --- a/site/releases/spark-release-1-2-1.html +++ b/site/releases/spark-release-1-2-1.html @@ -150,6 +150,9 @@ Latest News + Spark 1.6.3 released + (Nov 07, 2016) + Spark 2.0.1 released (Oct 03, 2016) @@ -159,9 +162,6 @@ Spark 1.6.2 released (Jun 25, 2016) - Call for Presentations for Spark Summit EU is Open - (Jun 16, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/b9aa4c3e/site/releases/spark-release-1-2-2.html -- diff --git a/site/releases/spark-release-1-2-2.html b/site/releases/spark-release-1-2-2.html index c8a859a..d76c619 100644 --- a/site/releases/spark-release-1-2-2.html +++ b/site/releases/spark-release-1-2-2.html @@ -150,6 +150,9 @@ Latest News + Spark 1.6.3 released + (Nov 07, 2016) + Spark 2.0.1 released (Oct 03, 2016) @@ -159,9 +162,6 @@ Spark 1.6.2 released (Jun 25, 2016) - Call for Presentations for Spark Summit EU is Open - (Jun 16, 2016) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/b9aa4c3e/site/releases/spark-release-1-3-0.html -- diff --git a/site/releases/spark-release-1-3-0.html b/site/releases/spark-release-1-3-0.html index 382ef4d..435ed19 100644 --- a/site/releases/spark-release-1-3-0.html +++ b/site/releases/spark-release-1-3-0.html @@ -150,6 +150,9 @@ Latest News + Spark 1.6.3 released + (Nov 07, 2016) + Spark 2.0.1 released (Oct 03, 2016) @@ -159,9 +162,6 @@ Spark 1.6.2 released (Jun 25, 2016) - Call for Presentations for Spark Summit EU is Open - (Jun 16, 2016) - Archive @@ -191,7 +191,7 @@ To download Spark 1.3 visit the downloads page. Spark Core -Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports https://issues.apache.org/jira/browse/SPARK-5430;>multi level aggregation trees to help speed up expensive reduce operations. https://issues.apache.org/jira/browse/SPARK-5063;>Improved error reporting has been added for certain gotcha operations. Sparks Jetty dependency is https://issues.apache.org/jira/browse/SPARK-3996;>now shaded to help avoid conflicts with user programs. Spark now supports https://issues.apache.org/jira/browse/SPARK-3883;>SSL encryption for some communication endpoints. Finaly, realtime https://issues.apache.org/jira/browse/SPARK-3428;>GC metrics and https://issues.apache.org/jira/browse/SPARK-4874;>record counts have been added to the UI. +Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports https://issues.apache.org/jira/browse/SPARK-5430;>multi level aggregation trees to help speed up expensive reduce operations. https://issues.apache.org/jira/browse/SPARK-5063;>Improved error reporting has been added for certain gotcha operations. Sparks Jetty dependency is https://issues.apache.org/jira/browse/SPARK-3996;>now shaded to help avoid conflicts with user programs. Spark now supports https://issues.apache.org/jira/browse/SPARK-3883;>SSL encryption for some communication endpoints. Finaly, realtime https://issues.apache.org/jira/browse/SPARK-3428;>GC metrics and https://issues.apache.org/jira/browse/SPARK-4874;>record counts have been added to the UI. DataFrame API Spark 1.3 adds a new DataFrames API that provides powerful and convenient operators when working with structured datasets. 
The DataFrame is an evolution of the base RDD API that includes named fields along with schema information. It's easy to construct a DataFrame from sources such as Hive tables, JSON data, a JDBC database, or any implementation of Spark's new data source API. Data frames will become a common interchange format between Spark components and when importing and exporting data to other systems. Data frames are supported in Python, Scala, and Java. @@ -203,7 +203,7 @@ In this release Spark MLlib introduces several new algorithms: latent Dirichlet allocation (LDA) for https://issues.apache.org/jira/browse/SPARK-1405;>topic modeling, https://issues.apache.org/jira/browse/SPARK-2309;>multinomial logistic