svn commit: r25786 - in /dev/spark/2.4.0-SNAPSHOT-2018_03_16_16_01-8a1efe3-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Mar 16 23:15:16 2018 New Revision: 25786 Log: Apache Spark 2.4.0-SNAPSHOT-2018_03_16_16_01-8a1efe3 docs [This commit notification would consist of 1449 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-23683][SQL] FileCommitProtocol.instantiate() hardening
Repository: spark
Updated Branches: refs/heads/master 8a72734f3 -> 8a1efe307

[SPARK-23683][SQL] FileCommitProtocol.instantiate() hardening

## What changes were proposed in this pull request?

With SPARK-20236, `FileCommitProtocol.instantiate()` looks for a three-argument constructor, passing in the `dynamicPartitionOverwrite` parameter. If there is no such constructor, it falls back to the classic two-arg one.

When `InsertIntoHadoopFsRelationCommand` passes down that `dynamicPartitionOverwrite` flag to `FileCommitProtocol.instantiate()`, it assumes that the instantiated protocol supports the specific requirements of dynamic partition overwrite. It does not notice when this does not hold, so the output generated may be incorrect.

This patch changes `FileCommitProtocol.instantiate()` so that when `dynamicPartitionOverwrite == true`, it requires the protocol implementation to have a 3-arg constructor. Classic two-arg constructors are supported when it is false. It also adds some debug-level logging for anyone trying to understand what's going on.

## How was this patch tested?

Unit tests verify that
* classes with only a 2-arg constructor cannot be used with dynamic overwrite
* classes with only a 2-arg constructor can be used without dynamic overwrite
* classes with 3-arg constructors can be used with both
* the fallback to any two-arg ctor takes place after the attempt to load the 3-arg ctor
* passing in invalid class types fails as expected (regression tests on expected behavior)

Author: Steve Loughran

Closes #20824 from steveloughran/stevel/SPARK-23683-protocol-instantiate.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8a1efe30
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8a1efe30
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8a1efe30
Branch: refs/heads/master
Commit: 8a1efe3076f29259151f1fba2ff894487efb6c4e
Parents: 8a72734
Author: Steve Loughran
Authored: Fri Mar 16 15:40:21 2018 -0700
Committer: Wenchen Fan
Committed: Fri Mar 16 15:40:21 2018 -0700

--
 .../spark/internal/io/FileCommitProtocol.scala  |  11 +-
 .../FileCommitProtocolInstantiationSuite.scala  | 148 +++
 2 files changed, 158 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/8a1efe30/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala
--
diff --git a/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala b/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala
index 6d0059b..e6e9c9e 100644
--- a/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala
@@ -20,6 +20,7 @@ package org.apache.spark.internal.io
 import org.apache.hadoop.fs._
 import org.apache.hadoop.mapreduce._

+import org.apache.spark.internal.Logging
 import org.apache.spark.util.Utils

@@ -132,7 +133,7 @@ abstract class FileCommitProtocol {
 }

-object FileCommitProtocol {
+object FileCommitProtocol extends Logging {
   class TaskCommitMessage(val obj: Any) extends Serializable

   object EmptyTaskCommitMessage extends TaskCommitMessage(null)

@@ -145,15 +146,23 @@ object FileCommitProtocol {
       jobId: String,
       outputPath: String,
       dynamicPartitionOverwrite: Boolean = false): FileCommitProtocol = {
+
+    logDebug(s"Creating committer $className; job $jobId; output=$outputPath;" +
+      s" dynamic=$dynamicPartitionOverwrite")
     val clazz = Utils.classForName(className).asInstanceOf[Class[FileCommitProtocol]]
     // First try the constructor with arguments (jobId: String, outputPath: String,
     // dynamicPartitionOverwrite: Boolean).
     // If that doesn't exist, try the one with (jobId: string, outputPath: String).
     try {
       val ctor = clazz.getDeclaredConstructor(classOf[String], classOf[String], classOf[Boolean])
+      logDebug("Using (String, String, Boolean) constructor")
       ctor.newInstance(jobId, outputPath, dynamicPartitionOverwrite.asInstanceOf[java.lang.Boolean])
     } catch {
       case _: NoSuchMethodException =>
+        logDebug("Falling back to (String, String) constructor")
+        require(!dynamicPartitionOverwrite,
+          "Dynamic Partition Overwrite is enabled but" +
+            s" the committer ${className} does not have the appropriate constructor")
         val ctor = clazz.getDeclaredConstructor(classOf[String], classOf[String])
         ctor.newInstance(jobId, outputPath)
     }
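The practical consequence for committer authors: any `FileCommitProtocol` implementation intended for use with dynamic partition overwrite must now expose the three-argument constructor. A minimal sketch of the two cases the new check distinguishes (the class names below are illustrative, not part of the patch; they extend the built-in `HadoopMapReduceCommitProtocol` only for brevity):

```scala
import org.apache.spark.internal.io.{FileCommitProtocol, HadoopMapReduceCommitProtocol}

// Exposes the (jobId, outputPath, dynamicPartitionOverwrite) constructor, so
// instantiate() accepts it whether or not dynamic partition overwrite is requested.
class ThreeArgCommitter(jobId: String, outputPath: String, dynamicPartitionOverwrite: Boolean)
  extends HadoopMapReduceCommitProtocol(jobId, outputPath, dynamicPartitionOverwrite)

// Only the classic two-argument constructor. After this patch, instantiate() refuses
// to use it when dynamicPartitionOverwrite = true instead of silently writing bad output.
class TwoArgCommitter(jobId: String, outputPath: String)
  extends HadoopMapReduceCommitProtocol(jobId, outputPath)

object CommitterDemo {
  def main(args: Array[String]): Unit = {
    // Succeeds: the 3-arg constructor is present.
    FileCommitProtocol.instantiate(
      classOf[ThreeArgCommitter].getName, jobId = "job-1", outputPath = "/tmp/out",
      dynamicPartitionOverwrite = true)

    // Throws IllegalArgumentException after this patch (previously it was instantiated silently).
    FileCommitProtocol.instantiate(
      classOf[TwoArgCommitter].getName, jobId = "job-2", outputPath = "/tmp/out",
      dynamicPartitionOverwrite = true)
  }
}
```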
svn commit: r25780 - in /dev/spark/2.4.0-SNAPSHOT-2018_03_16_12_01-8a72734-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Mar 16 19:15:25 2018 New Revision: 25780 Log: Apache Spark 2.4.0-SNAPSHOT-2018_03_16_12_01-8a72734 docs [This commit notification would consist of 1448 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-15009][PYTHON][ML] Construct a CountVectorizerModel from a vocabulary list
Repository: spark
Updated Branches: refs/heads/master bd201bf61 -> 8a72734f3

[SPARK-15009][PYTHON][ML] Construct a CountVectorizerModel from a vocabulary list

## What changes were proposed in this pull request?

Added a class method to construct CountVectorizerModel from a list of vocabulary strings, equivalent to the Scala version. Introduced a common param base class `_CountVectorizerParams` to allow the Python model to also own the parameters. This now matches the Scala class hierarchy.

## How was this patch tested?

Added to CountVectorizer doctests to do a transform on a model constructed from vocab, and a unit test to verify params and vocab are constructed correctly.

Author: Bryan Cutler

Closes #16770 from BryanCutler/pyspark-CountVectorizerModel-vocab_ctor-SPARK-15009.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8a72734f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8a72734f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8a72734f
Branch: refs/heads/master
Commit: 8a72734f33f6a0abbd3207b0d661633c8b25d9ad
Parents: bd201bf
Author: Bryan Cutler
Authored: Fri Mar 16 11:42:57 2018 -0700
Committer: Holden Karau
Committed: Fri Mar 16 11:42:57 2018 -0700

--
 python/pyspark/ml/feature.py | 168 +-
 python/pyspark/ml/tests.py   |  32 +++-
 2 files changed, 142 insertions(+), 58 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/8a72734f/python/pyspark/ml/feature.py
--
diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index f2e357f..a1ceb7f 100755
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -19,12 +19,12 @@ import sys if sys.version > '3': basestring = str -from pyspark import since, keyword_only +from pyspark import since, keyword_only, SparkContext from pyspark.rdd import ignore_unicode_prefix from pyspark.ml.linalg import _convert_to_vector from pyspark.ml.param.shared import * from pyspark.ml.util import JavaMLReadable, JavaMLWritable -from pyspark.ml.wrapper import JavaEstimator, JavaModel, JavaTransformer, _jvm +from pyspark.ml.wrapper import JavaEstimator, JavaModel, JavaParams, JavaTransformer, _jvm from pyspark.ml.common import inherit_doc __all__ = ['Binarizer',
@@ -403,8 +403,69 @@ class Bucketizer(JavaTransformer, HasInputCol, HasOutputCol, HasHandleInvalid, return self.getOrDefault(self.splits) +class _CountVectorizerParams(JavaParams, HasInputCol, HasOutputCol): +""" +Params for :py:attr:`CountVectorizer` and :py:attr:`CountVectorizerModel`. +""" + +minTF = Param( +Params._dummy(), "minTF", "Filter to ignore rare words in" + +" a document. For each document, terms with frequency/count less than the given" + +" threshold are ignored. If this is an integer >= 1, then this specifies a count (of" + +" times the term must appear in the document); if this is a double in [0,1), then this " + +"specifies a fraction (out of the document's token count). Note that the parameter is " + +"only used in transform of CountVectorizerModel and does not affect fitting. Default 1.0", +typeConverter=TypeConverters.toFloat) +minDF = Param( +Params._dummy(), "minDF", "Specifies the minimum number of" + +" different documents a term must appear in to be included in the vocabulary." + +" If this is an integer >= 1, this specifies the number of documents the term must" + +" appear in; if this is a double in [0,1), then this specifies the fraction of documents."
+ +" Default 1.0", typeConverter=TypeConverters.toFloat) +vocabSize = Param( +Params._dummy(), "vocabSize", "max size of the vocabulary. Default 1 << 18.", +typeConverter=TypeConverters.toInt) +binary = Param( +Params._dummy(), "binary", "Binary toggle to control the output vector values." + +" If True, all nonzero counts (after minTF filter applied) are set to 1. This is useful" + +" for discrete probabilistic models that model binary events rather than integer counts." + +" Default False", typeConverter=TypeConverters.toBoolean) + +def __init__(self, *args): +super(_CountVectorizerParams, self).__init__(*args) +self._setDefault(minTF=1.0, minDF=1.0, vocabSize=1 << 18, binary=False) + +@since("1.6.0") +def getMinTF(self): +""" +Gets the value of minTF or its default value. +""" +return self.getOrDefault(self.minTF) + +@since("1.6.0") +def getMinDF(self): +""" +Gets the value of minDF or its
spark git commit: [SPARK-23623][SS] Avoid concurrent use of cached consumers in CachedKafkaConsumer
Repository: spark
Updated Branches: refs/heads/master 9945b0227 -> bd201bf61

[SPARK-23623][SS] Avoid concurrent use of cached consumers in CachedKafkaConsumer

## What changes were proposed in this pull request?

CachedKafkaConsumer in the project `kafka-0-10-sql` is designed to maintain a pool of KafkaConsumers that can be reused. However, it was built with the assumption that there will be only one task trying to read the same Kafka TopicPartition at the same time. Hence, the cache was keyed by the TopicPartition a consumer is supposed to read. For any case where this assumption may not be true, there is a SparkPlan flag to disable the use of the cache. So it was up to the planner to correctly identify when it was not safe to use the cache and set the flag accordingly.

Fundamentally, this is the wrong way to approach the problem. It is HARD for a high-level planner to reason about the low-level execution model, whether there will be multiple tasks in the same query trying to read the same partition. Case in point, 2.3.0 introduced stream-stream joins, and you can build a streaming self-join query on Kafka. It's pretty non-trivial to figure out how this leads to two tasks reading the same partition twice, possibly concurrently. And due to the non-triviality, it is hard to figure this out in the planner and set the flag to avoid the cache / consumer pool. This can inadvertently lead to ConcurrentModificationException, or worse, silent reading of incorrect data.

Here is a better way to design this. The planner shouldn't have to understand these low-level optimizations. Rather, the consumer pool should be smart enough to avoid concurrent use of a cached consumer. Currently, it tries to do so but incorrectly (the `inuse` flag is not checked when returning a cached consumer, see [this](https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala#L403)). If there is another request for the same partition as a currently in-use consumer, the pool should automatically return a fresh consumer that should be closed when the task is done. Then the planner does not have to have a flag to avoid reuse.

This PR is a step towards that goal. It does the following (see the sketch after the commit metadata below):
- There are effectively two kinds of consumer that may be generated:
  - Cached consumer - this should be returned to the pool at task end
  - Non-cached consumer - this should be closed at task end
- A trait called KafkaConsumer is introduced to hide this difference from the users of the consumer, so that the client code does not have to reason about whether to stop and release. They simply call `val consumer = KafkaConsumer.acquire` and then `consumer.release()`.
- If there is a request for a consumer that is in use, then a new consumer is generated.
- If there is a concurrent attempt of the same task, then a new consumer is generated, and the existing cached consumer is marked for close upon release.
- In addition, I renamed the classes because CachedKafkaConsumer is a misnomer given that what it returns may or may not be cached.

To keep this patch safe enough for merging into branch-2.3, it does not remove the planner flag that avoids reuse. That can be done later in master only.

## How was this patch tested?

A new stress test that verifies it is safe to concurrently get consumers for the same partition from the consumer pool.

Author: Tathagata Das

Closes #20767 from tdas/SPARK-23623.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bd201bf6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bd201bf6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bd201bf6 Branch: refs/heads/master Commit: bd201bf61e8e1713deb91b962f670c76c9e3492b Parents: 9945b02 Author: Tathagata Das Authored: Fri Mar 16 11:11:07 2018 -0700 Committer: Shixiong Zhu Committed: Fri Mar 16 11:11:07 2018 -0700 -- .../sql/kafka010/CachedKafkaConsumer.scala | 438 .../sql/kafka010/KafkaContinuousReader.scala| 5 +- .../spark/sql/kafka010/KafkaDataConsumer.scala | 516 +++ .../sql/kafka010/KafkaMicroBatchReader.scala| 22 +- .../spark/sql/kafka010/KafkaSourceRDD.scala | 23 +- .../sql/kafka010/CachedKafkaConsumerSuite.scala | 34 -- .../sql/kafka010/KafkaDataConsumerSuite.scala | 124 + 7 files changed, 651 insertions(+), 511 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bd201bf6/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala -- diff --git
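A standalone sketch of the acquire/release contract described above (the pool structure and all names here are illustrative; this is not the actual consumer-pool code): if the cached consumer for a partition is already in use, the caller gets a fresh, non-cached consumer that is simply closed on release instead of being returned to the pool.

```scala
import java.util.concurrent.ConcurrentHashMap

// Illustrative consumer: `cached` marks whether it belongs to the pool.
class PooledConsumer(val topicPartition: String, val cached: Boolean) {
  @volatile var inUse = true
  def close(): Unit = println(s"closing consumer for $topicPartition")
}

object ConsumerPool {
  private val cache = new ConcurrentHashMap[String, PooledConsumer]()

  def acquire(topicPartition: String): PooledConsumer = synchronized {
    val existing = cache.get(topicPartition)
    if (existing == null) {
      val c = new PooledConsumer(topicPartition, cached = true)
      cache.put(topicPartition, c)
      c
    } else if (!existing.inUse) {
      existing.inUse = true
      existing
    } else {
      // Cached consumer is busy: hand out a throwaway consumer rather than sharing it.
      new PooledConsumer(topicPartition, cached = false)
    }
  }

  def release(c: PooledConsumer): Unit = synchronized {
    if (c.cached) c.inUse = false else c.close()
  }
}
```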
spark git commit: [SPARK-23680] Fix entrypoint.sh to properly support Arbitrary UIDs
Repository: spark
Updated Branches: refs/heads/master 88d8de926 -> 9945b0227

[SPARK-23680] Fix entrypoint.sh to properly support Arbitrary UIDs

## What changes were proposed in this pull request?

As described in SPARK-23680, entrypoint.sh returns an error code from a command pipeline; that error is expected in OpenShift environments, where arbitrary UIDs are used to run containers.

## How was this patch tested?

This patch was manually tested by using the docker-image-tool.sh script to generate a Spark driver image and running an example against an OpenShift cluster.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Ricardo Martinelli de Oliveira

Closes #20822 from rimolive/rmartine-spark-23680.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9945b022
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9945b022
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9945b022
Branch: refs/heads/master
Commit: 9945b0227efcd952c8e835453b2831a8c6d5d607
Parents: 88d8de9
Author: Ricardo Martinelli de Oliveira
Authored: Fri Mar 16 10:37:11 2018 -0700
Committer: Erik Erlandson
Committed: Fri Mar 16 10:37:11 2018 -0700

--
 .../kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh | 3 +++
 1 file changed, 3 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/9945b022/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
--
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
index 3d67b0a..d0cf284 100755
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
@@ -22,7 +22,10 @@ set -ex
 # Check whether there is a passwd entry for the container UID
 myuid=$(id -u)
 mygid=$(id -g)
+# turn off -e for getent because it will return error code in anonymous uid case
+set +e
 uidentry=$(getent passwd $myuid)
+set -e
 # If there is no passwd entry for the container UID, attempt to create one
 if [ -z "$uidentry" ] ; then
spark git commit: [SPARK-23581][SQL] Add interpreted unsafe projection
Repository: spark
Updated Branches: refs/heads/master dffeac369 -> 88d8de926

[SPARK-23581][SQL] Add interpreted unsafe projection

## What changes were proposed in this pull request?

We currently can only create unsafe rows using code generation. This is a problem for situations in which code generation fails. There is no fallback, and as a result we cannot execute the query.

This PR adds an interpreted version of `UnsafeProjection`. The implementation is modeled after `InterpretedMutableProjection`. It stores the expression results in a `GenericInternalRow`, and it then uses a conversion function to convert the `GenericInternalRow` into an `UnsafeRow`. This PR does not implement the actual codegen-to-interpreted fallback logic. This will be done in a follow-up.

## How was this patch tested?

I am piggybacking on existing `UnsafeProjection` tests, and I have added an interpreted version for each of these.

Author: Herman van Hovell

Closes #20750 from hvanhovell/SPARK-23581.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/88d8de92
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/88d8de92
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/88d8de92
Branch: refs/heads/master
Commit: 88d8de9260edf6e9d5449ff7ef6e35d16051fc9f
Parents: dffeac3
Author: Herman van Hovell
Authored: Fri Mar 16 18:28:16 2018 +0100
Committer: Herman van Hovell
Committed: Fri Mar 16 18:28:16 2018 +0100

--
 .../expressions/codegen/UnsafeArrayWriter.java  |  32 +-
 .../expressions/codegen/UnsafeRowWriter.java    |  30 +-
 .../expressions/codegen/UnsafeWriter.java       |  43 +++
 .../sql/catalyst/expressions/Expression.scala   |  26 ++
 .../InterpretedUnsafeProjection.scala           | 366 +++
 .../expressions/MonotonicallyIncreasingID.scala |   4 +-
 .../sql/catalyst/expressions/Projection.scala   |  19 +-
 .../codegen/GenerateUnsafeProjection.scala      |   2 +-
 .../expressions/randomExpressions.scala         |   6 +-
 .../catalyst/expressions/ComplexTypeSuite.scala |   2 +-
 .../expressions/ExpressionEvalHelper.scala      |  20 +-
 .../expressions/ObjectExpressionsSuite.scala    |  21 +-
 .../catalyst/expressions/ScalaUDFSuite.scala    |   2 +-
 .../expressions/UnsafeRowConverterSuite.scala   |  56 +--
 14 files changed, 555 insertions(+), 74 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/88d8de92/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
--
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
index 791e8d8..82cd1b2 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
@@ -30,7 +30,7 @@ import static org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.calculat * A helper class to write data into global row buffer using `UnsafeArrayData` format, * used by {@link org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection}.
*/ -public class UnsafeArrayWriter { +public final class UnsafeArrayWriter extends UnsafeWriter { private BufferHolder holder; @@ -83,7 +83,7 @@ public class UnsafeArrayWriter { return startingOffset + headerInBytes + ordinal * elementSize; } - public void setOffsetAndSize(int ordinal, long currentCursor, int size) { + public void setOffsetAndSize(int ordinal, int currentCursor, int size) { assertIndexIsValid(ordinal); final long relativeOffset = currentCursor - startingOffset; final long offsetAndSize = (relativeOffset << 32) | (long)size; @@ -96,49 +96,31 @@ public class UnsafeArrayWriter { BitSetMethods.set(holder.buffer, startingOffset + 8, ordinal); } - public void setNullBoolean(int ordinal) { -setNullBit(ordinal); -// put zero into the corresponding field when set null -Platform.putBoolean(holder.buffer, getElementOffset(ordinal, 1), false); - } - - public void setNullByte(int ordinal) { + public void setNull1Bytes(int ordinal) { setNullBit(ordinal); // put zero into the corresponding field when set null Platform.putByte(holder.buffer, getElementOffset(ordinal, 1), (byte)0); } - public void setNullShort(int ordinal) { + public void setNull2Bytes(int ordinal) { setNullBit(ordinal); // put zero into the corresponding field when set null Platform.putShort(holder.buffer,
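The idea described in the PR text, reduced to its shape: evaluate every expression without generated code, buffer the results in a `GenericInternalRow`, and hand that row to a schema-driven writer that produces the `UnsafeRow`. A much-simplified sketch, not the real `InterpretedUnsafeProjection` (the `toUnsafe` writer is left abstract here):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Expression, GenericInternalRow, UnsafeRow}

// Simplified sketch of an interpreted projection: no code generation anywhere.
class SketchedUnsafeProjection(
    expressions: Seq[Expression],
    toUnsafe: InternalRow => UnsafeRow)   // stands in for the schema-driven row writer
  extends (InternalRow => UnsafeRow) {

  // Reused buffer holding the evaluated expression results.
  private[this] val values = new Array[Any](expressions.length)

  override def apply(input: InternalRow): UnsafeRow = {
    var i = 0
    while (i < expressions.length) {
      values(i) = expressions(i).eval(input)  // interpreted evaluation of each expression
      i += 1
    }
    toUnsafe(new GenericInternalRow(values))
  }
}
```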
[2/2] spark-website git commit: Squashed commit of the following:
Squashed commit of the following: commit 8e2dd71cf5613be6f019bb76b46226771422a40e Merge: 8bd24fb6d 01f0b4e0c Author: Reynold XinDate: Fri Mar 16 10:24:54 2018 -0700 Merge pull request #104 from mateiz/history Add a project history page commit 01f0b4e0c1fe77781850cf994058980664201bce Author: Matei Zaharia Date: Wed Mar 14 23:29:01 2018 -0700 Add a project history page Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/a1d84bcb Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/a1d84bcb Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/a1d84bcb Branch: refs/heads/asf-site Commit: a1d84bcbf53099be51c39914528bea3f4e2735a0 Parents: 8bd24fb Author: Reynold Xin Authored: Fri Mar 16 10:26:14 2018 -0700 Committer: Reynold Xin Committed: Fri Mar 16 10:26:14 2018 -0700 -- _layouts/global.html| 1 + community.md| 24 +- history.md | 29 +++ index.md| 16 +- site/committers.html| 1 + site/community.html | 24 +- site/contributing.html | 1 + site/developer-tools.html | 1 + site/documentation.html | 1 + site/downloads.html | 1 + site/examples.html | 1 + site/faq.html | 1 + site/graphx/index.html | 1 + site/history.html | 235 +++ site/improvement-proposals.html | 1 + site/index.html | 17 +- site/mailing-lists.html | 1 + site/mllib/index.html | 1 + site/news/amp-camp-2013-registration-ope.html | 1 + .../news/announcing-the-first-spark-summit.html | 1 + .../news/fourth-spark-screencast-published.html | 1 + site/news/index.html| 1 + site/news/nsdi-paper.html | 1 + site/news/one-month-to-spark-summit-2015.html | 1 + .../proposals-open-for-spark-summit-east.html | 1 + ...registration-open-for-spark-summit-east.html | 1 + .../news/run-spark-and-shark-on-amazon-emr.html | 1 + site/news/spark-0-6-1-and-0-5-2-released.html | 1 + site/news/spark-0-6-2-released.html | 1 + site/news/spark-0-7-0-released.html | 1 + site/news/spark-0-7-2-released.html | 1 + site/news/spark-0-7-3-released.html | 1 + site/news/spark-0-8-0-released.html | 1 + site/news/spark-0-8-1-released.html | 1 + site/news/spark-0-9-0-released.html | 1 + site/news/spark-0-9-1-released.html | 1 + site/news/spark-0-9-2-released.html | 1 + site/news/spark-1-0-0-released.html | 1 + site/news/spark-1-0-1-released.html | 1 + site/news/spark-1-0-2-released.html | 1 + site/news/spark-1-1-0-released.html | 1 + site/news/spark-1-1-1-released.html | 1 + site/news/spark-1-2-0-released.html | 1 + site/news/spark-1-2-1-released.html | 1 + site/news/spark-1-2-2-released.html | 1 + site/news/spark-1-3-0-released.html | 1 + site/news/spark-1-4-0-released.html | 1 + site/news/spark-1-4-1-released.html | 1 + site/news/spark-1-5-0-released.html | 1 + site/news/spark-1-5-1-released.html | 1 + site/news/spark-1-5-2-released.html | 1 + site/news/spark-1-6-0-released.html | 1 + site/news/spark-1-6-1-released.html | 1 + site/news/spark-1-6-2-released.html | 1 + site/news/spark-1-6-3-released.html | 1 + site/news/spark-2-0-0-released.html | 1 + site/news/spark-2-0-1-released.html | 1 + site/news/spark-2-0-2-released.html | 1 + site/news/spark-2-1-0-released.html | 1 + site/news/spark-2-1-1-released.html | 1 + site/news/spark-2-1-2-released.html | 1 + site/news/spark-2-2-0-released.html | 1 + site/news/spark-2-2-1-released.html | 1 + site/news/spark-2-3-0-released.html | 1 + site/news/spark-2.0.0-preview.html | 1 + .../spark-accepted-into-apache-incubator.html | 1 + site/news/spark-and-shark-in-the-news.html | 1 + site/news/spark-becomes-tlp.html| 1 +
[1/2] spark-website git commit: Squashed commit of the following:
Repository: spark-website Updated Branches: refs/heads/asf-site 8bd24fb6d -> a1d84bcbf http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-summit-june-2016-agenda-posted.html -- diff --git a/site/news/spark-summit-june-2016-agenda-posted.html b/site/news/spark-summit-june-2016-agenda-posted.html index ce68829..7947354 100644 --- a/site/news/spark-summit-june-2016-agenda-posted.html +++ b/site/news/spark-summit-june-2016-agenda-posted.html @@ -123,6 +123,7 @@ https://issues.apache.org/jira/browse/SPARK;>Issue Tracker Powered By Project Committers + Project History http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-summit-june-2017-agenda-posted.html -- diff --git a/site/news/spark-summit-june-2017-agenda-posted.html b/site/news/spark-summit-june-2017-agenda-posted.html index 5d2df4b..e4055c3 100644 --- a/site/news/spark-summit-june-2017-agenda-posted.html +++ b/site/news/spark-summit-june-2017-agenda-posted.html @@ -123,6 +123,7 @@ https://issues.apache.org/jira/browse/SPARK;>Issue Tracker Powered By Project Committers + Project History http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-summit-june-2018-agenda-posted.html -- diff --git a/site/news/spark-summit-june-2018-agenda-posted.html b/site/news/spark-summit-june-2018-agenda-posted.html index 17c284f..9b2f739 100644 --- a/site/news/spark-summit-june-2018-agenda-posted.html +++ b/site/news/spark-summit-june-2018-agenda-posted.html @@ -123,6 +123,7 @@ https://issues.apache.org/jira/browse/SPARK;>Issue Tracker Powered By Project Committers + Project History http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-tips-from-quantifind.html -- diff --git a/site/news/spark-tips-from-quantifind.html b/site/news/spark-tips-from-quantifind.html index bfbac1d..00c71c2 100644 --- a/site/news/spark-tips-from-quantifind.html +++ b/site/news/spark-tips-from-quantifind.html @@ -123,6 +123,7 @@ https://issues.apache.org/jira/browse/SPARK;>Issue Tracker Powered By Project Committers + Project History http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-user-survey-and-powered-by-page.html -- diff --git a/site/news/spark-user-survey-and-powered-by-page.html b/site/news/spark-user-survey-and-powered-by-page.html index 67935a9..c015e5c 100644 --- a/site/news/spark-user-survey-and-powered-by-page.html +++ b/site/news/spark-user-survey-and-powered-by-page.html @@ -123,6 +123,7 @@ https://issues.apache.org/jira/browse/SPARK;>Issue Tracker Powered By Project Committers + Project History http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-version-0-6-0-released.html -- diff --git a/site/news/spark-version-0-6-0-released.html b/site/news/spark-version-0-6-0-released.html index 3f670d7..d9120b0 100644 --- a/site/news/spark-version-0-6-0-released.html +++ b/site/news/spark-version-0-6-0-released.html @@ -123,6 +123,7 @@ https://issues.apache.org/jira/browse/SPARK;>Issue Tracker Powered By Project Committers + Project History http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-wins-cloudsort-100tb-benchmark.html -- diff --git a/site/news/spark-wins-cloudsort-100tb-benchmark.html b/site/news/spark-wins-cloudsort-100tb-benchmark.html index b498034..8bef605 100644 --- a/site/news/spark-wins-cloudsort-100tb-benchmark.html +++ b/site/news/spark-wins-cloudsort-100tb-benchmark.html @@ -123,6 +123,7 @@ https://issues.apache.org/jira/browse/SPARK;>Issue Tracker 
Powered By Project Committers + Project History http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-wins-daytona-gray-sort-100tb-benchmark.html -- diff --git a/site/news/spark-wins-daytona-gray-sort-100tb-benchmark.html b/site/news/spark-wins-daytona-gray-sort-100tb-benchmark.html index 18646f4..32f53e9 100644 ---
spark git commit: [SPARK-18371][STREAMING] Spark Streaming backpressure generates batch with large number of records
Repository: spark
Updated Branches: refs/heads/master 5414abca4 -> dffeac369

[SPARK-18371][STREAMING] Spark Streaming backpressure generates batch with large number of records

## What changes were proposed in this pull request?

Omit rounding of the backpressure rate. Effects:
- no batch with a large number of records is created when the rate from the PID estimator is one
- the number of records per batch and partition is more fine-grained, improving backpressure accuracy

## How was this patch tested?

This was tested by running:
- `mvn test -pl external/kafka-0-8`
- `mvn test -pl external/kafka-0-10`
- a streaming application which was suffering from the issue

JasonMWhite The contribution is my original work and I license the work to the project under the project's open source license.

Author: Sebastian Arzt

Closes #17774 from arzt/kafka-back-pressure.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dffeac36
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dffeac36
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dffeac36
Branch: refs/heads/master
Commit: dffeac3691daa620446ae949c5b147518d128e08
Parents: 5414abc
Author: Sebastian Arzt
Authored: Fri Mar 16 12:25:58 2018 -0500
Committer: cody koeninger
Committed: Fri Mar 16 12:25:58 2018 -0500

--
 .../kafka010/DirectKafkaInputDStream.scala  |  6 +--
 .../kafka010/DirectKafkaStreamSuite.scala   | 48 ++
 .../kafka/DirectKafkaInputDStream.scala     |  6 +--
 .../kafka/DirectKafkaStreamSuite.scala      | 51
 4 files changed, 105 insertions(+), 6 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/dffeac36/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
--
diff --git a/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala b/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
index 0fa3287..9cb2448 100644
--- a/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
+++ b/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
@@ -138,17 +138,17 @@ private[spark] class DirectKafkaInputDStream[K, V](
         lagPerPartition.map { case (tp, lag) =>
           val maxRateLimitPerPartition = ppc.maxRatePerPartition(tp)
-          val backpressureRate = Math.round(lag / totalLag.toFloat * rate)
+          val backpressureRate = lag / totalLag.toDouble * rate
           tp -> (if (maxRateLimitPerPartition > 0) {
             Math.min(backpressureRate, maxRateLimitPerPartition)} else backpressureRate)
         }
-      case None => offsets.map { case (tp, offset) => tp -> ppc.maxRatePerPartition(tp) }
+      case None => offsets.map { case (tp, offset) => tp -> ppc.maxRatePerPartition(tp).toDouble }
     }
     if (effectiveRateLimitPerPartition.values.sum > 0) {
       val secsPerBatch = context.graph.batchDuration.milliseconds.toDouble / 1000
       Some(effectiveRateLimitPerPartition.map {
-        case (tp, limit) => tp -> (secsPerBatch * limit).toLong
+        case (tp, limit) => tp -> Math.max((secsPerBatch * limit).toLong, 1L)
       })
     } else {
       None

http://git-wip-us.apache.org/repos/asf/spark/blob/dffeac36/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala
--
diff --git a/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala b/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala
index 453b5e5..8524743 100644
---
a/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala +++ b/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala @@ -617,6 +617,54 @@ class DirectKafkaStreamSuite ssc.stop() } + test("maxMessagesPerPartition with zero offset and rate equal to one") { +val topic = "backpressure" +val kafkaParams = getKafkaParams() +val batchIntervalMilliseconds = 6 +val sparkConf = new SparkConf() + // Safe, even with streaming, because we're using the direct API. + // Using 1 core is useful to make the test more predictable. + .setMaster("local[1]") + .setAppName(this.getClass.getSimpleName) + .set("spark.streaming.kafka.maxRatePerPartition", "100") + +// Setup the streaming context +ssc = new
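To see why the rounding mattered, here is a standalone sketch of the arithmetic (made-up numbers; this is not the DirectKafkaInputDStream code itself). With a PID-estimated rate of one record per second spread over several partitions, rounding pushes every per-partition rate to zero, the "sum > 0" guard then applies no limit at all, and the next batch can swallow the whole backlog; keeping fractional rates and clamping each partition to at least one record preserves the cap.

```scala
object BackpressureRoundingDemo {
  def main(args: Array[String]): Unit = {
    val lags = Seq("p0" -> 100L, "p1" -> 200L, "p2" -> 300L, "p3" -> 400L)
    val totalLag = lags.map(_._2).sum   // 1000 records of backlog
    val rate = 1L                       // records/second from the PID estimator
    val secsPerBatch = 60.0             // 60 second batch interval

    // Old behaviour: every per-partition rate rounds to 0, so the sum is 0 and the
    // "limit only if sum > 0" guard disables the cap entirely -> one huge batch.
    val rounded = lags.map { case (tp, lag) => tp -> Math.round(lag / totalLag.toFloat * rate) }
    println(s"rounded: $rounded, sum = ${rounded.map(_._2).sum}")   // all zeros

    // New behaviour: keep doubles, clamp each partition to at least one record.
    val perPartition = lags.map { case (tp, lag) =>
      val r = lag / totalLag.toDouble * rate
      tp -> Math.max((secsPerBatch * r).toLong, 1L)
    }
    println(s"records per partition: $perPartition, total = ${perPartition.map(_._2).sum}")
    // p0 -> 6, p1 -> 12, p2 -> 18, p3 -> 24: 60 records in total, i.e. 1 record/second overall
  }
}
```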
svn commit: r25774 - in /dev/spark/2.3.1-SNAPSHOT-2018_03_16_10_01-21b6de4-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Mar 16 17:15:31 2018 New Revision: 25774 Log: Apache Spark 2.3.1-SNAPSHOT-2018_03_16_10_01-21b6de4 docs [This commit notification would consist of 1443 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-23553][TESTS] Tests should not assume the default value of `spark.sql.sources.default`
Repository: spark
Updated Branches: refs/heads/branch-2.3 d9e1f7040 -> 21b6de459

[SPARK-23553][TESTS] Tests should not assume the default value of `spark.sql.sources.default`

## What changes were proposed in this pull request?

Currently, some tests have an assumption that `spark.sql.sources.default=parquet`. In fact, that is a correct assumption, but that assumption makes it difficult to test new data source formats.

This PR aims to:
- Make test suites more robust and make it easy to test new data sources in the future.
- Test the new native ORC data source with the full existing Apache Spark test coverage.

As an example, the PR uses `spark.sql.sources.default=orc` during reviews. The value should be `parquet` when this PR is accepted.

## How was this patch tested?

Pass the Jenkins with updated tests.

Author: Dongjoon Hyun

Closes #20705 from dongjoon-hyun/SPARK-23553.

(cherry picked from commit 5414abca4fec6a68174c34d22d071c20027e959d)
Signed-off-by: gatorsmile

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/21b6de45
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/21b6de45
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/21b6de45
Branch: refs/heads/branch-2.3
Commit: 21b6de4592bf8c69db9135f409912b6a62f70894
Parents: d9e1f70
Author: Dongjoon Hyun
Authored: Fri Mar 16 09:36:30 2018 -0700
Committer: gatorsmile
Committed: Fri Mar 16 09:36:46 2018 -0700

--
 python/pyspark/sql/readwriter.py                |  4 +-
 .../org/apache/spark/sql/SQLQuerySuite.scala    |  9 +--
 .../columnar/InMemoryColumnarQuerySuite.scala   |  5 +-
 .../spark/sql/execution/command/DDLSuite.scala  | 11 ++-
 .../ParquetPartitionDiscoverySuite.scala        | 10 +++
 .../sql/test/DataFrameReaderWriterSuite.scala   |  3 +-
 .../sql/hive/MetastoreDataSourcesSuite.scala    | 72 +---
 .../PartitionProviderCompatibilitySuite.scala   |  6 +-
 .../hive/PartitionedTablePerfStatsSuite.scala   |  2 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala | 11 +-
 .../sql/hive/execution/SQLQuerySuite.scala      | 19 ++
 11 files changed, 81 insertions(+), 71 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/21b6de45/python/pyspark/sql/readwriter.py
--
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 9d05ac7..e70aa9e 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -147,8 +147,8 @@ class DataFrameReader(OptionUtils): or a DDL-formatted string (For example ``col0 INT, col1 DOUBLE``). :param options: all other string options ->>> df = spark.read.load('python/test_support/sql/parquet_partitioned', opt1=True, -... opt2=1, opt3='str') +>>> df = spark.read.format("parquet").load('python/test_support/sql/parquet_partitioned', +...
opt1=True, opt2=1, opt3='str') >>> df.dtypes [('name', 'string'), ('year', 'int'), ('month', 'int'), ('day', 'int')] http://git-wip-us.apache.org/repos/asf/spark/blob/21b6de45/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala index 97d03b2..ebebf62 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala @@ -2148,7 +2148,8 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { test("data source table created in InMemoryCatalog should be able to read/write") { withTable("tbl") { - sql("CREATE TABLE tbl(i INT, j STRING) USING parquet") + val provider = spark.sessionState.conf.defaultDataSourceName + sql(s"CREATE TABLE tbl(i INT, j STRING) USING $provider") checkAnswer(sql("SELECT i, j FROM tbl"), Nil) Seq(1 -> "a", 2 -> "b").toDF("i", "j").write.mode("overwrite").insertInto("tbl") @@ -2472,9 +2473,9 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { test("SPARK-16975: Column-partition path starting '_' should be handled correctly") { withTempDir { dir => - val parquetDir = new File(dir, "parquet").getCanonicalPath - spark.range(10).withColumn("_col", $"id").write.partitionBy("_col").save(parquetDir) - spark.read.parquet(parquetDir) + val dataDir = new File(dir, "data").getCanonicalPath + spark.range(10).withColumn("_col", $"id").write.partitionBy("_col").save(dataDir) + spark.read.load(dataDir) }
spark git commit: [SPARK-23553][TESTS] Tests should not assume the default value of `spark.sql.sources.default`
Repository: spark
Updated Branches: refs/heads/master c95200048 -> 5414abca4

[SPARK-23553][TESTS] Tests should not assume the default value of `spark.sql.sources.default`

## What changes were proposed in this pull request?

Currently, some tests have an assumption that `spark.sql.sources.default=parquet`. In fact, that is a correct assumption, but that assumption makes it difficult to test new data source formats.

This PR aims to:
- Make test suites more robust and make it easy to test new data sources in the future.
- Test the new native ORC data source with the full existing Apache Spark test coverage.

As an example, the PR uses `spark.sql.sources.default=orc` during reviews. The value should be `parquet` when this PR is accepted.

## How was this patch tested?

Pass the Jenkins with updated tests.

Author: Dongjoon Hyun

Closes #20705 from dongjoon-hyun/SPARK-23553.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5414abca
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5414abca
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5414abca
Branch: refs/heads/master
Commit: 5414abca4fec6a68174c34d22d071c20027e959d
Parents: c952000
Author: Dongjoon Hyun
Authored: Fri Mar 16 09:36:30 2018 -0700
Committer: gatorsmile
Committed: Fri Mar 16 09:36:30 2018 -0700

--
 python/pyspark/sql/readwriter.py                |  4 +-
 .../org/apache/spark/sql/SQLQuerySuite.scala    |  9 +--
 .../columnar/InMemoryColumnarQuerySuite.scala   |  5 +-
 .../spark/sql/execution/command/DDLSuite.scala  | 11 ++-
 .../ParquetPartitionDiscoverySuite.scala        | 10 +++
 .../sql/test/DataFrameReaderWriterSuite.scala   |  3 +-
 .../sql/hive/MetastoreDataSourcesSuite.scala    | 72 +---
 .../PartitionProviderCompatibilitySuite.scala   |  6 +-
 .../hive/PartitionedTablePerfStatsSuite.scala   |  2 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala | 11 +-
 .../sql/hive/execution/SQLQuerySuite.scala      | 19 ++
 11 files changed, 81 insertions(+), 71 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/5414abca/python/pyspark/sql/readwriter.py
--
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 803f561..facc16b 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -147,8 +147,8 @@ class DataFrameReader(OptionUtils): or a DDL-formatted string (For example ``col0 INT, col1 DOUBLE``). :param options: all other string options ->>> df = spark.read.load('python/test_support/sql/parquet_partitioned', opt1=True, -... opt2=1, opt3='str') +>>> df = spark.read.format("parquet").load('python/test_support/sql/parquet_partitioned', +...
opt1=True, opt2=1, opt3='str') >>> df.dtypes [('name', 'string'), ('year', 'int'), ('month', 'int'), ('day', 'int')] http://git-wip-us.apache.org/repos/asf/spark/blob/5414abca/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala index 8f14575..640affc 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala @@ -2150,7 +2150,8 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { test("data source table created in InMemoryCatalog should be able to read/write") { withTable("tbl") { - sql("CREATE TABLE tbl(i INT, j STRING) USING parquet") + val provider = spark.sessionState.conf.defaultDataSourceName + sql(s"CREATE TABLE tbl(i INT, j STRING) USING $provider") checkAnswer(sql("SELECT i, j FROM tbl"), Nil) Seq(1 -> "a", 2 -> "b").toDF("i", "j").write.mode("overwrite").insertInto("tbl") @@ -2474,9 +2475,9 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { test("SPARK-16975: Column-partition path starting '_' should be handled correctly") { withTempDir { dir => - val parquetDir = new File(dir, "parquet").getCanonicalPath - spark.range(10).withColumn("_col", $"id").write.partitionBy("_col").save(parquetDir) - spark.read.parquet(parquetDir) + val dataDir = new File(dir, "data").getCanonicalPath + spark.range(10).withColumn("_col", $"id").write.partitionBy("_col").save(dataDir) + spark.read.load(dataDir) } }
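The pattern applied throughout the touched suites: read the configured default data source instead of hard-coding `parquet`, and stick to format-agnostic `save`/`load`. A condensed sketch of that pattern (table and path names are illustrative):

```scala
import java.io.File
import org.apache.spark.sql.SparkSession

object DefaultSourceAgnosticDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("default-source").master("local[*]").getOrCreate()

    // Resolve whatever source is configured (parquet, orc, ...) instead of assuming parquet.
    val provider = spark.conf.get("spark.sql.sources.default")
    spark.sql(s"CREATE TABLE tbl(i INT, j STRING) USING $provider")

    // Format-agnostic round trip: no .parquet(...) calls, so the code exercises the default source.
    val dataDir = new File("/tmp/spark-default-source-demo", "data").getCanonicalPath
    spark.range(10).write.save(dataDir)
    spark.read.load(dataDir).show()

    spark.stop()
  }
}
```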
svn commit: r25762 - in /dev/spark/2.4.0-SNAPSHOT-2018_03_16_04_01-c952000-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Mar 16 11:20:56 2018 New Revision: 25762 Log: Apache Spark 2.4.0-SNAPSHOT-2018_03_16_04_01-c952000 docs [This commit notification would consist of 1448 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-23635][YARN] AM env variable should not overwrite same name env variable set through spark.executorEnv.
Repository: spark
Updated Branches: refs/heads/master ca83526de -> c95200048

[SPARK-23635][YARN] AM env variable should not overwrite same name env variable set through spark.executorEnv.

## What changes were proposed in this pull request?

In the current Spark on YARN code, the AM always copies and overwrites its env variables to executors, so we cannot set different values for executors. To reproduce the issue, a user could start spark-shell like:

```
./bin/spark-shell --master yarn-client --conf spark.executorEnv.SPARK_ABC=executor_val --conf spark.yarn.appMasterEnv.SPARK_ABC=am_val
```

Then check executor env variables by

```
sc.parallelize(1 to 1).flatMap { i => sys.env.toSeq }.collect.foreach(println)
```

We will always get `am_val` instead of `executor_val`. So we should not let the AM overwrite specifically set executor env variables.

## How was this patch tested?

Added UT and tested in local cluster.

Author: jerryshao

Closes #20799 from jerryshao/SPARK-23635.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c9520004
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c9520004
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c9520004
Branch: refs/heads/master
Commit: c952000487ee003200221b3c4e25dcb06e359f0a
Parents: ca83526
Author: jerryshao
Authored: Fri Mar 16 16:22:03 2018 +0800
Committer: jerryshao
Committed: Fri Mar 16 16:22:03 2018 +0800

--
 .../spark/deploy/yarn/ExecutorRunnable.scala | 22 +++-
 .../spark/deploy/yarn/YarnClusterSuite.scala | 36
 2 files changed, 50 insertions(+), 8 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/c9520004/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
--
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
index 3f4d236..ab08698 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
@@ -220,12 +220,6 @@ private[yarn] class ExecutorRunnable( val env = new HashMap[String, String]() Client.populateClasspath(null, conf, sparkConf, env, sparkConf.get(EXECUTOR_CLASS_PATH)) -sparkConf.getExecutorEnv.foreach { case (key, value) => - // This assumes each executor environment variable set here is a path - // This is kept for backward compatibility and consistency with hadoop - YarnSparkHadoopUtil.addPathToEnvironment(env, key, value) -} - // lookup appropriate http scheme for container log urls val yarnHttpPolicy = conf.get( YarnConfiguration.YARN_HTTP_POLICY_KEY,
@@ -233,6 +227,20 @@ private[yarn] class ExecutorRunnable( ) val httpScheme = if (yarnHttpPolicy == "HTTPS_ONLY") "https://" else "http://" +System.getenv().asScala.filterKeys(_.startsWith("SPARK")) + .foreach { case (k, v) => env(k) = v } + +sparkConf.getExecutorEnv.foreach { case (key, value) => + if (key == Environment.CLASSPATH.name()) { +// If the key of env variable is CLASSPATH, we assume it is a path and append it. +// This is kept for backward compatibility and consistency with hadoop +YarnSparkHadoopUtil.addPathToEnvironment(env, key, value) + } else { +// For other env variables, simply overwrite the value.
+env(key) = value + } +} + // Add log urls container.foreach { c => sys.env.get("SPARK_USER").foreach { user => @@ -245,8 +253,6 @@ private[yarn] class ExecutorRunnable( } } -System.getenv().asScala.filterKeys(_.startsWith("SPARK")) - .foreach { case (k, v) => env(k) = v } env } } http://git-wip-us.apache.org/repos/asf/spark/blob/c9520004/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala -- diff --git a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala index 33d400a..a129be7 100644 --- a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala +++ b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala @@ -225,6 +225,14 @@ class YarnClusterSuite extends BaseYarnClusterSuite { finalState should be (SparkAppHandle.State.FAILED) } +
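The essence of the fix is ordering: copy the AM's own SPARK_* environment first, then apply `spark.executorEnv.*` so per-executor settings win. A standalone sketch of that precedence as a plain map merge (illustrative only, not the ExecutorRunnable code):

```scala
object ExecutorEnvPrecedence {
  def main(args: Array[String]): Unit = {
    // Environment the AM itself was launched with.
    val amEnv = Map("SPARK_ABC" -> "am_val", "SPARK_LOG_DIR" -> "/var/log/spark")
    // Values from spark.executorEnv.* in the application's configuration.
    val executorEnvConf = Map("SPARK_ABC" -> "executor_val")

    val env = scala.collection.mutable.HashMap[String, String]()
    amEnv.foreach { case (k, v) => env(k) = v }             // AM SPARK_* copied first
    executorEnvConf.foreach { case (k, v) => env(k) = v }   // executorEnv applied last, so it wins

    println(env("SPARK_ABC"))   // prints "executor_val" rather than "am_val"
  }
}
```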
spark git commit: [SPARK-23644][CORE][UI] Use absolute path for REST call in SHS
Repository: spark
Updated Branches: refs/heads/master c2632edeb -> ca83526de

[SPARK-23644][CORE][UI] Use absolute path for REST call in SHS

## What changes were proposed in this pull request?

The SHS uses a relative path for the REST API call that retrieves the list of applications. When the SHS is consumed through a proxy, this can be an issue if the path doesn't end with a "/". Therefore, we should use an absolute path for the REST call, as is done for all the other resources.

## How was this patch tested?

manual tests

Before the change: ![screen shot 2018-03-10 at 4 22 02 pm](https://user-images.githubusercontent.com/8821783/37244190-8ccf9d40-2485-11e8-8fa9-345bc81472fc.png)

After the change: ![screen shot 2018-03-10 at 4 36 34 pm 1](https://user-images.githubusercontent.com/8821783/37244201-a1922810-2485-11e8-8856-eeab2bf5e180.png)

Author: Marco Gaido

Closes #20794 from mgaido91/SPARK-23644.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ca83526d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ca83526d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ca83526d
Branch: refs/heads/master
Commit: ca83526de55f0f8784df58cc8b7c0a7cb0c96e23
Parents: c2632ed
Author: Marco Gaido
Authored: Fri Mar 16 15:12:26 2018 +0800
Committer: jerryshao
Committed: Fri Mar 16 15:12:26 2018 +0800

--
 .../src/main/resources/org/apache/spark/ui/static/historypage.js | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/ca83526d/core/src/main/resources/org/apache/spark/ui/static/historypage.js
--
diff --git a/core/src/main/resources/org/apache/spark/ui/static/historypage.js b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
index f0b2a5a..abc2ec0 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/historypage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
@@ -113,7 +113,7 @@ $(document).ready(function() { status: (requestedIncomplete ? "running" : "completed") }; -$.getJSON("api/v1/applications", appParams, function(response,status,jqXHR) { +$.getJSON(uiRoot + "/api/v1/applications", appParams, function(response,status,jqXHR) { var array = []; var hasMultipleAttempts = false; for (i in response) {
@@ -151,7 +151,7 @@ $(document).ready(function() { "showCompletedColumns": !requestedIncomplete, } - $.get("static/historypage-template.html", function(template) { + $.get(uiRoot + "/static/historypage-template.html", function(template) { var sibling = historySummary.prev(); historySummary.detach(); var apps = $(Mustache.render($(template).filter("#history-summary-template").html(),data));