svn commit: r31018 - in /dev/spark/3.0.0-SNAPSHOT-2018_11_20_20_57-4b7f7ef-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Nov 21 05:10:40 2018 New Revision: 31018 Log: Apache Spark 3.0.0-SNAPSHOT-2018_11_20_20_57-4b7f7ef docs [This commit notification would consist of 1755 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r31015 - in /dev/spark/2.4.1-SNAPSHOT-2018_11_20_18_56-d8e05d2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Nov 21 03:10:31 2018 New Revision: 31015 Log: Apache Spark 2.4.1-SNAPSHOT-2018_11_20_18_56-d8e05d2 docs [This commit notification would consist of 1476 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-26120][TESTS][SS][SPARKR] Fix a streaming query leak in Structured Streaming R tests
Repository: spark Updated Branches: refs/heads/branch-2.4 3bb9fff68 -> d8e05d23a [SPARK-26120][TESTS][SS][SPARKR] Fix a streaming query leak in Structured Streaming R tests ## What changes were proposed in this pull request? Stop the streaming query in `Specify a schema by using a DDL-formatted string when reading` to avoid outputting annoying logs. ## How was this patch tested? Jenkins Closes #23089 from zsxwing/SPARK-26120. Authored-by: Shixiong Zhu Signed-off-by: hyukjinkwon (cherry picked from commit 4b7f7ef5007c2c8a5090f22c6e08927e9f9a407b) Signed-off-by: hyukjinkwon Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d8e05d23 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d8e05d23 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d8e05d23 Branch: refs/heads/branch-2.4 Commit: d8e05d23a046eee559b0c71bcfba5b9809c3d9eb Parents: 3bb9fff Author: Shixiong Zhu Authored: Wed Nov 21 09:31:12 2018 +0800 Committer: hyukjinkwon Committed: Wed Nov 21 09:31:34 2018 +0800 -- R/pkg/tests/fulltests/test_streaming.R | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d8e05d23/R/pkg/tests/fulltests/test_streaming.R -- diff --git a/R/pkg/tests/fulltests/test_streaming.R b/R/pkg/tests/fulltests/test_streaming.R index bfb1a04..6f0d2ae 100644 --- a/R/pkg/tests/fulltests/test_streaming.R +++ b/R/pkg/tests/fulltests/test_streaming.R @@ -127,6 +127,7 @@ test_that("Specify a schema by using a DDL-formatted string when reading", { expect_false(awaitTermination(q, 5 * 1000)) callJMethod(q@ssq, "processAllAvailable") expect_equal(head(sql("SELECT count(*) FROM people3"))[[1]], 3) + stopQuery(q) expect_error(read.stream(path = parquetPath, schema = "name stri"), "DataType stri is not supported.")
spark git commit: [SPARK-26120][TESTS][SS][SPARKR] Fix a streaming query leak in Structured Streaming R tests
Repository: spark Updated Branches: refs/heads/master 2df34db58 -> 4b7f7ef50 [SPARK-26120][TESTS][SS][SPARKR] Fix a streaming query leak in Structured Streaming R tests ## What changes were proposed in this pull request? Stop the streaming query in `Specify a schema by using a DDL-formatted string when reading` to avoid outputting annoying logs. ## How was this patch tested? Jenkins Closes #23089 from zsxwing/SPARK-26120. Authored-by: Shixiong Zhu Signed-off-by: hyukjinkwon Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4b7f7ef5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4b7f7ef5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4b7f7ef5 Branch: refs/heads/master Commit: 4b7f7ef5007c2c8a5090f22c6e08927e9f9a407b Parents: 2df34db Author: Shixiong Zhu Authored: Wed Nov 21 09:31:12 2018 +0800 Committer: hyukjinkwon Committed: Wed Nov 21 09:31:12 2018 +0800 -- R/pkg/tests/fulltests/test_streaming.R | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4b7f7ef5/R/pkg/tests/fulltests/test_streaming.R -- diff --git a/R/pkg/tests/fulltests/test_streaming.R b/R/pkg/tests/fulltests/test_streaming.R index bfb1a04..6f0d2ae 100644 --- a/R/pkg/tests/fulltests/test_streaming.R +++ b/R/pkg/tests/fulltests/test_streaming.R @@ -127,6 +127,7 @@ test_that("Specify a schema by using a DDL-formatted string when reading", { expect_false(awaitTermination(q, 5 * 1000)) callJMethod(q@ssq, "processAllAvailable") expect_equal(head(sql("SELECT count(*) FROM people3"))[[1]], 3) + stopQuery(q) expect_error(read.stream(path = parquetPath, schema = "name stri"), "DataType stri is not supported.")
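The one-line R fix above (`stopQuery(q)`) is an instance of a general pattern: any test that starts a streaming query should stop it before finishing, otherwise the query keeps running and logging in the background. For reference, a minimal sketch of the same pattern in the Scala Structured Streaming API; the table name and the rate source are illustrative stand-ins, not taken from the commit:

```scala
import org.apache.spark.sql.SparkSession

object StopQuerySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("stop-query-sketch").getOrCreate()
    // The rate source emits rows continuously -- a stand-in for the test's parquet stream.
    val input = spark.readStream.format("rate").load()
    val query = input.writeStream.format("memory").queryName("people3").start()
    try {
      // Same role as callJMethod(q@ssq, "processAllAvailable") in the R test.
      query.processAllAvailable()
      spark.sql("SELECT count(*) FROM people3").show()
    } finally {
      // The Scala equivalent of the added stopQuery(q); without it the query leaks.
      query.stop()
    }
    spark.stop()
  }
}
```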
spark git commit: [SPARK-26122][SQL] Support encoding for multiLine in CSV datasource
Repository: spark Updated Branches: refs/heads/master 47851056c -> 2df34db58 [SPARK-26122][SQL] Support encoding for multiLine in CSV datasource ## What changes were proposed in this pull request? In the PR, I propose to pass the CSV option `encoding`/`charset` to `uniVocity` parser to allow parsing CSV files in different encodings when `multiLine` is enabled. The value of the option is passed to the `beginParsing` method of `CSVParser`. ## How was this patch tested? Added new test to `CSVSuite` for different encodings and enabled/disabled header. Closes #23091 from MaxGekk/csv-miltiline-encoding. Authored-by: Maxim Gekk Signed-off-by: hyukjinkwon Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2df34db5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2df34db5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2df34db5 Branch: refs/heads/master Commit: 2df34db586bec379e40b5cf30021f5b7a2d79271 Parents: 4785105 Author: Maxim Gekk Authored: Wed Nov 21 09:29:22 2018 +0800 Committer: hyukjinkwon Committed: Wed Nov 21 09:29:22 2018 +0800 -- .../sql/catalyst/csv/UnivocityParser.scala | 12 ++- .../datasources/csv/CSVDataSource.scala | 6 -- .../execution/datasources/csv/CSVSuite.scala| 21 3 files changed, 32 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2df34db5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala index 46ed58e..ed19693 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala @@ -271,11 +271,12 @@ private[sql] object UnivocityParser { def tokenizeStream( inputStream: InputStream, shouldDropHeader: Boolean, - tokenizer: CsvParser): Iterator[Array[String]] = { + tokenizer: CsvParser, + encoding: String): Iterator[Array[String]] = { val handleHeader: () => Unit = () => if (shouldDropHeader) tokenizer.parseNext -convertStream(inputStream, tokenizer, handleHeader)(tokens => tokens) +convertStream(inputStream, tokenizer, handleHeader, encoding)(tokens => tokens) } /** @@ -297,7 +298,7 @@ private[sql] object UnivocityParser { val handleHeader: () => Unit = () => headerChecker.checkHeaderColumnNames(tokenizer) -convertStream(inputStream, tokenizer, handleHeader) { tokens => +convertStream(inputStream, tokenizer, handleHeader, parser.options.charset) { tokens => safeParser.parse(tokens) }.flatten } @@ -305,9 +306,10 @@ private[sql] object UnivocityParser { private def convertStream[T]( inputStream: InputStream, tokenizer: CsvParser, - handleHeader: () => Unit)( + handleHeader: () => Unit, + encoding: String)( convert: Array[String] => T) = new Iterator[T] { -tokenizer.beginParsing(inputStream) +tokenizer.beginParsing(inputStream, encoding) // We can handle header here since here the stream is open. 
handleHeader() http://git-wip-us.apache.org/repos/asf/spark/blob/2df34db5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala index 4808e8e..554baaf 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala @@ -192,7 +192,8 @@ object MultiLineCSVDataSource extends CSVDataSource { UnivocityParser.tokenizeStream( CodecStreams.createInputStreamWithCloseResource(lines.getConfiguration, path), shouldDropHeader = false, -new CsvParser(parsedOptions.asParserSettings)) +new CsvParser(parsedOptions.asParserSettings), +encoding = parsedOptions.charset) }.take(1).headOption match { case Some(firstRow) => val caseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis @@ -203,7 +204,8 @@ object MultiLineCSVDataSource extends CSVDataSource { lines.getConfiguration, new Path(lines.getPath())), parsedOptions.headerFlag, -new
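From the user's side, the change above makes the `encoding` option effective together with `multiLine`. A hedged sketch of the resulting usage; the file path is hypothetical and UTF-16 is just an illustrative encoding:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("csv-encoding-sketch").getOrCreate()

// With this commit, the charset is forwarded to univocity's beginParsing,
// so non-UTF-8 files parse correctly even in multi-line mode.
val df = spark.read
  .option("encoding", "UTF-16") // "charset" is an accepted alias
  .option("multiLine", true)    // quoted fields may span line breaks
  .option("header", true)
  .csv("/tmp/people-utf16.csv") // hypothetical path
df.show()
```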
svn commit: r31014 - in /dev/spark/3.0.0-SNAPSHOT-2018_11_20_16_52-4785105-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Nov 21 01:05:38 2018 New Revision: 31014 Log: Apache Spark 3.0.0-SNAPSHOT-2018_11_20_16_52-4785105 docs [This commit notification would consist of 1755 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-26124][BUILD] Update plugins to latest versions
Repository: spark Updated Branches: refs/heads/master 23bcd6ce4 -> 47851056c [SPARK-26124][BUILD] Update plugins to latest versions ## What changes were proposed in this pull request? Update many plugins we use to the latest version, especially MiMa, which entails excluding some new errors on old changes. ## How was this patch tested? N/A Closes #23087 from srowen/Plugins. Authored-by: Sean Owen Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/47851056 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/47851056 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/47851056 Branch: refs/heads/master Commit: 47851056c20c5d981b1ca66bac3f00c19a882727 Parents: 23bcd6c Author: Sean Owen Authored: Tue Nov 20 18:05:39 2018 -0600 Committer: Sean Owen Committed: Tue Nov 20 18:05:39 2018 -0600 -- pom.xml| 40 project/MimaExcludes.scala | 10 +- project/plugins.sbt| 14 +++--- 3 files changed, 40 insertions(+), 24 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/47851056/pom.xml -- diff --git a/pom.xml b/pom.xml index 9130773..08a29d2 100644 --- a/pom.xml +++ b/pom.xml @@ -1977,7 +1977,7 @@ org.apache.maven.plugins maven-enforcer-plugin - 3.0.0-M1 + 3.0.0-M2 enforce-versions @@ -2077,7 +2077,7 @@ org.apache.maven.plugins maven-compiler-plugin - 3.7.0 + 3.8.0 ${java.version} ${java.version} @@ -2094,7 +2094,7 @@ org.apache.maven.plugins maven-surefire-plugin - 2.22.0 + 3.0.0-M1 @@ -2148,7 +2148,7 @@ org.scalatest scalatest-maven-plugin - 1.0 + 2.0.0 ${project.build.directory}/surefire-reports @@ -2195,7 +2195,7 @@ org.apache.maven.plugins maven-jar-plugin - 3.0.2 + 3.1.0 org.apache.maven.plugins @@ -,7 +,7 @@ org.apache.maven.plugins maven-clean-plugin - 3.0.0 + 3.1.0 @@ -2240,9 +2240,12 @@ org.apache.maven.plugins maven-javadoc-plugin - 3.0.0-M1 + 3.0.1 --Xdoclint:all -Xdoclint:-missing + + -Xdoclint:all + -Xdoclint:-missing + example @@ -2293,7 +2296,7 @@ org.apache.maven.plugins maven-shade-plugin - 3.2.0 + 3.2.1 org.ow2.asm @@ -2310,12 +2313,12 @@ org.apache.maven.plugins maven-install-plugin - 2.5.2 + 3.0.0-M1 org.apache.maven.plugins maven-deploy-plugin - 2.8.2 + 3.0.0-M1 org.apache.maven.plugins @@ -2361,7 +2364,7 @@ org.apache.maven.plugins maven-jar-plugin -[2.6,) +3.1.0 test-jar @@ -2518,12 +2521,17 @@ org.apache.maven.plugins maven-checkstyle-plugin -2.17 +3.0.0 false true - ${basedir}/src/main/java,${basedir}/src/main/scala - ${basedir}/src/test/java + +${basedir}/src/main/java +${basedir}/src/main/scala + + +${basedir}/src/test/java + dev/checkstyle.xml ${basedir}/target/checkstyle-output.xml ${project.build.sourceEncoding} @@ -2533,7 +2541,7 @@ com.puppycrawl.tools checkstyle -8.2 +8.14 http://git-wip-us.apache.org/repos/asf/spark/blob/47851056/project/MimaExcludes.scala -- diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala index e35e74a..b750535 100644 --- a/project/MimaExcludes.scala +++ b/project/MimaExcludes.scala @@ -36,7 +36,15 @@ object MimaExcludes { // Exclude rules for 3.0.x lazy val v30excludes = v24excludes ++ Seq( -// [SPARK-26090] Resolve most miscellaneous deprecation and build warnings for
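The MiMa part of this update ("excluding some new errors on old changes") refers to entries in project/MimaExcludes.scala, whose diff is truncated above. The sketch below shows only the general shape of such an entry; the problem type and target method here are hypothetical, not the exclusions this commit actually added:

```scala
import com.typesafe.tools.mima.core._

// Each filter suppresses one binary-compatibility finding that is known to be intentional.
lazy val v30excludes = Seq(
  ProblemFilters.exclude[DirectMissingMethodProblem](
    "org.apache.spark.example.SomeClass.someRemovedMethod") // hypothetical target
)
```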
spark git commit: [SPARK-26043][HOTFIX] Hotfix a change to SparkHadoopUtil that doesn't work in 2.11
Repository: spark Updated Branches: refs/heads/master 42c48387c -> 23bcd6ce4 [SPARK-26043][HOTFIX] Hotfix a change to SparkHadoopUtil that doesn't work in 2.11 ## What changes were proposed in this pull request? Hotfix a change to SparkHadoopUtil that doesn't work in 2.11 ## How was this patch tested? Existing tests. Closes #23097 from srowen/SPARK-26043.2. Authored-by: Sean Owen Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/23bcd6ce Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/23bcd6ce Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/23bcd6ce Branch: refs/heads/master Commit: 23bcd6ce458f1e49f307c89ca2794dc9a173077c Parents: 42c4838 Author: Sean Owen Authored: Tue Nov 20 18:03:54 2018 -0600 Committer: Sean Owen Committed: Tue Nov 20 18:03:54 2018 -0600 -- .../scala/org/apache/spark/deploy/SparkHadoopUtil.scala | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/23bcd6ce/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala index 217e514..7bb2a41 100644 --- a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala +++ b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala @@ -20,7 +20,7 @@ package org.apache.spark.deploy import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream, File, IOException} import java.security.PrivilegedExceptionAction import java.text.DateFormat -import java.util.{Arrays, Date, Locale} +import java.util.{Arrays, Comparator, Date, Locale} import scala.collection.JavaConverters._ import scala.collection.immutable.Map @@ -269,10 +269,11 @@ private[spark] class SparkHadoopUtil extends Logging { name.startsWith(prefix) && !name.endsWith(exclusionSuffix) } }) - Arrays.sort(fileStatuses, -(o1: FileStatus, o2: FileStatus) => { + Arrays.sort(fileStatuses, new Comparator[FileStatus] { +override def compare(o1: FileStatus, o2: FileStatus): Int = { Longs.compare(o1.getModificationTime, o2.getModificationTime) -}) +} + }) fileStatuses } catch { case NonFatal(e) =>
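The hotfix works around a Scala 2.11/2.12 difference: 2.12 converts a Scala lambda to a Java functional interface (SAM conversion), while 2.11 by default does not, so code cross-built for both versions has to spell out the anonymous class. A standalone sketch of the two forms:

```scala
import java.util.{Arrays, Comparator}

val xs = Array("bb", "a", "ccc")

// Compiles on Scala 2.12+ only, via SAM conversion to java.util.Comparator:
// Arrays.sort(xs, (a: String, b: String) => a.length - b.length)

// Compiles on both 2.11 and 2.12 -- the explicit form the hotfix switches to:
Arrays.sort(xs, new Comparator[String] {
  override def compare(a: String, b: String): Int = a.length - b.length
})
```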
svn commit: r31013 - in /dev/spark/2.4.1-SNAPSHOT-2018_11_20_14_51-3bb9fff-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue Nov 20 23:07:56 2018 New Revision: 31013 Log: Apache Spark 2.4.1-SNAPSHOT-2018_11_20_14_51-3bb9fff docs [This commit notification would consist of 1476 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r31012 - in /dev/spark/2.3.3-SNAPSHOT-2018_11_20_14_51-0fb830c-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue Nov 20 23:06:40 2018 New Revision: 31012 Log: Apache Spark 2.3.3-SNAPSHOT-2018_11_20_14_51-0fb830c docs [This commit notification would consist of 1443 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r31009 - in /dev/spark/3.0.0-SNAPSHOT-2018_11_20_12_48-42c4838-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue Nov 20 21:00:38 2018 New Revision: 31009 Log: Apache Spark 3.0.0-SNAPSHOT-2018_11_20_12_48-42c4838 docs [This commit notification would consist of 1755 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [BUILD] refactor dev/lint-python in to something readable
Repository: spark Updated Branches: refs/heads/master db136d360 -> 42c48387c [BUILD] refactor dev/lint-python in to something readable ## What changes were proposed in this pull request? `dev/lint-python` is a mess of nearly unreadable bash. I would like to fix that as best as I can. ## How was this patch tested? The build system will test this. Closes #22994 from shaneknapp/lint-python-refactor. Authored-by: shane knapp Signed-off-by: shane knapp Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/42c48387 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/42c48387 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/42c48387 Branch: refs/heads/master Commit: 42c48387c047d96154bcfeb95fcb816a43e60d7c Parents: db136d3 Author: shane knapp Authored: Tue Nov 20 12:38:40 2018 -0800 Committer: shane knapp Committed: Tue Nov 20 12:38:40 2018 -0800 -- dev/lint-python | 359 +++ 1 file changed, 220 insertions(+), 139 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/42c48387/dev/lint-python -- diff --git a/dev/lint-python b/dev/lint-python index 27d87f6..0681693 100755 --- a/dev/lint-python +++ b/dev/lint-python @@ -1,5 +1,4 @@ #!/usr/bin/env bash - # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with @@ -16,160 +15,242 @@ # See the License for the specific language governing permissions and # limitations under the License. # +# define test binaries + versions +PYDOCSTYLE_BUILD="pydocstyle" +MINIMUM_PYDOCSTYLE="3.0.0" -SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )" -SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")" -# Exclude auto-generated configuration file. -PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" )" -DOC_PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" | grep -vF 'functions.py' )" -PYCODESTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pycodestyle-report.txt" -PYDOCSTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pydocstyle-report.txt" -PYLINT_REPORT_PATH="$SPARK_ROOT_DIR/dev/pylint-report.txt" -PYLINT_INSTALL_INFO="$SPARK_ROOT_DIR/dev/pylint-info.txt" - -PYDOCSTYLEBUILD="pydocstyle" -MINIMUM_PYDOCSTYLEVERSION="3.0.0" - -FLAKE8BUILD="flake8" +FLAKE8_BUILD="flake8" MINIMUM_FLAKE8="3.5.0" -SPHINXBUILD=${SPHINXBUILD:=sphinx-build} -SPHINX_REPORT_PATH="$SPARK_ROOT_DIR/dev/sphinx-report.txt" +PYCODESTYLE_BUILD="pycodestyle" +MINIMUM_PYCODESTYLE="2.4.0" -cd "$SPARK_ROOT_DIR" +SPHINX_BUILD="sphinx-build" -# compileall: https://docs.python.org/2/library/compileall.html -python -B -m compileall -q -l $PATHS_TO_CHECK > "$PYCODESTYLE_REPORT_PATH" -compile_status="${PIPESTATUS[0]}" +function compile_python_test { +local COMPILE_STATUS= +local COMPILE_REPORT= + +if [[ ! "$1" ]]; then +echo "No python files found! Something is very wrong -- exiting." +exit 1; +fi -# Get pycodestyle at runtime so that we don't rely on it being installed on the build server. -# See: https://github.com/apache/spark/pull/1744#issuecomment-50982162 -# Updated to the latest official version of pep8. pep8 is formally renamed to pycodestyle. -PYCODESTYLE_VERSION="2.4.0" -PYCODESTYLE_SCRIPT_PATH="$SPARK_ROOT_DIR/dev/pycodestyle-$PYCODESTYLE_VERSION.py" -PYCODESTYLE_SCRIPT_REMOTE_PATH="https://raw.githubusercontent.com/PyCQA/pycodestyle/$PYCODESTYLE_VERSION/pycodestyle.py" +# compileall: https://docs.python.org/2/library/compileall.html +echo "starting python compilation test..."
+COMPILE_REPORT=$( (python -B -mcompileall -q -l $1) 2>&1) +COMPILE_STATUS=$? + +if [ $COMPILE_STATUS -ne 0 ]; then +echo "Python compilation failed with the following errors:" +echo "$COMPILE_REPORT" +echo "$COMPILE_STATUS" +exit "$COMPILE_STATUS" +else +echo "python compilation succeeded." +echo +fi +} -if [ ! -e "$PYCODESTYLE_SCRIPT_PATH" ]; then -curl --silent -o "$PYCODESTYLE_SCRIPT_PATH" "$PYCODESTYLE_SCRIPT_REMOTE_PATH" -curl_status="$?" +function pycodestyle_test { +local PYCODESTYLE_STATUS= +local PYCODESTYLE_REPORT= +local RUN_LOCAL_PYCODESTYLE= +local VERSION= +local EXPECTED_PYCODESTYLE= +local PYCODESTYLE_SCRIPT_PATH="$SPARK_ROOT_DIR/dev/pycodestyle-$MINIMUM_PYCODESTYLE.py" +local PYCODESTYLE_SCRIPT_REMOTE_PATH="https://raw.githubusercontent.com/PyCQA/pycodestyle/$MINIMUM_PYCODESTYLE/pycodestyle.py" -if [ "$curl_status" -ne 0 ]; then -echo "Failed to download pycodestyle.py from \"$PYCODESTYLE_SCRIPT_REMOTE_PATH\"." -exit "$curl_status" +if [[ ! "$1" ]]; then +echo "No python files found! Something is
spark git commit: [SPARK-26084][SQL] Fixes unresolved AggregateExpression.references exception
Repository: spark Updated Branches: refs/heads/branch-2.3 90e4dd1cb -> 0fb830c49 [SPARK-26084][SQL] Fixes unresolved AggregateExpression.references exception ## What changes were proposed in this pull request? This PR fixes an exception in `AggregateExpression.references` called on unresolved expressions. It implements the solution proposed in [SPARK-26084](https://issues.apache.org/jira/browse/SPARK-26084), a minor refactoring that removes the unnecessary dependence on `AttributeSet.toSeq`, which requires expression IDs and, therefore, can only execute successfully for resolved expressions. The refactored implementation is both simpler and faster, eliminating the conversion of a `Set` to a `Seq` and back to `Set`. ## How was this patch tested? Added a new test based on the failing case in [SPARK-26084](https://issues.apache.org/jira/browse/SPARK-26084). hvanhovell Closes #23075 from ssimeonov/ss_SPARK-26084. Authored-by: Simeon Simeonov Signed-off-by: Herman van Hovell (cherry picked from commit db136d360e54e13f1d7071a0428964a202cf7e31) Signed-off-by: Herman van Hovell Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0fb830c4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0fb830c4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0fb830c4 Branch: refs/heads/branch-2.3 Commit: 0fb830c49a09d292249496fc379d130e7097526e Parents: 90e4dd1 Author: Simeon Simeonov Authored: Tue Nov 20 21:29:56 2018 +0100 Committer: Herman van Hovell Committed: Tue Nov 20 21:31:39 2018 +0100 -- .../expressions/aggregate/interfaces.scala | 8 ++--- .../aggregate/AggregateExpressionSuite.scala| 34 2 files changed, 37 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0fb830c4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala index e1d16a2..56c2ee6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala @@ -128,12 +128,10 @@ case class AggregateExpression( override def nullable: Boolean = aggregateFunction.nullable override def references: AttributeSet = { -val childReferences = mode match { - case Partial | Complete => aggregateFunction.references.toSeq - case PartialMerge | Final => aggregateFunction.aggBufferAttributes +mode match { + case Partial | Complete => aggregateFunction.references + case PartialMerge | Final => AttributeSet(aggregateFunction.aggBufferAttributes) } - -AttributeSet(childReferences) } override def toString: String = { http://git-wip-us.apache.org/repos/asf/spark/blob/0fb830c4/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala new file mode 100644 index 000..8e9c997 --- /dev/null +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one 
or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute +import org.apache.spark.sql.catalyst.expressions.{Add, AttributeSet} + +class AggregateExpressionSuite extends
spark git commit: [SPARK-26084][SQL] Fixes unresolved AggregateExpression.references exception
Repository: spark Updated Branches: refs/heads/branch-2.4 c28a27a25 -> 3bb9fff68 [SPARK-26084][SQL] Fixes unresolved AggregateExpression.references exception ## What changes were proposed in this pull request? This PR fixes an exception in `AggregateExpression.references` called on unresolved expressions. It implements the solution proposed in [SPARK-26084](https://issues.apache.org/jira/browse/SPARK-26084), a minor refactoring that removes the unnecessary dependence on `AttributeSet.toSeq`, which requires expression IDs and, therefore, can only execute successfully for resolved expressions. The refactored implementation is both simpler and faster, eliminating the conversion of a `Set` to a `Seq` and back to `Set`. ## How was this patch tested? Added a new test based on the failing case in [SPARK-26084](https://issues.apache.org/jira/browse/SPARK-26084). hvanhovell Closes #23075 from ssimeonov/ss_SPARK-26084. Authored-by: Simeon Simeonov Signed-off-by: Herman van Hovell (cherry picked from commit db136d360e54e13f1d7071a0428964a202cf7e31) Signed-off-by: Herman van Hovell Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3bb9fff6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3bb9fff6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3bb9fff6 Branch: refs/heads/branch-2.4 Commit: 3bb9fff687a1701b75552bae6a4f8bee3fa6460b Parents: c28a27a Author: Simeon Simeonov Authored: Tue Nov 20 21:29:56 2018 +0100 Committer: Herman van Hovell Committed: Tue Nov 20 21:31:11 2018 +0100 -- .../expressions/aggregate/interfaces.scala | 8 ++--- .../aggregate/AggregateExpressionSuite.scala| 34 2 files changed, 37 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3bb9fff6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala index e1d16a2..56c2ee6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala @@ -128,12 +128,10 @@ case class AggregateExpression( override def nullable: Boolean = aggregateFunction.nullable override def references: AttributeSet = { -val childReferences = mode match { - case Partial | Complete => aggregateFunction.references.toSeq - case PartialMerge | Final => aggregateFunction.aggBufferAttributes +mode match { + case Partial | Complete => aggregateFunction.references + case PartialMerge | Final => AttributeSet(aggregateFunction.aggBufferAttributes) } - -AttributeSet(childReferences) } override def toString: String = { http://git-wip-us.apache.org/repos/asf/spark/blob/3bb9fff6/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala new file mode 100644 index 000..8e9c997 --- /dev/null +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one 
or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute +import org.apache.spark.sql.catalyst.expressions.{Add, AttributeSet} + +class AggregateExpressionSuite extends
spark git commit: [SPARK-26084][SQL] Fixes unresolved AggregateExpression.references exception
Repository: spark Updated Branches: refs/heads/master ab61ddb34 -> db136d360 [SPARK-26084][SQL] Fixes unresolved AggregateExpression.references exception ## What changes were proposed in this pull request? This PR fixes an exception in `AggregateExpression.references` called on unresolved expressions. It implements the solution proposed in [SPARK-26084](https://issues.apache.org/jira/browse/SPARK-26084), a minor refactoring that removes the unnecessary dependence on `AttributeSet.toSeq`, which requires expression IDs and, therefore, can only execute successfully for resolved expressions. The refactored implementation is both simpler and faster, eliminating the conversion of a `Set` to a `Seq` and back to `Set`. ## How was this patch tested? Added a new test based on the failing case in [SPARK-26084](https://issues.apache.org/jira/browse/SPARK-26084). hvanhovell Closes #23075 from ssimeonov/ss_SPARK-26084. Authored-by: Simeon Simeonov Signed-off-by: Herman van Hovell Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/db136d36 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/db136d36 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/db136d36 Branch: refs/heads/master Commit: db136d360e54e13f1d7071a0428964a202cf7e31 Parents: ab61ddb Author: Simeon Simeonov Authored: Tue Nov 20 21:29:56 2018 +0100 Committer: Herman van Hovell Committed: Tue Nov 20 21:29:56 2018 +0100 -- .../expressions/aggregate/interfaces.scala | 8 ++--- .../aggregate/AggregateExpressionSuite.scala| 34 2 files changed, 37 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/db136d36/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala index e1d16a2..56c2ee6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala @@ -128,12 +128,10 @@ case class AggregateExpression( override def nullable: Boolean = aggregateFunction.nullable override def references: AttributeSet = { -val childReferences = mode match { - case Partial | Complete => aggregateFunction.references.toSeq - case PartialMerge | Final => aggregateFunction.aggBufferAttributes +mode match { + case Partial | Complete => aggregateFunction.references + case PartialMerge | Final => AttributeSet(aggregateFunction.aggBufferAttributes) } - -AttributeSet(childReferences) } override def toString: String = { http://git-wip-us.apache.org/repos/asf/spark/blob/db136d36/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala new file mode 100644 index 000..8e9c997 --- /dev/null +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AggregateExpressionSuite.scala @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute +import org.apache.spark.sql.catalyst.expressions.{Add, AttributeSet} + +class AggregateExpressionSuite extends SparkFunSuite { + + test("test references from unresolved aggregate functions") { +val x =
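The new test file is truncated above. For context, a sketch of why the refactoring matters, assuming Catalyst internals as of these branches: `AttributeSet.toSeq` orders attributes by expression ID, and an unresolved attribute has no ID yet, so the old round trip through `toSeq` could throw, while building the `AttributeSet` directly does not:

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.catalyst.expressions.AttributeSet
import org.apache.spark.sql.catalyst.expressions.aggregate.Sum

val x = UnresolvedAttribute("x")         // unresolved: no expression ID assigned yet
val agg = Sum(x).toAggregateExpression() // wraps the function in Complete mode
val refs: AttributeSet = agg.references  // threw before the fix; now AttributeSet(x)
```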
svn commit: r31003 - in /dev/spark/2.4.1-SNAPSHOT-2018_11_20_10_44-c28a27a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue Nov 20 18:58:34 2018 New Revision: 31003 Log: Apache Spark 2.4.1-SNAPSHOT-2018_11_20_10_44-c28a27a docs [This commit notification would consist of 1476 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r31001 - in /dev/spark/3.0.0-SNAPSHOT-2018_11_20_08_39-ab61ddb-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue Nov 20 16:51:54 2018 New Revision: 31001 Log: Apache Spark 3.0.0-SNAPSHOT-2018_11_20_08_39-ab61ddb docs [This commit notification would consist of 1755 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-26118][WEB UI] Introducing spark.ui.requestHeaderSize for setting HTTP requestHeaderSize
Repository: spark Updated Branches: refs/heads/master c34c42234 -> ab61ddb34 [SPARK-26118][WEB UI] Introducing spark.ui.requestHeaderSize for setting HTTP requestHeaderSize ## What changes were proposed in this pull request? Introducing spark.ui.requestHeaderSize for configuring Jetty's HTTP requestHeaderSize. This way a long authorization field does not lead to HTTP 413. ## How was this patch tested? Manually with curl (whose version must be at least 7.55). With the original default value (8k limit): ```bash # Starting history server with default requestHeaderSize $ ./sbin/start-history-server.sh starting org.apache.spark.deploy.history.HistoryServer, logging to /Users/attilapiros/github/spark/logs/spark-attilapiros-org.apache.spark.deploy.history.HistoryServer-1-apiros-MBP.lan.out # Creating huge header $ echo -n "X-Custom-Header: " > cookie $ printf 'A%.0s' {1..9500} >> cookie # HTTP GET with huge header fails with 431 $ curl -H @cookie http://458apiros-MBP.lan:18080/ Bad Message 431 reason: Request Header Fields Too Large # The log contains the error $ tail -1 /Users/attilapiros/github/spark/logs/spark-attilapiros-org.apache.spark.deploy.history.HistoryServer-1-apiros-MBP.lan.out 18/11/19 21:24:28 WARN HttpParser: Header is too large 8193>8192 ``` After: ```bash # Creating the history properties file with the increased requestHeaderSize $ echo spark.ui.requestHeaderSize=10000 > history.properties # Starting Spark History Server with the settings $ ./sbin/start-history-server.sh --properties-file history.properties starting org.apache.spark.deploy.history.HistoryServer, logging to /Users/attilapiros/github/spark/logs/spark-attilapiros-org.apache.spark.deploy.history.HistoryServer-1-apiros-MBP.lan.out # HTTP GET with huge header gives back the HTML page (only part of the response is shown here) $ curl -H @cookie http://458apiros-MBP.lan:18080/ ... History Server ... ``` Closes #23090 from attilapiros/JettyHeaderSize.
Authored-by: attilapiros Signed-off-by: Imran Rashid Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ab61ddb3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ab61ddb3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ab61ddb3 Branch: refs/heads/master Commit: ab61ddb34d58ab5701191c8fd3a24a62f6ebf37b Parents: c34c422 Author: attilapiros Authored: Tue Nov 20 08:56:22 2018 -0600 Committer: Imran Rashid Committed: Tue Nov 20 08:56:22 2018 -0600 -- .../scala/org/apache/spark/internal/config/package.scala | 6 ++ core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 6 -- docs/configuration.md| 8 3 files changed, 18 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ab61ddb3/core/src/main/scala/org/apache/spark/internal/config/package.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index ab2b872..9cc48f6 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -570,6 +570,12 @@ package object config { .stringConf .createOptional + private[spark] val UI_REQUEST_HEADER_SIZE = +ConfigBuilder("spark.ui.requestHeaderSize") + .doc("Value for HTTP request header size in bytes.") + .bytesConf(ByteUnit.BYTE) + .createWithDefaultString("8k") + private[spark] val EXTRA_LISTENERS = ConfigBuilder("spark.extraListeners") .doc("Class names of listeners to add to SparkContext during initialization.") .stringConf http://git-wip-us.apache.org/repos/asf/spark/blob/ab61ddb3/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala -- diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala index 52a9551..316af9b 100644 --- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala +++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala @@ -356,13 +356,15 @@ private[spark] object JettyUtils extends Logging { (connector, connector.getLocalPort()) } + val httpConfig = new HttpConfiguration() + httpConfig.setRequestHeaderSize(conf.get(UI_REQUEST_HEADER_SIZE).toInt) // If SSL is configured, create the secure connector first. val securePort = sslOptions.createJettySslContextFactory().map { factory => val securePort = sslOptions.port.getOrElse(if (port > 0) Utils.userPort(port, 400) else 0)
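Using the new setting is a one-line configuration change. A minimal sketch; the "16k" value is illustrative, not a recommendation:

```scala
import org.apache.spark.SparkConf

// Raise Jetty's request-header limit for the Spark UI (default: 8k).
// The value is parsed as a byte size, so suffixed forms like "16k" work.
val conf = new SparkConf()
  .setAppName("header-size-sketch")
  .set("spark.ui.requestHeaderSize", "16k")
```

The same key can also go into spark-defaults.conf or a --properties-file, as the history-server test above does.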
spark git commit: [SPARK-26118][WEB UI] Introducing spark.ui.requestHeaderSize for setting HTTP requestHeaderSize
Repository: spark Updated Branches: refs/heads/branch-2.4 096e0d8f0 -> c28a27a25 [SPARK-26118][WEB UI] Introducing spark.ui.requestHeaderSize for setting HTTP requestHeaderSize ## What changes were proposed in this pull request? Introducing spark.ui.requestHeaderSize for configuring Jetty's HTTP requestHeaderSize. This way a long authorization field does not lead to HTTP 413. ## How was this patch tested? Manually with curl (whose version must be at least 7.55). With the original default value (8k limit): ```bash # Starting history server with default requestHeaderSize $ ./sbin/start-history-server.sh starting org.apache.spark.deploy.history.HistoryServer, logging to /Users/attilapiros/github/spark/logs/spark-attilapiros-org.apache.spark.deploy.history.HistoryServer-1-apiros-MBP.lan.out # Creating huge header $ echo -n "X-Custom-Header: " > cookie $ printf 'A%.0s' {1..9500} >> cookie # HTTP GET with huge header fails with 431 $ curl -H @cookie http://458apiros-MBP.lan:18080/ Bad Message 431 reason: Request Header Fields Too Large # The log contains the error $ tail -1 /Users/attilapiros/github/spark/logs/spark-attilapiros-org.apache.spark.deploy.history.HistoryServer-1-apiros-MBP.lan.out 18/11/19 21:24:28 WARN HttpParser: Header is too large 8193>8192 ``` After: ```bash # Creating the history properties file with the increased requestHeaderSize $ echo spark.ui.requestHeaderSize=10000 > history.properties # Starting Spark History Server with the settings $ ./sbin/start-history-server.sh --properties-file history.properties starting org.apache.spark.deploy.history.HistoryServer, logging to /Users/attilapiros/github/spark/logs/spark-attilapiros-org.apache.spark.deploy.history.HistoryServer-1-apiros-MBP.lan.out # HTTP GET with huge header gives back the HTML page (only part of the response is shown here) $ curl -H @cookie http://458apiros-MBP.lan:18080/ ... History Server ... ``` Closes #23090 from attilapiros/JettyHeaderSize.
Authored-by: attilapiros Signed-off-by: Imran Rashid (cherry picked from commit ab61ddb34d58ab5701191c8fd3a24a62f6ebf37b) Signed-off-by: Imran Rashid Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c28a27a2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c28a27a2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c28a27a2 Branch: refs/heads/branch-2.4 Commit: c28a27a2546ebbe0c001662126625638fcbb1100 Parents: 096e0d8 Author: attilapiros Authored: Tue Nov 20 08:56:22 2018 -0600 Committer: Imran Rashid Committed: Tue Nov 20 08:56:39 2018 -0600 -- .../scala/org/apache/spark/internal/config/package.scala | 6 ++ core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 6 -- docs/configuration.md| 8 3 files changed, 18 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c28a27a2/core/src/main/scala/org/apache/spark/internal/config/package.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index bde0995..3b3c45f 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -528,6 +528,12 @@ package object config { .stringConf .createOptional + private[spark] val UI_REQUEST_HEADER_SIZE = +ConfigBuilder("spark.ui.requestHeaderSize") + .doc("Value for HTTP request header size in bytes.") + .bytesConf(ByteUnit.BYTE) + .createWithDefaultString("8k") + private[spark] val EXTRA_LISTENERS = ConfigBuilder("spark.extraListeners") .doc("Class names of listeners to add to SparkContext during initialization.") .stringConf http://git-wip-us.apache.org/repos/asf/spark/blob/c28a27a2/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala -- diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala index 52a9551..316af9b 100644 --- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala +++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala @@ -356,13 +356,15 @@ private[spark] object JettyUtils extends Logging { (connector, connector.getLocalPort()) } + val httpConfig = new HttpConfiguration() + httpConfig.setRequestHeaderSize(conf.get(UI_REQUEST_HEADER_SIZE).toInt) // If SSL is configured, create the secure connector first. val securePort = sslOptions.createJettySslContextFactory().map { factory =>
spark git commit: [SPARK-26076][BUILD][MINOR] Revise ambiguous error message from load-spark-env.sh
Repository: spark Updated Branches: refs/heads/master a00aaf649 -> c34c42234 [SPARK-26076][BUILD][MINOR] Revise ambiguous error message from load-spark-env.sh ## What changes were proposed in this pull request? When I try to run scripts (e.g. `start-master.sh`/`start-history-server.sh`) in the latest master, I get this error: ``` Presence of build for multiple Scala versions detected. Either clean one of them or, export SPARK_SCALA_VERSION in spark-env.sh. ``` The error message is quite confusing. Without reading `load-spark-env.sh`, I didn't know which directory to remove, or where to find and edit `spark-env.sh`. This PR makes the error message clearer. It also changes the script to reduce maintenance when we add or drop Scala versions in the future. As of https://github.com/apache/spark/pull/22967, we can revise the error message as follows (in my local setup): ``` Presence of build for multiple Scala versions detected (/Users/gengliangwang/IdeaProjects/spark/assembly/target/scala-2.12 and /Users/gengliangwang/IdeaProjects/spark/assembly/target/scala-2.11). Remove one of them or, export SPARK_SCALA_VERSION=2.12 in /Users/gengliangwang/IdeaProjects/spark/conf/spark-env.sh. Visit https://spark.apache.org/docs/latest/configuration.html#environment-variables for more details about setting environment variables in spark-env.sh. ``` ## How was this patch tested? Manual test Closes #23049 from gengliangwang/reviseEnvScript. Authored-by: Gengliang Wang Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c34c4223 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c34c4223 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c34c4223 Branch: refs/heads/master Commit: c34c42234f308872ebe9c7cdaee32000c0726eea Parents: a00aaf6 Author: Gengliang Wang Authored: Tue Nov 20 08:29:59 2018 -0600 Committer: Sean Owen Committed: Tue Nov 20 08:29:59 2018 -0600 -- bin/load-spark-env.sh | 27 --- 1 file changed, 16 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c34c4223/bin/load-spark-env.sh -- diff --git a/bin/load-spark-env.sh b/bin/load-spark-env.sh index 0b5006d..0ada5d8 100644 --- a/bin/load-spark-env.sh +++ b/bin/load-spark-env.sh @@ -26,15 +26,17 @@ if [ -z "${SPARK_HOME}" ]; then source "$(dirname "$0")"/find-spark-home fi +SPARK_ENV_SH="spark-env.sh" if [ -z "$SPARK_ENV_LOADED" ]; then export SPARK_ENV_LOADED=1 export SPARK_CONF_DIR="${SPARK_CONF_DIR:-"${SPARK_HOME}"/conf}" - if [ -f "${SPARK_CONF_DIR}/spark-env.sh" ]; then + SPARK_ENV_SH="${SPARK_CONF_DIR}/${SPARK_ENV_SH}" + if [[ -f "${SPARK_ENV_SH}" ]]; then # Promote all variable declarations to environment (exported) variables set -a -. "${SPARK_CONF_DIR}/spark-env.sh" +. ${SPARK_ENV_SH} set +a fi fi @@ -42,19 +44,22 @@ fi # Setting SPARK_SCALA_VERSION if not already set. if [ -z "$SPARK_SCALA_VERSION" ]; then + SCALA_VERSION_1=2.12 + SCALA_VERSION_2=2.11 - ASSEMBLY_DIR2="${SPARK_HOME}/assembly/target/scala-2.11" - ASSEMBLY_DIR1="${SPARK_HOME}/assembly/target/scala-2.12" - - if [[ -d "$ASSEMBLY_DIR2" && -d "$ASSEMBLY_DIR1" ]]; then -echo -e "Presence of build for multiple Scala versions detected." 1>&2 -echo -e 'Either clean one of them or, export SPARK_SCALA_VERSION in spark-env.sh.'
1>&2 + ASSEMBLY_DIR_1="${SPARK_HOME}/assembly/target/scala-${SCALA_VERSION_1}" + ASSEMBLY_DIR_2="${SPARK_HOME}/assembly/target/scala-${SCALA_VERSION_2}" + ENV_VARIABLE_DOC="https://spark.apache.org/docs/latest/configuration.html#environment-variables" + if [[ -d "$ASSEMBLY_DIR_1" && -d "$ASSEMBLY_DIR_2" ]]; then +echo "Presence of build for multiple Scala versions detected ($ASSEMBLY_DIR_1 and $ASSEMBLY_DIR_2)." 1>&2 +echo "Remove one of them or, export SPARK_SCALA_VERSION=$SCALA_VERSION_1 in ${SPARK_ENV_SH}." 1>&2 +echo "Visit ${ENV_VARIABLE_DOC} for more details about setting environment variables in spark-env.sh." 1>&2 exit 1 fi - if [ -d "$ASSEMBLY_DIR2" ]; then -export SPARK_SCALA_VERSION="2.11" + if [[ -d "$ASSEMBLY_DIR_1" ]]; then +export SPARK_SCALA_VERSION=${SCALA_VERSION_1} else -export SPARK_SCALA_VERSION="2.12" +export SPARK_SCALA_VERSION=${SCALA_VERSION_2} fi fi
spark git commit: [MINOR][YARN] Make memLimitExceededLogMessage more clean
Repository: spark Updated Branches: refs/heads/master a09d5ba88 -> a00aaf649 [MINOR][YARN] Make memLimitExceededLogMessage more clean ## What changes were proposed in this pull request? Current `memLimitExceededLogMessage`: https://user-images.githubusercontent.com/5399861/48467789-ec8e1000-e824-11e8-91fc-280d342e1bf3.png It's not very clear: physical memory is exceeded, but the suggestion mentions the virtual memory config. This PR makes the message clearer and replaces the deprecated config ```spark.yarn.executor.memoryOverhead```. ## How was this patch tested? manual tests Closes #23030 from wangyum/EXECUTOR_MEMORY_OVERHEAD. Authored-by: Yuming Wang Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a00aaf64 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a00aaf64 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a00aaf64 Branch: refs/heads/master Commit: a00aaf649cb5a14648102b2980ce21393804f2c7 Parents: a09d5ba Author: Yuming Wang Authored: Tue Nov 20 08:27:57 2018 -0600 Committer: Sean Owen Committed: Tue Nov 20 08:27:57 2018 -0600 -- .../spark/deploy/yarn/YarnAllocator.scala | 33 +--- .../spark/deploy/yarn/YarnAllocatorSuite.scala | 12 --- 2 files changed, 14 insertions(+), 31 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a00aaf64/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala -- diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala index ebdcf45..9497530 100644 --- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala +++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala @@ -20,7 +20,6 @@ package org.apache.spark.deploy.yarn import java.util.Collections import java.util.concurrent._ import java.util.concurrent.atomic.AtomicInteger -import java.util.regex.Pattern import scala.collection.JavaConverters._ import scala.collection.mutable @@ -598,13 +597,21 @@ private[yarn] class YarnAllocator( (false, s"Container ${containerId}${onHostStr} was preempted.") // Should probably still count memory exceeded exit codes towards task failures case VMEM_EXCEEDED_EXIT_CODE => -(true, memLimitExceededLogMessage( - completedContainer.getDiagnostics, - VMEM_EXCEEDED_PATTERN)) +val vmemExceededPattern = raw"$MEM_REGEX of $MEM_REGEX virtual memory used".r +val diag = vmemExceededPattern.findFirstIn(completedContainer.getDiagnostics) + .map(_.concat(".")).getOrElse("") +val message = "Container killed by YARN for exceeding virtual memory limits. " + + s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key} or boosting " + + s"${YarnConfiguration.NM_VMEM_PMEM_RATIO} or disabling " + + s"${YarnConfiguration.NM_VMEM_CHECK_ENABLED} because of YARN-4714." +(true, message) case PMEM_EXCEEDED_EXIT_CODE => -(true, memLimitExceededLogMessage( - completedContainer.getDiagnostics, - PMEM_EXCEEDED_PATTERN)) +val pmemExceededPattern = raw"$MEM_REGEX of $MEM_REGEX physical memory used".r +val diag = pmemExceededPattern.findFirstIn(completedContainer.getDiagnostics) + .map(_.concat(".")).getOrElse("") +val message = "Container killed by YARN for exceeding physical memory limits. " + + s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key}."
+(true, message) case _ => // all the failures which not covered above, like: // disk failure, kill by app master or resource manager, ... @@ -735,18 +742,6 @@ private[yarn] class YarnAllocator( private object YarnAllocator { val MEM_REGEX = "[0-9.]+ [KMG]B" - val PMEM_EXCEEDED_PATTERN = -Pattern.compile(s"$MEM_REGEX of $MEM_REGEX physical memory used") - val VMEM_EXCEEDED_PATTERN = -Pattern.compile(s"$MEM_REGEX of $MEM_REGEX virtual memory used") val VMEM_EXCEEDED_EXIT_CODE = -103 val PMEM_EXCEEDED_EXIT_CODE = -104 - - def memLimitExceededLogMessage(diagnostics: String, pattern: Pattern): String = { -val matcher = pattern.matcher(diagnostics) -val diag = if (matcher.find()) " " + matcher.group() + "." else "" -s"Container killed by YARN for exceeding memory limits. $diag " + - "Consider boosting spark.yarn.executor.memoryOverhead or " + - "disabling
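The message construction itself is plain regex work and can be exercised without YARN. A standalone sketch of the extraction the new code performs, using a made-up diagnostic string:

```scala
val MEM_REGEX = "[0-9.]+ [KMG]B"
val pmemExceededPattern = raw"$MEM_REGEX of $MEM_REGEX physical memory used".r

// A made-up diagnostic string of the shape YARN reports for killed containers.
val diagnostics = "Container killed on request. 2.1 GB of 2 GB physical memory used"
val diag = pmemExceededPattern.findFirstIn(diagnostics).map(_.concat(".")).getOrElse("")

// Prints the new, physical-memory-specific message:
// Container killed by YARN for exceeding physical memory limits.
// 2.1 GB of 2 GB physical memory used. Consider boosting spark.executor.memoryOverhead.
println("Container killed by YARN for exceeding physical memory limits. " +
  s"$diag Consider boosting spark.executor.memoryOverhead.")
```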