spark git commit: [SPARK-22779][SQL] Resolve default values for fallback configs.
Repository: spark Updated Branches: refs/heads/master f8c7c1f21 -> c3dd2a26d [SPARK-22779][SQL] Resolve default values for fallback configs. SQLConf allows some callers to define a custom default value for configs, and that complicates a little bit the handling of fallback config entries, since most of the default value resolution is hidden by the config code. This change peaks into the internals of these fallback configs to figure out the correct default value, and also returns the current human-readable default when showing the default value (e.g. through "set -v"). Author: Marcelo VanzinCloses #19974 from vanzin/SPARK-22779. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c3dd2a26 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c3dd2a26 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c3dd2a26 Branch: refs/heads/master Commit: c3dd2a26deaadf508b4e163eab2c0544cd922540 Parents: f8c7c1f Author: Marcelo Vanzin Authored: Wed Dec 13 22:46:20 2017 -0800 Committer: gatorsmile Committed: Wed Dec 13 22:46:20 2017 -0800 -- .../spark/internal/config/ConfigEntry.scala | 8 -- .../org/apache/spark/sql/internal/SQLConf.scala | 16 --- .../spark/sql/internal/SQLConfSuite.scala | 30 3 files changed, 47 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c3dd2a26/core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala b/core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala index f119028..ede3ace 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala @@ -139,7 +139,7 @@ private[spark] class OptionalConfigEntry[T]( s => Some(rawValueConverter(s)), v => v.map(rawStringConverter).orNull, doc, isPublic) { - override def defaultValueString: String = "" + override def defaultValueString: String = ConfigEntry.UNDEFINED override def readFrom(reader: ConfigReader): Option[T] = { readString(reader).map(rawValueConverter) @@ -149,12 +149,12 @@ private[spark] class OptionalConfigEntry[T]( /** * A config entry whose default value is defined by another config entry. */ -private class FallbackConfigEntry[T] ( +private[spark] class FallbackConfigEntry[T] ( key: String, alternatives: List[String], doc: String, isPublic: Boolean, -private[config] val fallback: ConfigEntry[T]) +val fallback: ConfigEntry[T]) extends ConfigEntry[T](key, alternatives, fallback.valueConverter, fallback.stringConverter, doc, isPublic) { @@ -167,6 +167,8 @@ private class FallbackConfigEntry[T] ( private[spark] object ConfigEntry { + val UNDEFINED = "" + private val knownConfigs = new java.util.concurrent.ConcurrentHashMap[String, ConfigEntry[_]]() def registerEntry(entry: ConfigEntry[_]): Unit = { http://git-wip-us.apache.org/repos/asf/spark/blob/c3dd2a26/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 1121444..cf7e3eb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -1379,7 +1379,7 @@ class SQLConf extends Serializable with Logging { Option(settings.get(key)). 
orElse { // Try to use the default value -Option(sqlConfEntries.get(key)).map(_.defaultValueString) +Option(sqlConfEntries.get(key)).map { e => e.stringConverter(e.readFrom(reader)) } }. getOrElse(throw new NoSuchElementException(key)) } @@ -1417,14 +1417,21 @@ class SQLConf extends Serializable with Logging { * not set yet, return `defaultValue`. */ def getConfString(key: String, defaultValue: String): String = { -if (defaultValue != null && defaultValue != "") { +if (defaultValue != null && defaultValue != ConfigEntry.UNDEFINED) { val entry = sqlConfEntries.get(key) if (entry != null) { // Only verify configs in the SQLConf object entry.valueConverter(defaultValue) } } -Option(settings.get(key)).getOrElse(defaultValue) +Option(settings.get(key)).getOrElse { + // If the key is not set, need to check whether the config entry is registered and is + // a
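For background on what a fallback entry is, here is a minimal, self-contained sketch of the idea (hypothetical classes, not the actual Spark `ConfigEntry` hierarchy): a fallback entry has no default of its own, so its human-readable default has to be resolved by peeking into the entry it falls back to, rather than by formatting a stored constant.

```scala
// Simplified model of the behaviour this change restores (illustrative names only).
sealed trait Entry[T] {
  def key: String
  def defaultValueString(settings: Map[String, String]): String
}

case class SimpleEntry[T](key: String, default: T) extends Entry[T] {
  def defaultValueString(settings: Map[String, String]): String =
    settings.getOrElse(key, default.toString)
}

case class FallbackEntry[T](key: String, fallback: Entry[T]) extends Entry[T] {
  // Peek into the parent entry instead of reporting an empty placeholder.
  def defaultValueString(settings: Map[String, String]): String =
    settings.getOrElse(key, fallback.defaultValueString(settings))
}

object FallbackDemo extends App {
  val parent = SimpleEntry("spark.demo.buffer.size", "64k")
  val child  = FallbackEntry("spark.demo.shuffle.buffer.size", parent)

  // With nothing set, "set -v"-style output shows the parent's default, not "".
  println(child.defaultValueString(Map.empty))                               // 64k
  // If the parent is overridden, the fallback reports the current effective value.
  println(child.defaultValueString(Map("spark.demo.buffer.size" -> "128k"))) // 128k
}
```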
[2/2] spark git commit: [SPARK-22732] Add Structured Streaming APIs to DataSourceV2
[SPARK-22732] Add Structured Streaming APIs to DataSourceV2 ## What changes were proposed in this pull request? This PR provides DataSourceV2 API support for structured streaming, including new pieces needed to support continuous processing [SPARK-20928]. High level summary: - DataSourceV2 includes new mixins to support micro-batch and continuous reads and writes. For reads, we accept an optional user specified schema rather than using the ReadSupportWithSchema model, because doing so would severely complicate the interface. - DataSourceV2Reader includes new interfaces to read a specific microbatch or read continuously from a given offset. These follow the same setter pattern as the existing Supports* mixins so that they can work with SupportsScanUnsafeRow. - DataReader (the per-partition reader) has a new subinterface ContinuousDataReader only for continuous processing. This reader has a special method to check progress, and next() blocks for new input rather than returning false. - Offset, an abstract representation of position in a streaming query, is ported to the public API. (Each type of reader will define its own Offset implementation.) - DataSourceV2Writer has a new subinterface ContinuousWriter only for continuous processing. Commits to this interface come tagged with an epoch number, as the execution engine will continue to produce new epoch commits as the task continues indefinitely. Note that this PR does not propose to change the existing DataSourceV2 batch API, or deprecate the existing streaming source/sink internal APIs in spark.sql.execution.streaming. ## How was this patch tested? Toy implementations of the new interfaces with unit tests. Author: Jose TorresCloses #19925 from joseph-torres/continuous-api. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f8c7c1f2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f8c7c1f2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f8c7c1f2 Branch: refs/heads/master Commit: f8c7c1f21aa9d1fd38b584ca8c4adf397966e9f7 Parents: 1e44dd0 Author: Jose Torres Authored: Wed Dec 13 22:31:39 2017 -0800 Committer: Shixiong Zhu Committed: Wed Dec 13 22:31:39 2017 -0800 -- .../apache/spark/sql/kafka010/KafkaSource.scala | 1 + .../spark/sql/kafka010/KafkaSourceOffset.scala | 3 +- .../spark/sql/kafka010/KafkaSourceSuite.scala | 1 + .../sql/sources/v2/ContinuousReadSupport.java | 42 + .../sql/sources/v2/ContinuousWriteSupport.java | 54 ++ .../sql/sources/v2/DataSourceV2Options.java | 8 + .../sql/sources/v2/MicroBatchReadSupport.java | 52 ++ .../sql/sources/v2/MicroBatchWriteSupport.java | 58 ++ .../sources/v2/reader/ContinuousDataReader.java | 36 .../sql/sources/v2/reader/ContinuousReader.java | 68 +++ .../sql/sources/v2/reader/MicroBatchReader.java | 64 +++ .../spark/sql/sources/v2/reader/Offset.java | 60 +++ .../sql/sources/v2/reader/PartitionOffset.java | 30 .../sql/sources/v2/writer/ContinuousWriter.java | 41 + .../execution/streaming/BaseStreamingSink.java | 27 +++ .../streaming/BaseStreamingSource.java | 37 .../execution/streaming/FileStreamSource.scala | 1 + .../streaming/FileStreamSourceOffset.scala | 3 + .../sql/execution/streaming/LongOffset.scala| 2 + .../spark/sql/execution/streaming/Offset.scala | 34 +--- .../sql/execution/streaming/OffsetSeq.scala | 1 + .../sql/execution/streaming/OffsetSeqLog.scala | 1 + .../streaming/RateSourceProvider.scala | 22 ++- .../execution/streaming/RateStreamOffset.scala | 29 +++ .../streaming/RateStreamSourceV2.scala | 162 + 
.../spark/sql/execution/streaming/Source.scala | 1 + .../execution/streaming/StreamExecution.scala | 1 + .../execution/streaming/StreamProgress.scala| 2 + .../continuous/ContinuousRateStreamSource.scala | 152 .../spark/sql/execution/streaming/memory.scala | 1 + .../sql/execution/streaming/memoryV2.scala | 178 +++ .../spark/sql/execution/streaming/socket.scala | 1 + .../execution/streaming/MemorySinkV2Suite.scala | 82 + .../execution/streaming/OffsetSeqLogSuite.scala | 1 + .../execution/streaming/RateSourceSuite.scala | 1 + .../execution/streaming/RateSourceV2Suite.scala | 155 .../sql/streaming/FileStreamSourceSuite.scala | 1 + .../spark/sql/streaming/OffsetSuite.scala | 3 +- .../spark/sql/streaming/StreamSuite.scala | 1 + .../apache/spark/sql/streaming/StreamTest.scala | 1 + .../streaming/StreamingAggregationSuite.scala | 1 + .../streaming/StreamingQueryListenerSuite.scala | 1 + .../sql/streaming/StreamingQuerySuite.scala | 1 +
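Since each source defines its own `Offset`, a rough sketch of a source-specific offset against the new public class is shown below. It assumes only that `org.apache.spark.sql.sources.v2.reader.Offset` (the Java class added in this commit) asks implementations for a `json()` serialization, as the `LongOffset` and `FileStreamSourceOffset` changes in the file list suggest; everything else is illustrative.

```scala
import org.apache.spark.sql.sources.v2.reader.Offset

// Hypothetical offset for a source that tracks one monotonically increasing sequence
// number; the JSON form is what the engine persists in the offset log.
case class SequenceOffset(sequence: Long) extends Offset {
  override def json(): String = s"""{"sequence":$sequence}"""
}

object SequenceOffset {
  // Recover an offset previously written to the offset log. A real source would use a
  // JSON parser; this keeps the sketch dependency-free.
  def fromJson(json: String): SequenceOffset =
    SequenceOffset(json.replaceAll("[^0-9-]", "").toLong)
}
```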
[1/2] spark git commit: [SPARK-22732] Add Structured Streaming APIs to DataSourceV2
Repository: spark Updated Branches: refs/heads/master 1e44dd004 -> f8c7c1f21 http://git-wip-us.apache.org/repos/asf/spark/blob/f8c7c1f2/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala index b6baaed..7a2d9e3 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala @@ -32,6 +32,7 @@ import org.apache.spark.sql.catalyst.util._ import org.apache.spark.sql.execution.streaming._ import org.apache.spark.sql.execution.streaming.FileStreamSource.{FileEntry, SeenFilesMap} import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.sources.v2.reader.Offset import org.apache.spark.sql.streaming.ExistsThrowsExceptionFileSystem._ import org.apache.spark.sql.streaming.util.StreamManualClock import org.apache.spark.sql.test.SharedSQLContext http://git-wip-us.apache.org/repos/asf/spark/blob/f8c7c1f2/sql/core/src/test/scala/org/apache/spark/sql/streaming/OffsetSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/OffsetSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/OffsetSuite.scala index f208f9b..4297482 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/OffsetSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/OffsetSuite.scala @@ -18,7 +18,8 @@ package org.apache.spark.sql.streaming import org.apache.spark.SparkFunSuite -import org.apache.spark.sql.execution.streaming.{LongOffset, Offset, SerializedOffset} +import org.apache.spark.sql.execution.streaming.{LongOffset, SerializedOffset} +import org.apache.spark.sql.sources.v2.reader.Offset trait OffsetSuite extends SparkFunSuite { /** Creates test to check all the comparisons of offsets given a `one` that is less than `two`. 
*/ http://git-wip-us.apache.org/repos/asf/spark/blob/f8c7c1f2/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala index 3d687d2..8163a1f 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala @@ -39,6 +39,7 @@ import org.apache.spark.sql.execution.streaming.state.{StateStore, StateStoreCon import org.apache.spark.sql.functions._ import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.sources.StreamSourceProvider +import org.apache.spark.sql.sources.v2.reader.Offset import org.apache.spark.sql.streaming.util.StreamManualClock import org.apache.spark.sql.types.{IntegerType, StructField, StructType} import org.apache.spark.util.Utils http://git-wip-us.apache.org/repos/asf/spark/blob/f8c7c1f2/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala index dc5b998..7a1ff89 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala @@ -39,6 +39,7 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan import org.apache.spark.sql.catalyst.util._ import org.apache.spark.sql.execution.streaming._ import org.apache.spark.sql.execution.streaming.state.StateStore +import org.apache.spark.sql.sources.v2.reader.Offset import org.apache.spark.sql.streaming.StreamingQueryListener._ import org.apache.spark.sql.test.SharedSQLContext import org.apache.spark.util.{Clock, SystemClock, Utils} http://git-wip-us.apache.org/repos/asf/spark/blob/f8c7c1f2/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala index 1b4d855..fa03135 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala @@ -34,6 +34,7 @@ import org.apache.spark.sql.execution.streaming._ import
spark git commit: [SPARK-3181][ML] Implement huber loss for LinearRegression.
Repository: spark Updated Branches: refs/heads/master 2a29a60da -> 1e44dd004 [SPARK-3181][ML] Implement huber loss for LinearRegression. ## What changes were proposed in this pull request? MLlib ```LinearRegression``` supports _huber_ loss in addition to _leastSquares_ loss. The huber loss objective function is: ![image](https://user-images.githubusercontent.com/1962026/29554124-9544d198-8750-11e7-8afa-33579ec419d5.png) Refer to Eq.(6) and Eq.(8) in [A robust hybrid of lasso and ridge regression](http://statweb.stanford.edu/~owen/reports/hhu.pdf). This objective is jointly convex as a function of (w, σ) ∈ R × (0, ∞), so we can use L-BFGS-B to solve it. The current implementation is a straightforward port of the Python scikit-learn [```HuberRegressor```](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html). There are some differences: * We use the mean loss (```lossSum/weightSum```), but sklearn uses the total loss (```lossSum```). * We multiply the loss function and L2 regularization by 1/2. Multiplying the whole formula by a constant factor does not affect the result; we just keep it consistent with the _leastSquares_ loss. So when fitting without regularization, MLlib and sklearn produce the same output. When fitting with regularization, MLlib should set ```regParam``` divided by the number of instances to match the output of sklearn. ## How was this patch tested? Unit tests. Author: Yanbo Liang. Closes #19020 from yanboliang/spark-3181. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1e44dd00 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1e44dd00 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1e44dd00 Branch: refs/heads/master Commit: 1e44dd004425040912f2cf16362d2c13f12e1689 Parents: 2a29a60 Author: Yanbo Liang Authored: Wed Dec 13 21:19:14 2017 -0800 Committer: Yanbo Liang Committed: Wed Dec 13 21:19:14 2017 -0800 -- .../ml/optim/aggregator/HuberAggregator.scala | 150 ++ .../ml/param/shared/SharedParamsCodeGen.scala | 3 +- .../spark/ml/param/shared/sharedParams.scala| 17 ++ .../spark/ml/regression/LinearRegression.scala | 299 +++ .../optim/aggregator/HuberAggregatorSuite.scala | 170 +++ .../ml/regression/LinearRegressionSuite.scala | 244 ++- project/MimaExcludes.scala | 5 + 7 files changed, 823 insertions(+), 65 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1e44dd00/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala b/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala new file mode 100644 index 000..13f64d2 --- /dev/null +++ b/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.ml.optim.aggregator + +import org.apache.spark.broadcast.Broadcast +import org.apache.spark.ml.feature.Instance +import org.apache.spark.ml.linalg.Vector + +/** + * HuberAggregator computes the gradient and loss for a huber loss function, + * as used in robust regression for samples in sparse or dense vector in an online fashion. + * + * The huber loss function based on: + * http://statweb.stanford.edu/~owen/reports/hhu.pdf;>Art B. Owen (2006), + * A robust hybrid of lasso and ridge regression. + * + * Two HuberAggregator can be merged together to have a summary of loss and gradient of + * the corresponding joint dataset. + * + * The huber loss function is given by + * + * + * $$ + * \begin{align} + * \min_{w, \sigma}\frac{1}{2n}{\sum_{i=1}^n\left(\sigma + + * H_m\left(\frac{X_{i}w - y_{i}}{\sigma}\right)\sigma\right) + \frac{1}{2}\lambda {||w||_2}^2} + * \end{align} + * $$ + * + * + *
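The per-sample term H_m in the objective above is the standard Huber function from the referenced paper: quadratic for small residuals, linear in the tails. A plain-Scala sketch of it is shown below, followed by a hypothetical usage of the new loss on `LinearRegression`; the setter names follow the `sharedParams` additions in this commit but are assumptions here, not taken from the shown diff.

```scala
// Pointwise Huber function H_m (Eq. (6) of Owen 2006): quadratic inside the threshold m,
// linear outside it, so a single large residual cannot dominate the objective.
def huberH(z: Double, m: Double): Double =
  if (math.abs(z) <= m) z * z
  else 2.0 * m * math.abs(z) - m * m

// Hypothetical usage of the new loss (parameter names assumed):
// val lr = new org.apache.spark.ml.regression.LinearRegression()
//   .setLoss("huber")
//   .setEpsilon(1.35)   // the m threshold above, as in scikit-learn's HuberRegressor
//   .setRegParam(0.1)
```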
svn commit: r23723 - in /dev/spark/2.3.0-SNAPSHOT-2017_12_13_20_01-2a29a60-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Thu Dec 14 04:14:32 2017 New Revision: 23723 Log: Apache Spark 2.3.0-SNAPSHOT-2017_12_13_20_01-2a29a60 docs [This commit notification would consist of 1407 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
[1/2] spark git commit: Revert "[SPARK-22600][SQL][FOLLOW-UP] Fix a compilation error in TPCDS q75/q77"
Repository: spark Updated Branches: refs/heads/master ef9299965 -> 2a29a60da Revert "[SPARK-22600][SQL][FOLLOW-UP] Fix a compilation error in TPCDS q75/q77" This reverts commit ef92999653f0e2a47752379a867647445d849aab. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bc7e4a90 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bc7e4a90 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bc7e4a90 Branch: refs/heads/master Commit: bc7e4a90c0c91970a94aa385971daac48db6264e Parents: ef92999 Author: Wenchen FanAuthored: Thu Dec 14 11:21:34 2017 +0800 Committer: Wenchen Fan Committed: Thu Dec 14 11:21:34 2017 +0800 -- .../spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala | 1 + .../apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bc7e4a90/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala index 807cb94..a2dda48 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala @@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.expressions.codegen import scala.collection.mutable import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types.DataType /** * Defines util methods used in expression code generation. http://git-wip-us.apache.org/repos/asf/spark/blob/bc7e4a90/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala index 634014a..c96ed6e 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala @@ -192,7 +192,7 @@ case class BroadcastHashJoinExec( | $value = ${ev.value}; |} """.stripMargin -ExprCode(code, isNull, value, inputRow = matched) +ExprCode(code, isNull, value) } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[2/2] spark git commit: Revert "[SPARK-22600][SQL] Fix 64kb limit for deeply nested expressions under wholestage codegen"
Revert "[SPARK-22600][SQL] Fix 64kb limit for deeply nested expressions under wholestage codegen" This reverts commit c7d0148615c921dca782ee3785b5d0cd59e42262. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2a29a60d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2a29a60d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2a29a60d Branch: refs/heads/master Commit: 2a29a60da32a5ccd04cc7c5c22c4075b159389e3 Parents: bc7e4a9 Author: Wenchen FanAuthored: Thu Dec 14 11:22:23 2017 +0800 Committer: Wenchen Fan Committed: Thu Dec 14 11:22:23 2017 +0800 -- .../sql/catalyst/expressions/Expression.scala | 37 +-- .../expressions/codegen/CodeGenerator.scala | 44 +-- .../expressions/codegen/ExpressionCodegen.scala | 269 --- .../codegen/ExpressionCodegenSuite.scala| 220 --- .../spark/sql/execution/ColumnarBatchScan.scala | 5 +- .../sql/execution/WholeStageCodegenSuite.scala | 23 +- 6 files changed, 13 insertions(+), 585 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2a29a60d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala index 329ea5d..743782a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala @@ -105,12 +105,6 @@ abstract class Expression extends TreeNode[Expression] { val isNull = ctx.freshName("isNull") val value = ctx.freshName("value") val eval = doGenCode(ctx, ExprCode("", isNull, value)) - eval.isNull = if (this.nullable) eval.isNull else "false" - - // Records current input row and variables of this expression. - eval.inputRow = ctx.INPUT_ROW - eval.inputVars = findInputVars(ctx, eval) - reduceCodeSize(ctx, eval) if (eval.code.nonEmpty) { // Add `this` in the comment. @@ -121,29 +115,9 @@ abstract class Expression extends TreeNode[Expression] { } } - /** - * Returns the input variables to this expression. - */ - private def findInputVars(ctx: CodegenContext, eval: ExprCode): Seq[ExprInputVar] = { -if (ctx.currentVars != null) { - this.collect { -case b @ BoundReference(ordinal, _, _) if ctx.currentVars(ordinal) != null => - ExprInputVar(exprCode = ctx.currentVars(ordinal), -dataType = b.dataType, nullable = b.nullable) - } -} else { - Seq.empty -} - } - - /** - * In order to prevent 64kb compile error, reducing the size of generated codes by - * separating it into a function if the size exceeds a threshold. 
- */ private def reduceCodeSize(ctx: CodegenContext, eval: ExprCode): Unit = { -lazy val funcParams = ExpressionCodegen.getExpressionInputParams(ctx, this) - -if (eval.code.trim.length > 1024 && funcParams.isDefined) { +// TODO: support whole stage codegen too +if (eval.code.trim.length > 1024 && ctx.INPUT_ROW != null && ctx.currentVars == null) { val setIsNull = if (eval.isNull != "false" && eval.isNull != "true") { val globalIsNull = ctx.freshName("globalIsNull") ctx.addMutableState(ctx.JAVA_BOOLEAN, globalIsNull) @@ -158,12 +132,9 @@ abstract class Expression extends TreeNode[Expression] { val newValue = ctx.freshName("value") val funcName = ctx.freshName(nodeName) - val callParams = funcParams.map(_._1.mkString(", ")).get - val declParams = funcParams.map(_._2.mkString(", ")).get - val funcFullName = ctx.addNewFunction(funcName, s""" - |private $javaType $funcName($declParams) { + |private $javaType $funcName(InternalRow ${ctx.INPUT_ROW}) { | ${eval.code.trim} | $setIsNull | return ${eval.value}; @@ -171,7 +142,7 @@ abstract class Expression extends TreeNode[Expression] { """.stripMargin) eval.value = newValue - eval.code = s"$javaType $newValue = $funcFullName($callParams);" + eval.code = s"$javaType $newValue = $funcFullName(${ctx.INPUT_ROW});" } } http://git-wip-us.apache.org/repos/asf/spark/blob/2a29a60d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
svn commit: r23721 - in /dev/spark/2.3.0-SNAPSHOT-2017_12_13_16_01-ef92999-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Thu Dec 14 00:14:38 2017 New Revision: 23721 Log: Apache Spark 2.3.0-SNAPSHOT-2017_12_13_16_01-ef92999 docs [This commit notification would consist of 1407 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-22600][SQL][FOLLOW-UP] Fix a compilation error in TPCDS q75/q77
Repository: spark Updated Branches: refs/heads/master a83e8e6c2 -> ef9299965 [SPARK-22600][SQL][FOLLOW-UP] Fix a compilation error in TPCDS q75/q77 ## What changes were proposed in this pull request? This pr fixed a compilation error of TPCDS `q75`/`q77` caused by #19813; ``` java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 371, Column 16: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 371, Column 16: Expression "bhj_matched" is not an rvalue at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306) at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293) at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) ``` ## How was this patch tested? Manually checked `q75`/`q77` can be properly compiled Author: Takeshi YamamuroCloses #19969 from maropu/SPARK-22600-FOLLOWUP. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ef929996 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ef929996 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ef929996 Branch: refs/heads/master Commit: ef92999653f0e2a47752379a867647445d849aab Parents: a83e8e6 Author: Takeshi Yamamuro Authored: Wed Dec 13 15:55:16 2017 -0800 Committer: gatorsmile Committed: Wed Dec 13 15:55:16 2017 -0800 -- .../spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala | 1 - .../apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ef929996/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala index a2dda48..807cb94 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala @@ -20,7 +20,6 @@ package org.apache.spark.sql.catalyst.expressions.codegen import scala.collection.mutable import org.apache.spark.sql.catalyst.expressions._ -import org.apache.spark.sql.types.DataType /** * Defines util methods used in expression code generation. http://git-wip-us.apache.org/repos/asf/spark/blob/ef929996/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala index c96ed6e..634014a 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala @@ -192,7 +192,7 @@ case class BroadcastHashJoinExec( | $value = ${ev.value}; |} """.stripMargin -ExprCode(code, isNull, value) +ExprCode(code, isNull, value, inputRow = matched) } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-22764][CORE] Fix flakiness in SparkContextSuite.
Repository: spark Updated Branches: refs/heads/master ba0e79f57 -> a83e8e6c2 [SPARK-22764][CORE] Fix flakiness in SparkContextSuite. Use a semaphore to synchronize the tasks with the listener code that is trying to cancel the job or stage, so that the listener won't try to cancel a job or stage that has already finished. Author: Marcelo VanzinCloses #19956 from vanzin/SPARK-22764. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a83e8e6c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a83e8e6c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a83e8e6c Branch: refs/heads/master Commit: a83e8e6c223df8b819335cbabbfff9956942f2ad Parents: ba0e79f Author: Marcelo Vanzin Authored: Wed Dec 13 16:06:16 2017 -0600 Committer: Imran Rashid Committed: Wed Dec 13 16:06:16 2017 -0600 -- .../scala/org/apache/spark/SparkContextSuite.scala | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a83e8e6c/core/src/test/scala/org/apache/spark/SparkContextSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala index 37fcc93..b30bd74 100644 --- a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala @@ -20,7 +20,7 @@ package org.apache.spark import java.io.File import java.net.{MalformedURLException, URI} import java.nio.charset.StandardCharsets -import java.util.concurrent.TimeUnit +import java.util.concurrent.{Semaphore, TimeUnit} import scala.concurrent.duration._ @@ -499,6 +499,7 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu test("Cancelling stages/jobs with custom reasons.") { sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")) val REASON = "You shall not pass" +val slices = 10 val listener = new SparkListener { override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = { @@ -508,6 +509,7 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu } sc.cancelStage(taskStart.stageId, REASON) SparkContextSuite.cancelStage = false + SparkContextSuite.semaphore.release(slices) } } @@ -518,21 +520,25 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu } sc.cancelJob(jobStart.jobId, REASON) SparkContextSuite.cancelJob = false + SparkContextSuite.semaphore.release(slices) } } } sc.addSparkListener(listener) for (cancelWhat <- Seq("stage", "job")) { + SparkContextSuite.semaphore.drainPermits() SparkContextSuite.isTaskStarted = false SparkContextSuite.cancelStage = (cancelWhat == "stage") SparkContextSuite.cancelJob = (cancelWhat == "job") val ex = intercept[SparkException] { -sc.range(0, 1L).mapPartitions { x => - org.apache.spark.SparkContextSuite.isTaskStarted = true +sc.range(0, 1L, numSlices = slices).mapPartitions { x => + SparkContextSuite.isTaskStarted = true + // Block waiting for the listener to cancel the stage or job. + SparkContextSuite.semaphore.acquire() x -}.cartesian(sc.range(0, 10L))count() +}.count() } ex.getCause() match { @@ -636,4 +642,5 @@ object SparkContextSuite { @volatile var isTaskStarted = false @volatile var taskKilled = false @volatile var taskSucceeded = false + val semaphore = new Semaphore(0) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
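The coordination pattern in isolation, for readers skimming the diff: the listener releases permits only after it has issued the cancellation, and every task blocks on an acquire first, so no task can finish before the cancellation request is in flight. A stripped-down sketch with hypothetical names, outside Spark:

```scala
import java.util.concurrent.Semaphore

object CancelHandshakeDemo {
  private val permits = new Semaphore(0)
  private val slices = 10

  // Runs in the listener thread once the first task has started.
  def onTaskStart(cancel: () => Unit): Unit = {
    cancel()                 // request the cancellation first...
    permits.release(slices)  // ...then let every blocked task proceed
  }

  // Runs in each of the `slices` tasks.
  def taskBody(): Unit = {
    permits.acquire()        // block until the cancellation has been requested
    // work that is now guaranteed to be cancelled rather than to finish early
  }
}
```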
spark git commit: [SPARK-22772][SQL] Use splitExpressionsWithCurrentInputs to split codes in elt
Repository: spark Updated Branches: refs/heads/master 0bdb4e516 -> ba0e79f57 [SPARK-22772][SQL] Use splitExpressionsWithCurrentInputs to split codes in elt ## What changes were proposed in this pull request? In SPARK-22550 which fixes 64KB JVM bytecode limit problem with elt, `buildCodeBlocks` is used to split codes. However, we should use `splitExpressionsWithCurrentInputs` because it considers both normal and wholestage codgen (it is not supported yet, so it simply doesn't split the codes). ## How was this patch tested? Existing tests. Author: Liang-Chi HsiehCloses #19964 from viirya/SPARK-22772. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ba0e79f5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ba0e79f5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ba0e79f5 Branch: refs/heads/master Commit: ba0e79f57caa279773fb014b7883ee5d69dd0a68 Parents: 0bdb4e5 Author: Liang-Chi Hsieh Authored: Wed Dec 13 13:54:16 2017 -0800 Committer: gatorsmile Committed: Wed Dec 13 13:54:16 2017 -0800 -- .../expressions/codegen/CodeGenerator.scala | 2 +- .../expressions/stringExpressions.scala | 81 ++-- 2 files changed, 43 insertions(+), 40 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ba0e79f5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala index 257c3f1..b1d9311 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala @@ -878,7 +878,7 @@ class CodegenContext { * * @param expressions the codes to evaluate expressions. */ - def buildCodeBlocks(expressions: Seq[String]): Seq[String] = { + private def buildCodeBlocks(expressions: Seq[String]): Seq[String] = { val blocks = new ArrayBuffer[String]() val blockBuilder = new StringBuilder() var length = 0 http://git-wip-us.apache.org/repos/asf/spark/blob/ba0e79f5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala index 47f0b57..8c4d2fd 100755 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala @@ -289,53 +289,56 @@ case class Elt(children: Seq[Expression]) val index = indexExpr.genCode(ctx) val strings = stringExprs.map(_.genCode(ctx)) val indexVal = ctx.freshName("index") +val indexMatched = ctx.freshName("eltIndexMatched") + val stringVal = ctx.freshName("stringVal") +ctx.addMutableState(ctx.javaType(dataType), stringVal) + val assignStringValue = strings.zipWithIndex.map { case (eval, index) => s""" -case ${index + 1}: - ${eval.code} - $stringVal = ${eval.isNull} ? null : ${eval.value}; - break; - """ + |if ($indexVal == ${index + 1}) { + | ${eval.code} + | $stringVal = ${eval.isNull} ? 
null : ${eval.value}; + | $indexMatched = true; + | continue; + |} + """.stripMargin } -val cases = ctx.buildCodeBlocks(assignStringValue) -val codes = if (cases.length == 1) { - s""" -UTF8String $stringVal = null; -switch ($indexVal) { - ${cases.head} -} - """ -} else { - var prevFunc = "null" - for (c <- cases.reverse) { -val funcName = ctx.freshName("eltFunc") -val funcBody = s""" - private UTF8String $funcName(InternalRow ${ctx.INPUT_ROW}, int $indexVal) { - UTF8String $stringVal = null; - switch ($indexVal) { - $c - default: - return $prevFunc; - } - return $stringVal; - } -""" -val fullFuncName = ctx.addNewFunction(funcName, funcBody) -prevFunc = s"$fullFuncName(${ctx.INPUT_ROW}, $indexVal)" - } - s"UTF8String $stringVal = $prevFunc;" -} +val codes = ctx.splitExpressionsWithCurrentInputs( + expressions
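For readers unfamiliar with the function whose code is generated here: `elt(n, str1, str2, ...)` returns the n-th string argument (1-based) and NULL when the index is out of range, so with many arguments the branch-per-argument code above can exceed the 64KB JVM method limit unless it is split into helper methods. A quick illustration in spark-shell:

```scala
// Assumes a running spark-shell session (`spark` is the SparkSession).
spark.sql("SELECT elt(2, 'scala', 'java')").show()   // returns 'java'
spark.sql("SELECT elt(9, 'scala', 'java')").show()   // returns NULL (index out of range)
```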
spark git commit: [SPARK-22574][MESOS][SUBMIT] Check submission request parameters
Repository: spark Updated Branches: refs/heads/master 1abcbed67 -> 0bdb4e516 [SPARK-22574][MESOS][SUBMIT] Check submission request parameters ## What changes were proposed in this pull request? PR closed with all the comments -> https://github.com/apache/spark/pull/19793 It solves the problem when submitting a wrong CreateSubmissionRequest to Spark Dispatcher was causing a bad state of Dispatcher and making it inactive as a Mesos framework. https://issues.apache.org/jira/browse/SPARK-22574 ## How was this patch tested? All spark test passed successfully. It was tested sending a wrong request (without appArgs) before and after the change. The point is easy, check if the value is null before being accessed. This was before the change, leaving the dispatcher inactive: ``` Exception in thread "Thread-22" java.lang.NullPointerException at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.getDriverCommandValue(MesosClusterScheduler.scala:444) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.buildDriverCommand(MesosClusterScheduler.scala:451) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.org$apache$spark$scheduler$cluster$mesos$MesosClusterScheduler$$createTaskInfo(MesosClusterScheduler.scala:538) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:570) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:555) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:555) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:621) ``` And after: ``` "message" : "Malformed request: org.apache.spark.deploy.rest.SubmitRestProtocolException: Validation of message CreateSubmissionRequest failed!\n\torg.apache.spark.deploy.rest.SubmitRestProtocolMessage.validate(SubmitRestProtocolMessage.scala:70)\n\torg.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:272)\n\tjavax.servlet.http.HttpServlet.service(HttpServlet.java:707)\n\tjavax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\torg.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)\n\torg.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)\n\torg.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\torg.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\torg.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\torg.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\torg.spark 
_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\torg.spark_project.jetty.server.Server.handle(Server.java:524)\n\torg.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:319)\n\torg.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)\n\torg.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\torg.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\torg.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\torg.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\torg.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\torg.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\torg.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java :671)\n\torg.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tjava.lang.Thread.run(Thread.java:745)" ``` Author: German SchiavonCloses #19966 from Gschiavon/fix-submission-request. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0bdb4e51 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0bdb4e51 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0bdb4e51 Branch: refs/heads/master Commit: 0bdb4e516c425ea7bf941106ac6449b5a0a289e3 Parents: 1abcbed Author: German Schiavon Authored: Wed Dec 13 13:37:25 2017 -0800 Committer: Marcelo Vanzin Committed: Wed Dec 13 13:37:25 2017 -0800 -- .../spark/deploy/rest/SubmitRestProtocolRequest.scala | 2 ++ .../spark/deploy/rest/SubmitRestProtocolSuite.scala| 2
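A hedged sketch of the shape of this fix (the real two-line change lives in `SubmitRestProtocolRequest.scala`; the field and helper names below are illustrative): validate the deserialized request up front so a missing `appArgs` surfaces as a malformed-request error in the REST response instead of a NullPointerException inside the dispatcher's scheduling loop.

```scala
// Illustrative only -- not the actual Spark classes.
case class CreateSubmissionRequest(
    appResource: String,
    appArgs: Array[String],
    environmentVariables: Map[String, String])

def validate(req: CreateSubmissionRequest): Unit = {
  def assertFieldIsSet(value: AnyRef, name: String): Unit =
    require(value != null, s"Validation failed: field '$name' is missing in CreateSubmissionRequest")

  assertFieldIsSet(req.appResource, "appResource")
  assertFieldIsSet(req.appArgs, "appArgs")
  assertFieldIsSet(req.environmentVariables, "environmentVariables")
}
```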
spark git commit: [SPARK-22574][MESOS][SUBMIT] Check submission request parameters
Repository: spark Updated Branches: refs/heads/branch-2.2 0230515a2 -> b4f4be396 [SPARK-22574][MESOS][SUBMIT] Check submission request parameters ## What changes were proposed in this pull request? PR closed with all the comments -> https://github.com/apache/spark/pull/19793 It solves the problem when submitting a wrong CreateSubmissionRequest to Spark Dispatcher was causing a bad state of Dispatcher and making it inactive as a Mesos framework. https://issues.apache.org/jira/browse/SPARK-22574 ## How was this patch tested? All spark test passed successfully. It was tested sending a wrong request (without appArgs) before and after the change. The point is easy, check if the value is null before being accessed. This was before the change, leaving the dispatcher inactive: ``` Exception in thread "Thread-22" java.lang.NullPointerException at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.getDriverCommandValue(MesosClusterScheduler.scala:444) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.buildDriverCommand(MesosClusterScheduler.scala:451) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.org$apache$spark$scheduler$cluster$mesos$MesosClusterScheduler$$createTaskInfo(MesosClusterScheduler.scala:538) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:570) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:555) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:555) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:621) ``` And after: ``` "message" : "Malformed request: org.apache.spark.deploy.rest.SubmitRestProtocolException: Validation of message CreateSubmissionRequest failed!\n\torg.apache.spark.deploy.rest.SubmitRestProtocolMessage.validate(SubmitRestProtocolMessage.scala:70)\n\torg.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:272)\n\tjavax.servlet.http.HttpServlet.service(HttpServlet.java:707)\n\tjavax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\torg.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)\n\torg.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)\n\torg.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\torg.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\torg.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\torg.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\torg.spark 
_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\torg.spark_project.jetty.server.Server.handle(Server.java:524)\n\torg.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:319)\n\torg.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)\n\torg.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\torg.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\torg.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\torg.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\torg.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\torg.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\torg.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java :671)\n\torg.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tjava.lang.Thread.run(Thread.java:745)" ``` Author: German SchiavonCloses #19966 from Gschiavon/fix-submission-request. (cherry picked from commit 0bdb4e516c425ea7bf941106ac6449b5a0a289e3) Signed-off-by: Marcelo Vanzin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b4f4be39 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b4f4be39 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b4f4be39 Branch: refs/heads/branch-2.2 Commit: b4f4be396b76e8d3f583193c70fc6b26c99231ac Parents: 0230515 Author: German Schiavon Authored: Wed Dec 13 13:37:25 2017 -0800 Committer: Marcelo Vanzin Committed: Wed Dec 13 13:37:35 2017 -0800
svn commit: r23716 - in /dev/spark/2.3.0-SNAPSHOT-2017_12_13_12_01-1abcbed-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Dec 13 20:14:36 2017 New Revision: 23716 Log: Apache Spark 2.3.0-SNAPSHOT-2017_12_13_12_01-1abcbed docs [This commit notification would consist of 1407 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-22763][CORE] SHS: Ignore unknown events and parse through the file
Repository: spark Updated Branches: refs/heads/master c5a4701ac -> 1abcbed67 [SPARK-22763][CORE] SHS: Ignore unknown events and parse through the file ## What changes were proposed in this pull request? While spark code changes, there are new events in event log: #19649 And we used to maintain a whitelist to avoid exceptions: #15663 Currently Spark history server will stop parsing on unknown events or unrecognized properties. We may still see part of the UI data. For better compatibility, we can ignore unknown events and parse through the log file. ## How was this patch tested? Unit test Author: Wang GengliangCloses #19953 from gengliangwang/ReplayListenerBus. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1abcbed6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1abcbed6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1abcbed6 Branch: refs/heads/master Commit: 1abcbed678c2bc4f05640db2791fd2d84267d740 Parents: c5a4701 Author: Wang Gengliang Authored: Wed Dec 13 11:54:22 2017 -0800 Committer: gatorsmile Committed: Wed Dec 13 11:54:22 2017 -0800 -- .../spark/scheduler/ReplayListenerBus.scala | 37 ++-- .../spark/scheduler/ReplayListenerSuite.scala | 29 +++ 2 files changed, 47 insertions(+), 19 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1abcbed6/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala -- diff --git a/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala b/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala index 26a6a3e..c9cd662 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala @@ -69,6 +69,8 @@ private[spark] class ReplayListenerBus extends SparkListenerBus with Logging { eventsFilter: ReplayEventsFilter): Unit = { var currentLine: String = null var lineNumber: Int = 0 +val unrecognizedEvents = new scala.collection.mutable.HashSet[String] +val unrecognizedProperties = new scala.collection.mutable.HashSet[String] try { val lineEntries = lines @@ -84,16 +86,22 @@ private[spark] class ReplayListenerBus extends SparkListenerBus with Logging { postToAll(JsonProtocol.sparkEventFromJson(parse(currentLine))) } catch { - case e: ClassNotFoundException if KNOWN_REMOVED_CLASSES.contains(e.getMessage) => -// Ignore events generated by Structured Streaming in Spark 2.0.0 and 2.0.1. -// It's safe since no place uses them. -logWarning(s"Dropped incompatible Structured Streaming log: $currentLine") - case e: UnrecognizedPropertyException if e.getMessage != null && e.getMessage.startsWith( -"Unrecognized field \"queryStatus\" " + - "(class org.apache.spark.sql.streaming.StreamingQueryListener$") => -// Ignore events generated by Structured Streaming in Spark 2.0.2 -// It's safe since no place uses them. -logWarning(s"Dropped incompatible Structured Streaming log: $currentLine") + case e: ClassNotFoundException => +// Ignore unknown events, parse through the event log file. +// To avoid spamming, warnings are only displayed once for each unknown event. +if (!unrecognizedEvents.contains(e.getMessage)) { + logWarning(s"Drop unrecognized event: ${e.getMessage}") + unrecognizedEvents.add(e.getMessage) +} +logDebug(s"Drop incompatible event log: $currentLine") + case e: UnrecognizedPropertyException => +// Ignore unrecognized properties, parse through the event log file. 
+// To avoid spamming, warnings are only displayed once for each unrecognized property. +if (!unrecognizedProperties.contains(e.getMessage)) { + logWarning(s"Drop unrecognized property: ${e.getMessage}") + unrecognizedProperties.add(e.getMessage) +} +logDebug(s"Drop incompatible event log: $currentLine") case jpe: JsonParseException => // We can only ignore exception from last line of the file that might be truncated // the last entry may not be the very last line in the event log, but we treat it @@ -125,13 +133,4 @@ private[spark] object ReplayListenerBus { // utility filter that selects all event logs during replay val SELECT_ALL_FILTER: ReplayEventsFilter = { (eventString: String) => true } - - /** - * Classes that were removed. Structured
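The replay strategy in isolation: parse line by line, drop entries that fail to deserialize, and warn only once per distinct failure message so the history server log is not flooded. A generic sketch (not Spark code):

```scala
import scala.collection.mutable

// Tolerant replay: keep parsing, skip lines that fail to deserialize, warn once per
// distinct failure message.
def replayAll(lines: Iterator[String], parse: String => Unit, warn: String => Unit): Unit = {
  val seenFailures = mutable.HashSet.empty[String]
  for (line <- lines) {
    try {
      parse(line)
    } catch {
      case e: Exception =>
        // add() returns true only the first time a given message is seen.
        if (seenFailures.add(String.valueOf(e.getMessage))) {
          warn(s"Dropping unrecognized entry: ${e.getMessage}")
        }
      // a real implementation still handles truncation of the last line separately
    }
  }
}
```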
spark git commit: Revert "[SPARK-21417][SQL] Infer join conditions using propagated constraints"
Repository: spark Updated Branches: refs/heads/master 8eb5609d8 -> c5a4701ac Revert "[SPARK-21417][SQL] Infer join conditions using propagated constraints" This reverts commit 6ac57fd0d1c82b834eb4bf0dd57596b92a99d6de. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c5a4701a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c5a4701a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c5a4701a Branch: refs/heads/master Commit: c5a4701acc6972ed7ccb11c506fe718d5503f140 Parents: 8eb5609 Author: gatorsmileAuthored: Wed Dec 13 11:50:04 2017 -0800 Committer: gatorsmile Committed: Wed Dec 13 11:50:04 2017 -0800 -- .../expressions/EquivalentExpressionMap.scala | 66 - .../catalyst/expressions/ExpressionSet.scala| 2 - .../sql/catalyst/optimizer/Optimizer.scala | 1 - .../spark/sql/catalyst/optimizer/joins.scala| 60 - .../EquivalentExpressionMapSuite.scala | 56 - .../optimizer/EliminateCrossJoinSuite.scala | 238 --- 6 files changed, 423 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c5a4701a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressionMap.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressionMap.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressionMap.scala deleted file mode 100644 index cf1614a..000 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressionMap.scala +++ /dev/null @@ -1,66 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.sql.catalyst.expressions - -import scala.collection.mutable - -import org.apache.spark.sql.catalyst.expressions.EquivalentExpressionMap.SemanticallyEqualExpr - -/** - * A class that allows you to map an expression into a set of equivalent expressions. The keys are - * handled based on their semantic meaning and ignoring cosmetic differences. The values are - * represented as [[ExpressionSet]]s. - * - * The underlying representation of keys depends on the [[Expression.semanticHash]] and - * [[Expression.semanticEquals]] methods. 
- * - * {{{ - * val map = new EquivalentExpressionMap() - * - * map.put(1 + 2, a) - * map.put(rand(), b) - * - * map.get(2 + 1) => Set(a) // 1 + 2 and 2 + 1 are semantically equivalent - * map.get(1 + 2) => Set(a) // 1 + 2 and 2 + 1 are semantically equivalent - * map.get(rand()) => Set() // non-deterministic expressions are not equivalent - * }}} - */ -class EquivalentExpressionMap { - - private val equivalenceMap = mutable.HashMap.empty[SemanticallyEqualExpr, ExpressionSet] - - def put(expression: Expression, equivalentExpression: Expression): Unit = { -val equivalentExpressions = equivalenceMap.getOrElseUpdate(expression, ExpressionSet.empty) -equivalenceMap(expression) = equivalentExpressions + equivalentExpression - } - - def get(expression: Expression): Set[Expression] = -equivalenceMap.getOrElse(expression, ExpressionSet.empty) -} - -object EquivalentExpressionMap { - - private implicit class SemanticallyEqualExpr(val expr: Expression) { -override def equals(obj: Any): Boolean = obj match { - case other: SemanticallyEqualExpr => expr.semanticEquals(other.expr) - case _ => false -} - -override def hashCode: Int = expr.semanticHash() - } -} http://git-wip-us.apache.org/repos/asf/spark/blob/c5a4701a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala index e989083..7e8e7b8 100644 ---
spark git commit: [SPARK-22754][DEPLOY] Check whether spark.executor.heartbeatInterval bigger…
Repository: spark Updated Branches: refs/heads/master f6bcd3e53 -> 8eb5609d8 [SPARK-22754][DEPLOY] Check whether spark.executor.heartbeatInterval bigger than spark.network.timeout or not ## What changes were proposed in this pull request? If spark.executor.heartbeatInterval is bigger than spark.network.timeout, it will almost always cause the exception below. `Job aborted due to stage failure: Task 4763 in stage 3.0 failed 4 times, most recent failure: Lost task 4763.3 in stage 3.0 (TID 22383, executor id: 4761, host: xxx): ExecutorLostFailure (executor 4761 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 154022 ms` Since many users do not realize this, they may set spark.executor.heartbeatInterval incorrectly. This patch checks this case when applications are submitted. ## How was this patch tested? Tested in a cluster. Author: zhoukang Closes #19942 from caneGuy/zhoukang/check-heartbeat. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8eb5609d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8eb5609d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8eb5609d Branch: refs/heads/master Commit: 8eb5609d8d961e54aa1ed0632f15f5e570fa627a Parents: f6bcd3e Author: zhoukang Authored: Wed Dec 13 11:47:33 2017 -0800 Committer: Marcelo Vanzin Committed: Wed Dec 13 11:47:33 2017 -0800 -- core/src/main/scala/org/apache/spark/SparkConf.scala | 8 core/src/test/scala/org/apache/spark/SparkConfSuite.scala | 10 ++ 2 files changed, 18 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8eb5609d/core/src/main/scala/org/apache/spark/SparkConf.scala -- diff --git a/core/src/main/scala/org/apache/spark/SparkConf.scala b/core/src/main/scala/org/apache/spark/SparkConf.scala index 4b1286d..d77303e 100644 --- a/core/src/main/scala/org/apache/spark/SparkConf.scala +++ b/core/src/main/scala/org/apache/spark/SparkConf.scala @@ -564,6 +564,14 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED) require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED), s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.") + +val executorTimeoutThreshold = getTimeAsSeconds("spark.network.timeout", "120s") +val executorHeartbeatInterval = getTimeAsSeconds("spark.executor.heartbeatInterval", "10s") +// If spark.executor.heartbeatInterval bigger than spark.network.timeout, +// it will almost always cause ExecutorLostFailure. See SPARK-22754.
+require(executorTimeoutThreshold > executorHeartbeatInterval, "The value of " + + s"spark.network.timeout=${executorTimeoutThreshold}s must be no less than the value of " + + s"spark.executor.heartbeatInterval=${executorHeartbeatInterval}s.") } /** http://git-wip-us.apache.org/repos/asf/spark/blob/8eb5609d/core/src/test/scala/org/apache/spark/SparkConfSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/SparkConfSuite.scala b/core/src/test/scala/org/apache/spark/SparkConfSuite.scala index c771eb4..bff808e 100644 --- a/core/src/test/scala/org/apache/spark/SparkConfSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkConfSuite.scala @@ -329,6 +329,16 @@ class SparkConfSuite extends SparkFunSuite with LocalSparkContext with ResetSyst conf.validateSettings() } + test("spark.network.timeout should bigger than spark.executor.heartbeatInterval") { +val conf = new SparkConf() +conf.validateSettings() + +conf.set("spark.network.timeout", "5s") +intercept[IllegalArgumentException] { + conf.validateSettings() +} + } + } class Class1 {} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
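For users hitting the new check, a minimal sketch (hypothetical application code, not part of the patch) of a configuration that satisfies it; the heartbeat interval has to stay below the network timeout, and SparkContext construction now enforces this.

{{{
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only: keep spark.executor.heartbeatInterval well below
// spark.network.timeout.
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("heartbeat-interval-demo")
  .set("spark.network.timeout", "120s")            // executor declared lost after this
  .set("spark.executor.heartbeatInterval", "10s")  // must be smaller than the timeout

// validateSettings() runs while the SparkContext is constructed, so flipping the two
// values (e.g. heartbeatInterval = "200s") now fails fast with an
// IllegalArgumentException instead of surfacing later as ExecutorLostFailure.
val sc = new SparkContext(conf)
}}}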
spark git commit: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ScalaUDF
Repository: spark Updated Branches: refs/heads/master 58f7c825a -> f6bcd3e53 [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ScalaUDF ## What changes were proposed in this pull request? We should not operate on `references` directly in `Expression.doGenCode`, instead we should use the high-level API `addReferenceObj`. ## How was this patch tested? existing tests Author: Wenchen FanCloses #19962 from cloud-fan/codegen. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f6bcd3e5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f6bcd3e5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f6bcd3e5 Branch: refs/heads/master Commit: f6bcd3e53fe55bab38383cce8d0340f6c6191972 Parents: 58f7c82 Author: Wenchen Fan Authored: Thu Dec 14 01:16:44 2017 +0800 Committer: Wenchen Fan Committed: Thu Dec 14 01:16:44 2017 +0800 -- .../sql/catalyst/expressions/ScalaUDF.scala | 76 ++-- .../sql/catalyst/expressions/predicates.scala | 20 ++ .../catalyst/expressions/ScalaUDFSuite.scala| 3 +- .../catalyst/optimizer/OptimizeInSuite.scala| 2 +- 4 files changed, 28 insertions(+), 73 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f6bcd3e5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala index a3cf761..388ef42 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala @@ -76,11 +76,6 @@ case class ScalaUDF( }.foreach(println) */ - - // Accessors used in genCode - def userDefinedFunc(): AnyRef = function - def getChildren(): Seq[Expression] = children - private[this] val f = children.size match { case 0 => val func = function.asInstanceOf[() => Any] @@ -981,41 +976,19 @@ case class ScalaUDF( } // scalastyle:on line.size.limit - - private val converterClassName = classOf[Any => Any].getName - private val scalaUDFClassName = classOf[ScalaUDF].getName - private val typeConvertersClassName = CatalystTypeConverters.getClass.getName + ".MODULE$" - - // Generate codes used to convert the arguments to Scala type for user-defined functions - private[this] def genCodeForConverter(ctx: CodegenContext, index: Int): (String, String) = { -val converterTerm = ctx.freshName("converter") -val expressionIdx = ctx.references.size - 1 -(converterTerm, - s"$converterClassName $converterTerm = ($converterClassName)$typeConvertersClassName" + -s".createToScalaConverter(((Expression)((($scalaUDFClassName)" + - s"references[$expressionIdx]).getChildren().apply($index))).dataType());") - } - override def doGenCode( ctx: CodegenContext, ev: ExprCode): ExprCode = { -val scalaUDF = ctx.freshName("scalaUDF") -val scalaUDFRef = ctx.addReferenceObj("scalaUDFRef", this, scalaUDFClassName) - -// Object to convert the returned value of user-defined functions to Catalyst type -val catalystConverterTerm = ctx.freshName("catalystConverter") +val converterClassName = classOf[Any => Any].getName +// The type converters for inputs and the result. 
+val converters: Array[Any => Any] = children.map { c => + CatalystTypeConverters.createToScalaConverter(c.dataType) +}.toArray :+ CatalystTypeConverters.createToCatalystConverter(dataType) +val convertersTerm = ctx.addReferenceObj("converters", converters, s"$converterClassName[]") +val errorMsgTerm = ctx.addReferenceObj("errMsg", udfErrorMessage) val resultTerm = ctx.freshName("result") -// This must be called before children expressions' codegen -// because ctx.references is used in genCodeForConverter -val converterTerms = children.indices.map(genCodeForConverter(ctx, _)) - -// Initialize user-defined function -val funcClassName = s"scala.Function${children.size}" - -val funcTerm = ctx.freshName("udf") - // codegen for children expressions val evals = children.map(_.genCode(ctx)) @@ -1023,38 +996,31 @@ case class ScalaUDF( // We need to get the boxedType of dataType's javaType here. Because for the dataType // such as IntegerType, its javaType is `int` and the returned type of user-defined // function is Object. Trying to convert an Object to `int` will cause casting exception. -val evalCode = evals.map(_.code).mkString -val
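A simplified, hypothetical sketch of the pattern this change adopts, written against the 2.3-era codegen API: an expression that needs a driver-side object in generated code registers it through ctx.addReferenceObj and uses the returned accessor, instead of indexing ctx.references by hand. The class DemoInSet and its fields are illustrative, not code from the patch.

{{{
import org.apache.spark.sql.catalyst.expressions.{Expression, Predicate, UnaryExpression}
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}

// Hypothetical predicate holding a driver-side java.util.HashSet named `hset`.
case class DemoInSet(child: Expression, hset: java.util.HashSet[Any])
  extends UnaryExpression with Predicate {

  override protected def nullSafeEval(value: Any): Any = hset.contains(value)

  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    // addReferenceObj stores `hset` in the `references` array and returns the
    // accessor expression, so this method never touches ctx.references directly.
    val setTerm = ctx.addReferenceObj("set", hset, classOf[java.util.HashSet[Any]].getName)
    val childGen = child.genCode(ctx)
    ev.copy(code = s"""
      ${childGen.code}
      boolean ${ev.isNull} = ${childGen.isNull};
      boolean ${ev.value} = false;
      if (!${ev.isNull}) {
        ${ev.value} = $setTerm.contains(${childGen.value});
      }""")
  }
}
}}}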
svn commit: r23711 - in /dev/spark/2.3.0-SNAPSHOT-2017_12_13_08_02-58f7c82-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Dec 13 16:22:46 2017 New Revision: 23711 Log: Apache Spark 2.3.0-SNAPSHOT-2017_12_13_08_02-58f7c82 docs [This commit notification would consist of 1407 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - Link Classification Example
Repository: spark Updated Branches: refs/heads/master 7453ab024 -> 58f7c825a [SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - Link Classification Example ## What changes were proposed in this pull request? In https://github.com/apache/spark/pull/18067, only the regression example was linked; this PR links the decision tree classification example in the doc. ping felixcheung ## How was this patch tested? Local build of docs. ![default](https://user-images.githubusercontent.com/7322292/33922857-9b00fdd0-e008-11e7-92c2-85a3de52ea8f.png) Author: Zheng RuiFeng Closes #19963 from zhengruifeng/r_examples. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/58f7c825 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/58f7c825 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/58f7c825 Branch: refs/heads/master Commit: 58f7c825ae9dd2e3f548a9b7b4a9704f970dde5b Parents: 7453ab0 Author: Zheng RuiFeng Authored: Wed Dec 13 07:52:21 2017 -0600 Committer: Sean Owen Committed: Wed Dec 13 07:52:21 2017 -0600 -- docs/ml-classification-regression.md | 8 1 file changed, 8 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/58f7c825/docs/ml-classification-regression.md -- diff --git a/docs/ml-classification-regression.md b/docs/ml-classification-regression.md index 083df2e..bf979f3 100644 --- a/docs/ml-classification-regression.md +++ b/docs/ml-classification-regression.md @@ -223,6 +223,14 @@ More details on parameters can be found in the [Python API documentation](api/py + + +Refer to the [R API docs](api/R/spark.decisionTree.html) for more details. + +{% include_example classification r/ml/decisionTree.R %} + + + ## Random forest classifier - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r23709 - in /dev/spark/2.3.0-SNAPSHOT-2017_12_13_04_01-7453ab0-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Dec 13 12:19:38 2017 New Revision: 23709 Log: Apache Spark 2.3.0-SNAPSHOT-2017_12_13_04_01-7453ab0 docs [This commit notification would consist of 1407 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-22745][SQL] read partition stats from Hive
Repository: spark Updated Branches: refs/heads/master 682eb4f2e -> 7453ab024 [SPARK-22745][SQL] read partition stats from Hive ## What changes were proposed in this pull request? Currently Spark can read table stats (e.g. `totalSize, numRows`) from Hive; we can also support reading partition stats from Hive using the same logic. ## How was this patch tested? Added a new test case and modified an existing test case. Author: Zhenhua Wang Author: Zhenhua Wang Closes #19932 from wzhfy/read_hive_partition_stats. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7453ab02 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7453ab02 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7453ab02 Branch: refs/heads/master Commit: 7453ab0243dc04db1b586b0e5f588f9cdc9f72dd Parents: 682eb4f Author: Zhenhua Wang Authored: Wed Dec 13 16:27:29 2017 +0800 Committer: Wenchen Fan Committed: Wed Dec 13 16:27:29 2017 +0800 -- .../spark/sql/hive/HiveExternalCatalog.scala| 6 +- .../spark/sql/hive/client/HiveClientImpl.scala | 66 +++- .../apache/spark/sql/hive/StatisticsSuite.scala | 32 +++--- 3 files changed, 62 insertions(+), 42 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7453ab02/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala index 44e680d..632e3e0 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala @@ -668,7 +668,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat val schema = restoreTableMetadata(rawTable).schema // convert table statistics to properties so that we can persist them through hive client -var statsProperties = +val statsProperties = if (stats.isDefined) { statsToProperties(stats.get, schema) } else { @@ -1098,14 +1098,14 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat val schema = restoreTableMetadata(rawTable).schema // convert partition statistics to properties so that we can persist them through hive api -val withStatsProps = lowerCasedParts.map(p => { +val withStatsProps = lowerCasedParts.map { p => if (p.stats.isDefined) { val statsProperties = statsToProperties(p.stats.get, schema) p.copy(parameters = p.parameters ++ statsProperties) } else { p } -}) +} // Note: Before altering table partitions in Hive, you *must* set the current database // to the one that contains the table of interest. Otherwise you will end up with the http://git-wip-us.apache.org/repos/asf/spark/blob/7453ab02/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala index 08eb5c7..7233944 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala @@ -414,32 +414,6 @@ private[hive] class HiveClientImpl( } val comment = properties.get("comment") - // Here we are reading statistics from Hive. - // Note that this statistics could be overridden by Spark's statistics if that's available.
- val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_)) - val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_)) - val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_)) - // TODO: check if this estimate is valid for tables after partition pruning. - // NOTE: getting `totalSize` directly from params is kind of hacky, but this should be - // relatively cheap if parameters for the table are populated into the metastore. - // Currently, only totalSize, rawDataSize, and rowCount are used to build the field `stats` - // TODO: stats should include all the other two fields (`numFiles` and `numPartitions`). - // (see StatsSetupConst in Hive) - val stats = -// When table is external, `totalSize` is always zero, which will influence join strategy. -// So when `totalSize` is zero,
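A hedged sketch (not the code from this patch) of the stats-reading step that now applies to partitions as well as tables: the same Hive parameter keys are consulted in both cases. The helper name readHiveStats and its return shape are illustrative.

{{{
import org.apache.hadoop.hive.common.StatsSetupConst

// Given the parameter map of a Hive table or partition, pull out the statistics
// Spark cares about. The real patch additionally falls back to rawDataSize when
// totalSize is zero (external tables), as noted in the diff above.
def readHiveStats(parameters: Map[String, String]): (Option[BigInt], Option[BigInt], Option[BigInt]) = {
  val totalSize   = parameters.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
  val rawDataSize = parameters.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
  val rowCount    = parameters.get(StatsSetupConst.ROW_COUNT).map(BigInt(_))
  (totalSize, rawDataSize, rowCount)
}
}}}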
svn commit: r23703 - in /dev/spark/2.3.0-SNAPSHOT-2017_12_13_00_01-682eb4f-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Dec 13 08:15:04 2017 New Revision: 23703 Log: Apache Spark 2.3.0-SNAPSHOT-2017_12_13_00_01-682eb4f docs [This commit notification would consist of 1407 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org