spark git commit: [SPARK-22779][SQL] Resolve default values for fallback configs.

2017-12-13 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master f8c7c1f21 -> c3dd2a26d


[SPARK-22779][SQL] Resolve default values for fallback configs.

SQLConf allows some callers to define a custom default value for
configs, which complicates the handling of fallback config entries
a bit, since most of the default value resolution is hidden by the
config code.

This change peeks into the internals of these fallback configs
to figure out the correct default value, and also returns the
current human-readable default when showing the default value
(e.g. through "set -v").
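
As a rough illustration of the resolution rule (a minimal, self-contained sketch
with made-up types, not the actual ConfigEntry/FallbackConfigEntry code):

```scala
// Made-up stand-ins for config entries; a fallback entry's effective default is
// its parent's default, and its human-readable default names the parent key.
sealed trait Entry { def key: String }
case class Plain(key: String, default: Option[String]) extends Entry
case class Fallback(key: String, parent: Entry) extends Entry

// String shown for the default, e.g. in "set -v" style output.
def defaultValueString(e: Entry): String = e match {
  case Plain(_, d)    => d.getOrElse("<undefined>")
  case Fallback(_, p) => s"<value of ${p.key}>"
}

// Default actually used when the key is not set: walk up to the parent.
def resolveDefault(e: Entry): Option[String] = e match {
  case Plain(_, d)    => d
  case Fallback(_, p) => resolveDefault(p)
}
```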

Author: Marcelo Vanzin 

Closes #19974 from vanzin/SPARK-22779.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c3dd2a26
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c3dd2a26
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c3dd2a26

Branch: refs/heads/master
Commit: c3dd2a26deaadf508b4e163eab2c0544cd922540
Parents: f8c7c1f
Author: Marcelo Vanzin 
Authored: Wed Dec 13 22:46:20 2017 -0800
Committer: gatorsmile 
Committed: Wed Dec 13 22:46:20 2017 -0800

--
 .../spark/internal/config/ConfigEntry.scala |  8 --
 .../org/apache/spark/sql/internal/SQLConf.scala | 16 ---
 .../spark/sql/internal/SQLConfSuite.scala   | 30 
 3 files changed, 47 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c3dd2a26/core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala 
b/core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala
index f119028..ede3ace 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala
@@ -139,7 +139,7 @@ private[spark] class OptionalConfigEntry[T](
 s => Some(rawValueConverter(s)),
 v => v.map(rawStringConverter).orNull, doc, isPublic) {
 
-  override def defaultValueString: String = ""
+  override def defaultValueString: String = ConfigEntry.UNDEFINED
 
   override def readFrom(reader: ConfigReader): Option[T] = {
 readString(reader).map(rawValueConverter)
@@ -149,12 +149,12 @@ private[spark] class OptionalConfigEntry[T](
 /**
  * A config entry whose default value is defined by another config entry.
  */
-private class FallbackConfigEntry[T] (
+private[spark] class FallbackConfigEntry[T] (
 key: String,
 alternatives: List[String],
 doc: String,
 isPublic: Boolean,
-private[config] val fallback: ConfigEntry[T])
+val fallback: ConfigEntry[T])
   extends ConfigEntry[T](key, alternatives,
 fallback.valueConverter, fallback.stringConverter, doc, isPublic) {
 
@@ -167,6 +167,8 @@ private class FallbackConfigEntry[T] (
 
 private[spark] object ConfigEntry {
 
+  val UNDEFINED = ""
+
   private val knownConfigs = new 
java.util.concurrent.ConcurrentHashMap[String, ConfigEntry[_]]()
 
   def registerEntry(entry: ConfigEntry[_]): Unit = {

http://git-wip-us.apache.org/repos/asf/spark/blob/c3dd2a26/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 1121444..cf7e3eb 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -1379,7 +1379,7 @@ class SQLConf extends Serializable with Logging {
 Option(settings.get(key)).
   orElse {
 // Try to use the default value
-Option(sqlConfEntries.get(key)).map(_.defaultValueString)
+Option(sqlConfEntries.get(key)).map { e => 
e.stringConverter(e.readFrom(reader)) }
   }.
   getOrElse(throw new NoSuchElementException(key))
   }
@@ -1417,14 +1417,21 @@ class SQLConf extends Serializable with Logging {
* not set yet, return `defaultValue`.
*/
   def getConfString(key: String, defaultValue: String): String = {
-if (defaultValue != null && defaultValue != "") {
+if (defaultValue != null && defaultValue != ConfigEntry.UNDEFINED) {
   val entry = sqlConfEntries.get(key)
   if (entry != null) {
 // Only verify configs in the SQLConf object
 entry.valueConverter(defaultValue)
   }
 }
-Option(settings.get(key)).getOrElse(defaultValue)
+Option(settings.get(key)).getOrElse {
+  // If the key is not set, need to check whether the config entry is 
registered and is
+  // a 

[2/2] spark git commit: [SPARK-22732] Add Structured Streaming APIs to DataSourceV2

2017-12-13 Thread zsxwing
[SPARK-22732] Add Structured Streaming APIs to DataSourceV2

## What changes were proposed in this pull request?

This PR provides DataSourceV2 API support for structured streaming, including 
new pieces needed to support continuous processing [SPARK-20928]. High-level 
summary:

- DataSourceV2 includes new mixins to support micro-batch and continuous reads 
and writes. For reads, we accept an optional user-specified schema rather than 
using the ReadSupportWithSchema model, because doing so would severely 
complicate the interface.

- DataSourceV2Reader includes new interfaces to read a specific microbatch or 
read continuously from a given offset. These follow the same setter pattern as 
the existing Supports* mixins so that they can work with SupportsScanUnsafeRow.

- DataReader (the per-partition reader) has a new subinterface 
ContinuousDataReader only for continuous processing. This reader has a special 
method to check progress, and next() blocks for new input rather than returning 
false.

- Offset, an abstract representation of position in a streaming query, is 
ported to the public API. (Each type of reader will define its own Offset 
implementation.)

- DataSourceV2Writer has a new subinterface ContinuousWriter only for 
continuous processing. Commits to this interface come tagged with an epoch 
number, as the execution engine will continue to produce new epoch commits as 
the task continues indefinitely.

Note that this PR does not propose to change the existing DataSourceV2 batch 
API, or deprecate the existing streaming source/sink internal APIs in 
spark.sql.execution.streaming.
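
As a hedged sketch of the reader-side shape described above (made-up names, not
the new Java interfaces themselves): each source defines its own Offset, and a
continuous per-partition reader blocks in next() rather than returning false.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Per-source offset representation (each reader defines its own).
case class MyOffset(recordsSeen: Long)

// Toy continuous per-partition reader: next() blocks until new input arrives,
// and getOffset reports progress so the engine can track position.
class BlockingPartitionReader(input: LinkedBlockingQueue[String]) {
  private var current: String = _
  private var seen = 0L

  def next(): Boolean = {
    current = input.take()   // blocks for new input instead of returning false
    seen += 1
    true
  }
  def get(): String = current
  def getOffset: MyOffset = MyOffset(seen)
}
```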

## How was this patch tested?

Toy implementations of the new interfaces with unit tests.

Author: Jose Torres 

Closes #19925 from joseph-torres/continuous-api.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f8c7c1f2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f8c7c1f2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f8c7c1f2

Branch: refs/heads/master
Commit: f8c7c1f21aa9d1fd38b584ca8c4adf397966e9f7
Parents: 1e44dd0
Author: Jose Torres 
Authored: Wed Dec 13 22:31:39 2017 -0800
Committer: Shixiong Zhu 
Committed: Wed Dec 13 22:31:39 2017 -0800

--
 .../apache/spark/sql/kafka010/KafkaSource.scala |   1 +
 .../spark/sql/kafka010/KafkaSourceOffset.scala  |   3 +-
 .../spark/sql/kafka010/KafkaSourceSuite.scala   |   1 +
 .../sql/sources/v2/ContinuousReadSupport.java   |  42 +
 .../sql/sources/v2/ContinuousWriteSupport.java  |  54 ++
 .../sql/sources/v2/DataSourceV2Options.java |   8 +
 .../sql/sources/v2/MicroBatchReadSupport.java   |  52 ++
 .../sql/sources/v2/MicroBatchWriteSupport.java  |  58 ++
 .../sources/v2/reader/ContinuousDataReader.java |  36 
 .../sql/sources/v2/reader/ContinuousReader.java |  68 +++
 .../sql/sources/v2/reader/MicroBatchReader.java |  64 +++
 .../spark/sql/sources/v2/reader/Offset.java |  60 +++
 .../sql/sources/v2/reader/PartitionOffset.java  |  30 
 .../sql/sources/v2/writer/ContinuousWriter.java |  41 +
 .../execution/streaming/BaseStreamingSink.java  |  27 +++
 .../streaming/BaseStreamingSource.java  |  37 
 .../execution/streaming/FileStreamSource.scala  |   1 +
 .../streaming/FileStreamSourceOffset.scala  |   3 +
 .../sql/execution/streaming/LongOffset.scala|   2 +
 .../spark/sql/execution/streaming/Offset.scala  |  34 +---
 .../sql/execution/streaming/OffsetSeq.scala |   1 +
 .../sql/execution/streaming/OffsetSeqLog.scala  |   1 +
 .../streaming/RateSourceProvider.scala  |  22 ++-
 .../execution/streaming/RateStreamOffset.scala  |  29 +++
 .../streaming/RateStreamSourceV2.scala  | 162 +
 .../spark/sql/execution/streaming/Source.scala  |   1 +
 .../execution/streaming/StreamExecution.scala   |   1 +
 .../execution/streaming/StreamProgress.scala|   2 +
 .../continuous/ContinuousRateStreamSource.scala | 152 
 .../spark/sql/execution/streaming/memory.scala  |   1 +
 .../sql/execution/streaming/memoryV2.scala  | 178 +++
 .../spark/sql/execution/streaming/socket.scala  |   1 +
 .../execution/streaming/MemorySinkV2Suite.scala |  82 +
 .../execution/streaming/OffsetSeqLogSuite.scala |   1 +
 .../execution/streaming/RateSourceSuite.scala   |   1 +
 .../execution/streaming/RateSourceV2Suite.scala | 155 
 .../sql/streaming/FileStreamSourceSuite.scala   |   1 +
 .../spark/sql/streaming/OffsetSuite.scala   |   3 +-
 .../spark/sql/streaming/StreamSuite.scala   |   1 +
 .../apache/spark/sql/streaming/StreamTest.scala |   1 +
 .../streaming/StreamingAggregationSuite.scala   |   1 +
 .../streaming/StreamingQueryListenerSuite.scala |   1 +
 .../sql/streaming/StreamingQuerySuite.scala |   1 +
 

[1/2] spark git commit: [SPARK-22732] Add Structured Streaming APIs to DataSourceV2

2017-12-13 Thread zsxwing
Repository: spark
Updated Branches:
  refs/heads/master 1e44dd004 -> f8c7c1f21


http://git-wip-us.apache.org/repos/asf/spark/blob/f8c7c1f2/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
index b6baaed..7a2d9e3 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
@@ -32,6 +32,7 @@ import org.apache.spark.sql.catalyst.util._
 import org.apache.spark.sql.execution.streaming._
 import org.apache.spark.sql.execution.streaming.FileStreamSource.{FileEntry, 
SeenFilesMap}
 import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources.v2.reader.Offset
 import org.apache.spark.sql.streaming.ExistsThrowsExceptionFileSystem._
 import org.apache.spark.sql.streaming.util.StreamManualClock
 import org.apache.spark.sql.test.SharedSQLContext

http://git-wip-us.apache.org/repos/asf/spark/blob/f8c7c1f2/sql/core/src/test/scala/org/apache/spark/sql/streaming/OffsetSuite.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/streaming/OffsetSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/streaming/OffsetSuite.scala
index f208f9b..4297482 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/OffsetSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/OffsetSuite.scala
@@ -18,7 +18,8 @@
 package org.apache.spark.sql.streaming
 
 import org.apache.spark.SparkFunSuite
-import org.apache.spark.sql.execution.streaming.{LongOffset, Offset, 
SerializedOffset}
+import org.apache.spark.sql.execution.streaming.{LongOffset, SerializedOffset}
+import org.apache.spark.sql.sources.v2.reader.Offset
 
 trait OffsetSuite extends SparkFunSuite {
   /** Creates test to check all the comparisons of offsets given a `one` that 
is less than `two`. */

http://git-wip-us.apache.org/repos/asf/spark/blob/f8c7c1f2/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
index 3d687d2..8163a1f 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
@@ -39,6 +39,7 @@ import 
org.apache.spark.sql.execution.streaming.state.{StateStore, StateStoreCon
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.sources.StreamSourceProvider
+import org.apache.spark.sql.sources.v2.reader.Offset
 import org.apache.spark.sql.streaming.util.StreamManualClock
 import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
 import org.apache.spark.util.Utils

http://git-wip-us.apache.org/repos/asf/spark/blob/f8c7c1f2/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala
index dc5b998..7a1ff89 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala
@@ -39,6 +39,7 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.catalyst.util._
 import org.apache.spark.sql.execution.streaming._
 import org.apache.spark.sql.execution.streaming.state.StateStore
+import org.apache.spark.sql.sources.v2.reader.Offset
 import org.apache.spark.sql.streaming.StreamingQueryListener._
 import org.apache.spark.sql.test.SharedSQLContext
 import org.apache.spark.util.{Clock, SystemClock, Utils}

http://git-wip-us.apache.org/repos/asf/spark/blob/f8c7c1f2/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
index 1b4d855..fa03135 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
@@ -34,6 +34,7 @@ import org.apache.spark.sql.execution.streaming._
 import 

spark git commit: [SPARK-3181][ML] Implement huber loss for LinearRegression.

2017-12-13 Thread yliang
Repository: spark
Updated Branches:
  refs/heads/master 2a29a60da -> 1e44dd004


[SPARK-3181][ML] Implement huber loss for LinearRegression.

## What changes were proposed in this pull request?
MLlib ```LinearRegression``` supports _huber_ loss in addition to the _leastSquares_ 
loss. The huber loss objective function is:
![image](https://user-images.githubusercontent.com/1962026/29554124-9544d198-8750-11e7-8afa-33579ec419d5.png)
Refer to Eq. (6) and Eq. (8) in [A robust hybrid of lasso and ridge 
regression](http://statweb.stanford.edu/~owen/reports/hhu.pdf). This objective 
is jointly convex as a function of (w, σ) ∈ R × (0,∞), so we can use 
L-BFGS-B to solve it.
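
For reference, the objective in the linked image, transcribed from the
HuberAggregator scaladoc added in this patch:

```
\min_{w, \sigma} \frac{1}{2n} \sum_{i=1}^n \left( \sigma +
  H_m\left(\frac{X_i w - y_i}{\sigma}\right) \sigma \right) + \frac{1}{2} \lambda \|w\|_2^2
```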

The current implementation is a straightforward port of the Python scikit-learn 
[```HuberRegressor```](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html).
 There are some differences:
* We use the mean loss (```lossSum/weightSum```), but sklearn uses the total loss 
(```lossSum```).
* We multiply the loss function and the L2 regularization by 1/2. Multiplying the 
whole formula by a constant factor does not affect the result; we just keep it 
consistent with the _leastSquares_ loss.

So if fitting w/o regularization, MLlib and sklearn produce the same output. If 
fitting w/ regularization, MLlib should set ```regParam``` divided by the number 
of instances to match the output of sklearn.
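
A hedged sketch of the pointwise Huber function H_m and of the regParam
conversion noted above (the 1.35 threshold is an assumed default matching
sklearn, not taken from this diff):

```scala
// Pointwise Huber function: quadratic near zero, linear in the tails.
def huber(z: Double, m: Double = 1.35): Double =
  if (math.abs(z) <= m) z * z
  else 2.0 * m * math.abs(z) - m * m

// Per the note above: to match sklearn's HuberRegressor(alpha = 0.1) on a
// dataset of 100 instances, MLlib's regParam would be 0.1 / 100 = 0.001.
val regParam = 0.1 / 100
```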

## How was this patch tested?
Unit tests.

Author: Yanbo Liang 

Closes #19020 from yanboliang/spark-3181.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1e44dd00
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1e44dd00
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1e44dd00

Branch: refs/heads/master
Commit: 1e44dd004425040912f2cf16362d2c13f12e1689
Parents: 2a29a60
Author: Yanbo Liang 
Authored: Wed Dec 13 21:19:14 2017 -0800
Committer: Yanbo Liang 
Committed: Wed Dec 13 21:19:14 2017 -0800

--
 .../ml/optim/aggregator/HuberAggregator.scala   | 150 ++
 .../ml/param/shared/SharedParamsCodeGen.scala   |   3 +-
 .../spark/ml/param/shared/sharedParams.scala|  17 ++
 .../spark/ml/regression/LinearRegression.scala  | 299 +++
 .../optim/aggregator/HuberAggregatorSuite.scala | 170 +++
 .../ml/regression/LinearRegressionSuite.scala   | 244 ++-
 project/MimaExcludes.scala  |   5 +
 7 files changed, 823 insertions(+), 65 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1e44dd00/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala
 
b/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala
new file mode 100644
index 000..13f64d2
--- /dev/null
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala
@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.ml.optim.aggregator
+
+import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.ml.feature.Instance
+import org.apache.spark.ml.linalg.Vector
+
+/**
+ * HuberAggregator computes the gradient and loss for a huber loss function,
+ * as used in robust regression for samples in sparse or dense vector in an 
online fashion.
+ *
+ * The huber loss function based on:
+ * <a href="http://statweb.stanford.edu/~owen/reports/hhu.pdf">Art B. Owen (2006),
+ * A robust hybrid of lasso and ridge regression</a>.
+ *
+ * Two HuberAggregator can be merged together to have a summary of loss and 
gradient of
+ * the corresponding joint dataset.
+ *
+ * The huber loss function is given by
+ *
+ * <blockquote>
+ *   $$
+ *   \begin{align}
+ *   \min_{w, \sigma}\frac{1}{2n}{\sum_{i=1}^n\left(\sigma +
+ *   H_m\left(\frac{X_{i}w - y_{i}}{\sigma}\right)\sigma\right) + \frac{1}{2}\lambda {||w||_2}^2}
+ *   \end{align}
+ *   $$
+ * </blockquote>
+ *
+ * 

svn commit: r23723 - in /dev/spark/2.3.0-SNAPSHOT-2017_12_13_20_01-2a29a60-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2017-12-13 Thread pwendell
Author: pwendell
Date: Thu Dec 14 04:14:32 2017
New Revision: 23723

Log:
Apache Spark 2.3.0-SNAPSHOT-2017_12_13_20_01-2a29a60 docs


[This commit notification would consist of 1407 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[1/2] spark git commit: Revert "[SPARK-22600][SQL][FOLLOW-UP] Fix a compilation error in TPCDS q75/q77"

2017-12-13 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master ef9299965 -> 2a29a60da


Revert "[SPARK-22600][SQL][FOLLOW-UP] Fix a compilation error in TPCDS q75/q77"

This reverts commit ef92999653f0e2a47752379a867647445d849aab.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bc7e4a90
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bc7e4a90
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bc7e4a90

Branch: refs/heads/master
Commit: bc7e4a90c0c91970a94aa385971daac48db6264e
Parents: ef92999
Author: Wenchen Fan 
Authored: Thu Dec 14 11:21:34 2017 +0800
Committer: Wenchen Fan 
Committed: Thu Dec 14 11:21:34 2017 +0800

--
 .../spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala | 1 +
 .../apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala   | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/bc7e4a90/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala
index 807cb94..a2dda48 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.expressions.codegen
 import scala.collection.mutable
 
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types.DataType
 
 /**
  * Defines util methods used in expression code generation.

http://git-wip-us.apache.org/repos/asf/spark/blob/bc7e4a90/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
index 634014a..c96ed6e 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
@@ -192,7 +192,7 @@ case class BroadcastHashJoinExec(
   |  $value = ${ev.value};
   |}
  """.stripMargin
-ExprCode(code, isNull, value, inputRow = matched)
+ExprCode(code, isNull, value)
   }
 }
   }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[2/2] spark git commit: Revert "[SPARK-22600][SQL] Fix 64kb limit for deeply nested expressions under wholestage codegen"

2017-12-13 Thread wenchen
Revert "[SPARK-22600][SQL] Fix 64kb limit for deeply nested expressions under 
wholestage codegen"

This reverts commit c7d0148615c921dca782ee3785b5d0cd59e42262.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2a29a60d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2a29a60d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2a29a60d

Branch: refs/heads/master
Commit: 2a29a60da32a5ccd04cc7c5c22c4075b159389e3
Parents: bc7e4a9
Author: Wenchen Fan 
Authored: Thu Dec 14 11:22:23 2017 +0800
Committer: Wenchen Fan 
Committed: Thu Dec 14 11:22:23 2017 +0800

--
 .../sql/catalyst/expressions/Expression.scala   |  37 +--
 .../expressions/codegen/CodeGenerator.scala |  44 +--
 .../expressions/codegen/ExpressionCodegen.scala | 269 ---
 .../codegen/ExpressionCodegenSuite.scala| 220 ---
 .../spark/sql/execution/ColumnarBatchScan.scala |   5 +-
 .../sql/execution/WholeStageCodegenSuite.scala  |  23 +-
 6 files changed, 13 insertions(+), 585 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2a29a60d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
index 329ea5d..743782a 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
@@ -105,12 +105,6 @@ abstract class Expression extends TreeNode[Expression] {
   val isNull = ctx.freshName("isNull")
   val value = ctx.freshName("value")
   val eval = doGenCode(ctx, ExprCode("", isNull, value))
-  eval.isNull = if (this.nullable) eval.isNull else "false"
-
-  // Records current input row and variables of this expression.
-  eval.inputRow = ctx.INPUT_ROW
-  eval.inputVars = findInputVars(ctx, eval)
-
   reduceCodeSize(ctx, eval)
   if (eval.code.nonEmpty) {
 // Add `this` in the comment.
@@ -121,29 +115,9 @@ abstract class Expression extends TreeNode[Expression] {
 }
   }
 
-  /**
-   * Returns the input variables to this expression.
-   */
-  private def findInputVars(ctx: CodegenContext, eval: ExprCode): 
Seq[ExprInputVar] = {
-if (ctx.currentVars != null) {
-  this.collect {
-case b @ BoundReference(ordinal, _, _) if ctx.currentVars(ordinal) != 
null =>
-  ExprInputVar(exprCode = ctx.currentVars(ordinal),
-dataType = b.dataType, nullable = b.nullable)
-  }
-} else {
-  Seq.empty
-}
-  }
-
-  /**
-   * In order to prevent 64kb compile error, reducing the size of generated 
codes by
-   * separating it into a function if the size exceeds a threshold.
-   */
   private def reduceCodeSize(ctx: CodegenContext, eval: ExprCode): Unit = {
-lazy val funcParams = ExpressionCodegen.getExpressionInputParams(ctx, this)
-
-if (eval.code.trim.length > 1024 && funcParams.isDefined) {
+// TODO: support whole stage codegen too
+if (eval.code.trim.length > 1024 && ctx.INPUT_ROW != null && 
ctx.currentVars == null) {
   val setIsNull = if (eval.isNull != "false" && eval.isNull != "true") {
 val globalIsNull = ctx.freshName("globalIsNull")
 ctx.addMutableState(ctx.JAVA_BOOLEAN, globalIsNull)
@@ -158,12 +132,9 @@ abstract class Expression extends TreeNode[Expression] {
   val newValue = ctx.freshName("value")
 
   val funcName = ctx.freshName(nodeName)
-  val callParams = funcParams.map(_._1.mkString(", ")).get
-  val declParams = funcParams.map(_._2.mkString(", ")).get
-
   val funcFullName = ctx.addNewFunction(funcName,
 s"""
-   |private $javaType $funcName($declParams) {
+   |private $javaType $funcName(InternalRow ${ctx.INPUT_ROW}) {
|  ${eval.code.trim}
|  $setIsNull
|  return ${eval.value};
@@ -171,7 +142,7 @@ abstract class Expression extends TreeNode[Expression] {
""".stripMargin)
 
   eval.value = newValue
-  eval.code = s"$javaType $newValue = $funcFullName($callParams);"
+  eval.code = s"$javaType $newValue = $funcFullName(${ctx.INPUT_ROW});"
 }
   }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/2a29a60d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 

svn commit: r23721 - in /dev/spark/2.3.0-SNAPSHOT-2017_12_13_16_01-ef92999-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2017-12-13 Thread pwendell
Author: pwendell
Date: Thu Dec 14 00:14:38 2017
New Revision: 23721

Log:
Apache Spark 2.3.0-SNAPSHOT-2017_12_13_16_01-ef92999 docs


[This commit notification would consist of 1407 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-22600][SQL][FOLLOW-UP] Fix a compilation error in TPCDS q75/q77

2017-12-13 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master a83e8e6c2 -> ef9299965


[SPARK-22600][SQL][FOLLOW-UP] Fix a compilation error in TPCDS q75/q77

## What changes were proposed in this pull request?
This PR fixes a compilation error in TPCDS `q75`/`q77` caused by #19813:
```
  java.util.concurrent.ExecutionException: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
371, Column 16: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
371, Column 16: Expression "bhj_matched" is not an rvalue
  at 
com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
  at 
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
  at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
  at 
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
```

## How was this patch tested?
Manually checked that `q75`/`q77` can be properly compiled.

Author: Takeshi Yamamuro 

Closes #19969 from maropu/SPARK-22600-FOLLOWUP.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ef929996
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ef929996
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ef929996

Branch: refs/heads/master
Commit: ef92999653f0e2a47752379a867647445d849aab
Parents: a83e8e6
Author: Takeshi Yamamuro 
Authored: Wed Dec 13 15:55:16 2017 -0800
Committer: gatorsmile 
Committed: Wed Dec 13 15:55:16 2017 -0800

--
 .../spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala | 1 -
 .../apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala   | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ef929996/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala
index a2dda48..807cb94 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala
@@ -20,7 +20,6 @@ package org.apache.spark.sql.catalyst.expressions.codegen
 import scala.collection.mutable
 
 import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.types.DataType
 
 /**
  * Defines util methods used in expression code generation.

http://git-wip-us.apache.org/repos/asf/spark/blob/ef929996/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
index c96ed6e..634014a 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
@@ -192,7 +192,7 @@ case class BroadcastHashJoinExec(
   |  $value = ${ev.value};
   |}
  """.stripMargin
-ExprCode(code, isNull, value)
+ExprCode(code, isNull, value, inputRow = matched)
   }
 }
   }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-22764][CORE] Fix flakiness in SparkContextSuite.

2017-12-13 Thread irashid
Repository: spark
Updated Branches:
  refs/heads/master ba0e79f57 -> a83e8e6c2


[SPARK-22764][CORE] Fix flakiness in SparkContextSuite.

Use a semaphore to synchronize the tasks with the listener code
that is trying to cancel the job or stage, so that the listener
won't try to cancel a job or stage that has already finished.
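
A toy, self-contained illustration of the handshake (assumed names, not the real
suite code): the tasks block on a semaphore until the listener has issued the
cancellation, so the cancellation can never race with task completion.

```scala
import java.util.concurrent.Semaphore

object SemaphoreHandshake {
  def main(args: Array[String]): Unit = {
    val slices = 4
    val sem = new Semaphore(0)
    val tasks = (1 to slices).map { i =>
      new Thread(new Runnable {
        def run(): Unit = {
          sem.acquire()   // block until the "listener" has requested cancellation
          println(s"task $i proceeds only after the cancellation was issued")
        }
      })
    }
    tasks.foreach(_.start())
    println("listener: cancelling the stage/job, then releasing the tasks")
    sem.release(slices)   // let every partition continue
    tasks.foreach(_.join())
  }
}
```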

Author: Marcelo Vanzin 

Closes #19956 from vanzin/SPARK-22764.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a83e8e6c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a83e8e6c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a83e8e6c

Branch: refs/heads/master
Commit: a83e8e6c223df8b819335cbabbfff9956942f2ad
Parents: ba0e79f
Author: Marcelo Vanzin 
Authored: Wed Dec 13 16:06:16 2017 -0600
Committer: Imran Rashid 
Committed: Wed Dec 13 16:06:16 2017 -0600

--
 .../scala/org/apache/spark/SparkContextSuite.scala   | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a83e8e6c/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala 
b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
index 37fcc93..b30bd74 100644
--- a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
@@ -20,7 +20,7 @@ package org.apache.spark
 import java.io.File
 import java.net.{MalformedURLException, URI}
 import java.nio.charset.StandardCharsets
-import java.util.concurrent.TimeUnit
+import java.util.concurrent.{Semaphore, TimeUnit}
 
 import scala.concurrent.duration._
 
@@ -499,6 +499,7 @@ class SparkContextSuite extends SparkFunSuite with 
LocalSparkContext with Eventu
   test("Cancelling stages/jobs with custom reasons.") {
 sc = new SparkContext(new 
SparkConf().setAppName("test").setMaster("local"))
 val REASON = "You shall not pass"
+val slices = 10
 
 val listener = new SparkListener {
   override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
@@ -508,6 +509,7 @@ class SparkContextSuite extends SparkFunSuite with 
LocalSparkContext with Eventu
   }
   sc.cancelStage(taskStart.stageId, REASON)
   SparkContextSuite.cancelStage = false
+  SparkContextSuite.semaphore.release(slices)
 }
   }
 
@@ -518,21 +520,25 @@ class SparkContextSuite extends SparkFunSuite with 
LocalSparkContext with Eventu
   }
   sc.cancelJob(jobStart.jobId, REASON)
   SparkContextSuite.cancelJob = false
+  SparkContextSuite.semaphore.release(slices)
 }
   }
 }
 sc.addSparkListener(listener)
 
 for (cancelWhat <- Seq("stage", "job")) {
+  SparkContextSuite.semaphore.drainPermits()
   SparkContextSuite.isTaskStarted = false
   SparkContextSuite.cancelStage = (cancelWhat == "stage")
   SparkContextSuite.cancelJob = (cancelWhat == "job")
 
   val ex = intercept[SparkException] {
-sc.range(0, 1L).mapPartitions { x =>
-  org.apache.spark.SparkContextSuite.isTaskStarted = true
+sc.range(0, 1L, numSlices = slices).mapPartitions { x =>
+  SparkContextSuite.isTaskStarted = true
+  // Block waiting for the listener to cancel the stage or job.
+  SparkContextSuite.semaphore.acquire()
   x
-}.cartesian(sc.range(0, 10L)).count()
+}.count()
   }
 
   ex.getCause() match {
@@ -636,4 +642,5 @@ object SparkContextSuite {
   @volatile var isTaskStarted = false
   @volatile var taskKilled = false
   @volatile var taskSucceeded = false
+  val semaphore = new Semaphore(0)
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-22772][SQL] Use splitExpressionsWithCurrentInputs to split codes in elt

2017-12-13 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master 0bdb4e516 -> ba0e79f57


[SPARK-22772][SQL] Use splitExpressionsWithCurrentInputs to split codes in elt

## What changes were proposed in this pull request?

In SPARK-22550, which fixes the 64KB JVM bytecode limit problem with elt, 
`buildCodeBlocks` is used to split the generated code. However, we should use 
`splitExpressionsWithCurrentInputs` because it handles both normal and 
whole-stage codegen (whole-stage codegen is not supported yet, so it simply 
doesn't split the code).
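
For context, a hedged sketch of the underlying "pack code snippets into
size-limited blocks" idea that such helpers implement (assumed threshold and
names, not the actual CodegenContext code):

```scala
// Greedily pack code snippets into blocks whose length stays under a limit,
// so no single generated method grows past the JVM's 64KB bytecode cap.
def splitIntoBlocks(snippets: Seq[String], maxLen: Int = 1024): Seq[String] = {
  val blocks = Seq.newBuilder[String]
  val current = new StringBuilder
  for (s <- snippets) {
    if (current.nonEmpty && current.length + s.length > maxLen) {
      blocks += current.toString
      current.clear()
    }
    current.append(s)
  }
  if (current.nonEmpty) blocks += current.toString
  blocks.result()
}
```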

## How was this patch tested?

Existing tests.

Author: Liang-Chi Hsieh 

Closes #19964 from viirya/SPARK-22772.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ba0e79f5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ba0e79f5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ba0e79f5

Branch: refs/heads/master
Commit: ba0e79f57caa279773fb014b7883ee5d69dd0a68
Parents: 0bdb4e5
Author: Liang-Chi Hsieh 
Authored: Wed Dec 13 13:54:16 2017 -0800
Committer: gatorsmile 
Committed: Wed Dec 13 13:54:16 2017 -0800

--
 .../expressions/codegen/CodeGenerator.scala |  2 +-
 .../expressions/stringExpressions.scala | 81 ++--
 2 files changed, 43 insertions(+), 40 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ba0e79f5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
index 257c3f1..b1d9311 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
@@ -878,7 +878,7 @@ class CodegenContext {
*
* @param expressions the codes to evaluate expressions.
*/
-  def buildCodeBlocks(expressions: Seq[String]): Seq[String] = {
+  private def buildCodeBlocks(expressions: Seq[String]): Seq[String] = {
 val blocks = new ArrayBuffer[String]()
 val blockBuilder = new StringBuilder()
 var length = 0

http://git-wip-us.apache.org/repos/asf/spark/blob/ba0e79f5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
index 47f0b57..8c4d2fd 100755
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
@@ -289,53 +289,56 @@ case class Elt(children: Seq[Expression])
 val index = indexExpr.genCode(ctx)
 val strings = stringExprs.map(_.genCode(ctx))
 val indexVal = ctx.freshName("index")
+val indexMatched = ctx.freshName("eltIndexMatched")
+
 val stringVal = ctx.freshName("stringVal")
+ctx.addMutableState(ctx.javaType(dataType), stringVal)
+
 val assignStringValue = strings.zipWithIndex.map { case (eval, index) =>
   s"""
-case ${index + 1}:
-  ${eval.code}
-  $stringVal = ${eval.isNull} ? null : ${eval.value};
-  break;
-  """
+ |if ($indexVal == ${index + 1}) {
+ |  ${eval.code}
+ |  $stringVal = ${eval.isNull} ? null : ${eval.value};
+ |  $indexMatched = true;
+ |  continue;
+ |}
+  """.stripMargin
 }
 
-val cases = ctx.buildCodeBlocks(assignStringValue)
-val codes = if (cases.length == 1) {
-  s"""
-UTF8String $stringVal = null;
-switch ($indexVal) {
-  ${cases.head}
-}
-   """
-} else {
-  var prevFunc = "null"
-  for (c <- cases.reverse) {
-val funcName = ctx.freshName("eltFunc")
-val funcBody = s"""
- private UTF8String $funcName(InternalRow ${ctx.INPUT_ROW}, int 
$indexVal) {
-   UTF8String $stringVal = null;
-   switch ($indexVal) {
- $c
- default:
-   return $prevFunc;
-   }
-   return $stringVal;
- }
-"""
-val fullFuncName = ctx.addNewFunction(funcName, funcBody)
-prevFunc = s"$fullFuncName(${ctx.INPUT_ROW}, $indexVal)"
-  }
-  s"UTF8String $stringVal = $prevFunc;"
-}
+val codes = ctx.splitExpressionsWithCurrentInputs(
+  expressions 

spark git commit: [SPARK-22574][MESOS][SUBMIT] Check submission request parameters

2017-12-13 Thread vanzin
Repository: spark
Updated Branches:
  refs/heads/master 1abcbed67 -> 0bdb4e516


[SPARK-22574][MESOS][SUBMIT] Check submission request parameters

## What changes were proposed in this pull request?

PR closed with all the comments -> https://github.com/apache/spark/pull/19793

It solves the problem where submitting a malformed CreateSubmissionRequest to the 
Spark Dispatcher left the Dispatcher in a bad state and made it inactive as a 
Mesos framework.

https://issues.apache.org/jira/browse/SPARK-22574

## How was this patch tested?

All Spark tests passed successfully.

It was tested by sending a malformed request (without appArgs) before and after 
the change. The fix is simple: check whether the value is null before it is accessed.

This was before the change, leaving the dispatcher inactive:

```
Exception in thread "Thread-22" java.lang.NullPointerException
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.getDriverCommandValue(MesosClusterScheduler.scala:444)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.buildDriverCommand(MesosClusterScheduler.scala:451)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.org$apache$spark$scheduler$cluster$mesos$MesosClusterScheduler$$createTaskInfo(MesosClusterScheduler.scala:538)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:570)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:555)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:555)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:621)
```

And after:

```
  "message" : "Malformed request: 
org.apache.spark.deploy.rest.SubmitRestProtocolException: Validation of message 
CreateSubmissionRequest 
failed!\n\torg.apache.spark.deploy.rest.SubmitRestProtocolMessage.validate(SubmitRestProtocolMessage.scala:70)\n\torg.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:272)\n\tjavax.servlet.http.HttpServlet.service(HttpServlet.java:707)\n\tjavax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\torg.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)\n\torg.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)\n\torg.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\torg.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\torg.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\torg.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\torg.spark
 
_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\torg.spark_project.jetty.server.Server.handle(Server.java:524)\n\torg.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:319)\n\torg.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)\n\torg.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\torg.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\torg.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\torg.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\torg.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\torg.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\torg.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java
 
:671)\n\torg.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tjava.lang.Thread.run(Thread.java:745)"
```
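
A hedged sketch of the validation idea (assumed field and helper names, not the
actual SubmitRestProtocolRequest code): check the required fields up front and
reject the request instead of hitting a NullPointerException later.

```scala
// Stand-in for the REST request; field names mirror the ones mentioned above.
case class CreateSubmissionRequest(
    appResource: String,
    appArgs: Array[String],
    sparkProperties: Map[String, String],
    environmentVariables: Map[String, String])

def validate(req: CreateSubmissionRequest): Either[String, CreateSubmissionRequest] = {
  def missing(v: AnyRef, name: String): Option[String] =
    if (v == null) Some(s"Malformed request: field '$name' is not set") else None

  missing(req.appArgs, "appArgs")
    .orElse(missing(req.sparkProperties, "sparkProperties"))
    .orElse(missing(req.environmentVariables, "environmentVariables"))
    .toLeft(req)   // Left(error message) or Right(valid request)
}
```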

Author: German Schiavon 

Closes #19966 from Gschiavon/fix-submission-request.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0bdb4e51
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0bdb4e51
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0bdb4e51

Branch: refs/heads/master
Commit: 0bdb4e516c425ea7bf941106ac6449b5a0a289e3
Parents: 1abcbed
Author: German Schiavon 
Authored: Wed Dec 13 13:37:25 2017 -0800
Committer: Marcelo Vanzin 
Committed: Wed Dec 13 13:37:25 2017 -0800

--
 .../spark/deploy/rest/SubmitRestProtocolRequest.scala  |  2 ++
 .../spark/deploy/rest/SubmitRestProtocolSuite.scala|  2 

spark git commit: [SPARK-22574][MESOS][SUBMIT] Check submission request parameters

2017-12-13 Thread vanzin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 0230515a2 -> b4f4be396


[SPARK-22574][MESOS][SUBMIT] Check submission request parameters

## What changes were proposed in this pull request?

PR closed with all the comments -> https://github.com/apache/spark/pull/19793

It solves the problem where submitting a malformed CreateSubmissionRequest to the 
Spark Dispatcher left the Dispatcher in a bad state and made it inactive as a 
Mesos framework.

https://issues.apache.org/jira/browse/SPARK-22574

## How was this patch tested?

All Spark tests passed successfully.

It was tested by sending a malformed request (without appArgs) before and after 
the change. The fix is simple: check whether the value is null before it is accessed.

This was before the change, leaving the dispatcher inactive:

```
Exception in thread "Thread-22" java.lang.NullPointerException
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.getDriverCommandValue(MesosClusterScheduler.scala:444)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.buildDriverCommand(MesosClusterScheduler.scala:451)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.org$apache$spark$scheduler$cluster$mesos$MesosClusterScheduler$$createTaskInfo(MesosClusterScheduler.scala:538)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:570)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:555)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:555)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:621)
```

And after:

```
  "message" : "Malformed request: 
org.apache.spark.deploy.rest.SubmitRestProtocolException: Validation of message 
CreateSubmissionRequest 
failed!\n\torg.apache.spark.deploy.rest.SubmitRestProtocolMessage.validate(SubmitRestProtocolMessage.scala:70)\n\torg.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:272)\n\tjavax.servlet.http.HttpServlet.service(HttpServlet.java:707)\n\tjavax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\torg.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)\n\torg.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)\n\torg.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\torg.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\torg.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\torg.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\torg.spark
 
_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\torg.spark_project.jetty.server.Server.handle(Server.java:524)\n\torg.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:319)\n\torg.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)\n\torg.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\torg.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\torg.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\torg.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\torg.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\torg.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\torg.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java
 
:671)\n\torg.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tjava.lang.Thread.run(Thread.java:745)"
```

Author: German Schiavon 

Closes #19966 from Gschiavon/fix-submission-request.

(cherry picked from commit 0bdb4e516c425ea7bf941106ac6449b5a0a289e3)
Signed-off-by: Marcelo Vanzin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b4f4be39
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b4f4be39
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b4f4be39

Branch: refs/heads/branch-2.2
Commit: b4f4be396b76e8d3f583193c70fc6b26c99231ac
Parents: 0230515
Author: German Schiavon 
Authored: Wed Dec 13 13:37:25 2017 -0800
Committer: Marcelo Vanzin 
Committed: Wed Dec 13 13:37:35 2017 -0800


svn commit: r23716 - in /dev/spark/2.3.0-SNAPSHOT-2017_12_13_12_01-1abcbed-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2017-12-13 Thread pwendell
Author: pwendell
Date: Wed Dec 13 20:14:36 2017
New Revision: 23716

Log:
Apache Spark 2.3.0-SNAPSHOT-2017_12_13_12_01-1abcbed docs


[This commit notification would consist of 1407 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-22763][CORE] SHS: Ignore unknown events and parse through the file

2017-12-13 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master c5a4701ac -> 1abcbed67


[SPARK-22763][CORE] SHS: Ignore unknown events and parse through the file

## What changes were proposed in this pull request?

As Spark code changes, new events are added to the event log: #19649
We used to maintain a whitelist to avoid exceptions: #15663
Currently, the Spark history server stops parsing when it hits unknown events or 
unrecognized properties, so we may only see part of the UI data.
For better compatibility, we can ignore unknown events and keep parsing through 
the log file.
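
A hedged sketch of the "warn once per unknown event, keep replaying" pattern
(assumed names; the actual change lives in ReplayListenerBus below):

```scala
import scala.collection.mutable

val unrecognizedEvents = mutable.HashSet.empty[String]

def onUnknownEvent(eventClass: String): Unit = {
  // add() returns true only the first time, so each unknown event is logged once.
  if (unrecognizedEvents.add(eventClass)) {
    println(s"Drop unrecognized event: $eventClass")
  }
  // fall through: keep parsing the remaining lines of the event log
}
```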

## How was this patch tested?
Unit test

Author: Wang Gengliang 

Closes #19953 from gengliangwang/ReplayListenerBus.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1abcbed6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1abcbed6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1abcbed6

Branch: refs/heads/master
Commit: 1abcbed678c2bc4f05640db2791fd2d84267d740
Parents: c5a4701
Author: Wang Gengliang 
Authored: Wed Dec 13 11:54:22 2017 -0800
Committer: gatorsmile 
Committed: Wed Dec 13 11:54:22 2017 -0800

--
 .../spark/scheduler/ReplayListenerBus.scala | 37 ++--
 .../spark/scheduler/ReplayListenerSuite.scala   | 29 +++
 2 files changed, 47 insertions(+), 19 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1abcbed6/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala 
b/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala
index 26a6a3e..c9cd662 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala
@@ -69,6 +69,8 @@ private[spark] class ReplayListenerBus extends 
SparkListenerBus with Logging {
   eventsFilter: ReplayEventsFilter): Unit = {
 var currentLine: String = null
 var lineNumber: Int = 0
+val unrecognizedEvents = new scala.collection.mutable.HashSet[String]
+val unrecognizedProperties = new scala.collection.mutable.HashSet[String]
 
 try {
   val lineEntries = lines
@@ -84,16 +86,22 @@ private[spark] class ReplayListenerBus extends 
SparkListenerBus with Logging {
 
   postToAll(JsonProtocol.sparkEventFromJson(parse(currentLine)))
 } catch {
-  case e: ClassNotFoundException if 
KNOWN_REMOVED_CLASSES.contains(e.getMessage) =>
-// Ignore events generated by Structured Streaming in Spark 2.0.0 
and 2.0.1.
-// It's safe since no place uses them.
-logWarning(s"Dropped incompatible Structured Streaming log: 
$currentLine")
-  case e: UnrecognizedPropertyException if e.getMessage != null && 
e.getMessage.startsWith(
-"Unrecognized field \"queryStatus\" " +
-  "(class org.apache.spark.sql.streaming.StreamingQueryListener$") 
=>
-// Ignore events generated by Structured Streaming in Spark 2.0.2
-// It's safe since no place uses them.
-logWarning(s"Dropped incompatible Structured Streaming log: 
$currentLine")
+  case e: ClassNotFoundException =>
+// Ignore unknown events, parse through the event log file.
+// To avoid spamming, warnings are only displayed once for each 
unknown event.
+if (!unrecognizedEvents.contains(e.getMessage)) {
+  logWarning(s"Drop unrecognized event: ${e.getMessage}")
+  unrecognizedEvents.add(e.getMessage)
+}
+logDebug(s"Drop incompatible event log: $currentLine")
+  case e: UnrecognizedPropertyException =>
+// Ignore unrecognized properties, parse through the event log 
file.
+// To avoid spamming, warnings are only displayed once for each 
unrecognized property.
+if (!unrecognizedProperties.contains(e.getMessage)) {
+  logWarning(s"Drop unrecognized property: ${e.getMessage}")
+  unrecognizedProperties.add(e.getMessage)
+}
+logDebug(s"Drop incompatible event log: $currentLine")
   case jpe: JsonParseException =>
 // We can only ignore exception from last line of the file that 
might be truncated
 // the last entry may not be the very last line in the event log, 
but we treat it
@@ -125,13 +133,4 @@ private[spark] object ReplayListenerBus {
 
   // utility filter that selects all event logs during replay
   val SELECT_ALL_FILTER: ReplayEventsFilter = { (eventString: String) => true }
-
-  /**
-   * Classes that were removed. Structured 

spark git commit: Revert "[SPARK-21417][SQL] Infer join conditions using propagated constraints"

2017-12-13 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master 8eb5609d8 -> c5a4701ac


Revert "[SPARK-21417][SQL] Infer join conditions using propagated constraints"

This reverts commit 6ac57fd0d1c82b834eb4bf0dd57596b92a99d6de.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c5a4701a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c5a4701a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c5a4701a

Branch: refs/heads/master
Commit: c5a4701acc6972ed7ccb11c506fe718d5503f140
Parents: 8eb5609
Author: gatorsmile 
Authored: Wed Dec 13 11:50:04 2017 -0800
Committer: gatorsmile 
Committed: Wed Dec 13 11:50:04 2017 -0800

--
 .../expressions/EquivalentExpressionMap.scala   |  66 -
 .../catalyst/expressions/ExpressionSet.scala|   2 -
 .../sql/catalyst/optimizer/Optimizer.scala  |   1 -
 .../spark/sql/catalyst/optimizer/joins.scala|  60 -
 .../EquivalentExpressionMapSuite.scala  |  56 -
 .../optimizer/EliminateCrossJoinSuite.scala | 238 ---
 6 files changed, 423 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c5a4701a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressionMap.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressionMap.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressionMap.scala
deleted file mode 100644
index cf1614a..000
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressionMap.scala
+++ /dev/null
@@ -1,66 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.catalyst.expressions
-
-import scala.collection.mutable
-
-import 
org.apache.spark.sql.catalyst.expressions.EquivalentExpressionMap.SemanticallyEqualExpr
-
-/**
- * A class that allows you to map an expression into a set of equivalent expressions. The keys are
- * handled based on their semantic meaning and ignoring cosmetic differences. The values are
- * represented as [[ExpressionSet]]s.
- *
- * The underlying representation of keys depends on the [[Expression.semanticHash]] and
- * [[Expression.semanticEquals]] methods.
- *
- * {{{
- *   val map = new EquivalentExpressionMap()
- *
- *   map.put(1 + 2, a)
- *   map.put(rand(), b)
- *
- *   map.get(2 + 1) => Set(a) // 1 + 2 and 2 + 1 are semantically equivalent
- *   map.get(1 + 2) => Set(a) // 1 + 2 and 2 + 1 are semantically equivalent
- *   map.get(rand()) => Set() // non-deterministic expressions are not equivalent
- * }}}
- */
-class EquivalentExpressionMap {
-
-  private val equivalenceMap = mutable.HashMap.empty[SemanticallyEqualExpr, ExpressionSet]
-
-  def put(expression: Expression, equivalentExpression: Expression): Unit = {
-val equivalentExpressions = equivalenceMap.getOrElseUpdate(expression, ExpressionSet.empty)
-equivalenceMap(expression) = equivalentExpressions + equivalentExpression
-  }
-
-  def get(expression: Expression): Set[Expression] =
-equivalenceMap.getOrElse(expression, ExpressionSet.empty)
-}
-
-object EquivalentExpressionMap {
-
-  private implicit class SemanticallyEqualExpr(val expr: Expression) {
-override def equals(obj: Any): Boolean = obj match {
-  case other: SemanticallyEqualExpr => expr.semanticEquals(other.expr)
-  case _ => false
-}
-
-override def hashCode: Int = expr.semanticHash()
-  }
-}
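
For context on the code being reverted: the map keys above delegate equality and hashing to a domain-specific notion of equivalence rather than the default equals/hashCode. The following is a minimal, self-contained sketch of that same wrapper-key pattern, substituting case-insensitive strings for Spark's `semanticEquals`/`semanticHash`; all class and object names below are illustrative, not part of the reverted Spark code.

```scala
import scala.collection.mutable

// Wrapper key that replaces default equality with a domain-specific one
// (case-insensitive comparison stands in for semanticEquals/semanticHash).
final class CaseInsensitiveKey(val raw: String) {
  override def equals(obj: Any): Boolean = obj match {
    case other: CaseInsensitiveKey => raw.equalsIgnoreCase(other.raw)
    case _ => false
  }
  override def hashCode: Int = raw.toLowerCase.hashCode
}

// Simplified analogue of EquivalentExpressionMap: values keyed by the wrapped form.
class EquivalentStringMap {
  private val underlying = mutable.HashMap.empty[CaseInsensitiveKey, Set[String]]

  def put(key: String, value: String): Unit = {
    val k = new CaseInsensitiveKey(key)
    underlying(k) = underlying.getOrElse(k, Set.empty[String]) + value
  }

  def get(key: String): Set[String] =
    underlying.getOrElse(new CaseInsensitiveKey(key), Set.empty[String])
}

object EquivalentStringMapDemo extends App {
  val map = new EquivalentStringMap
  map.put("Foo", "a")
  println(map.get("foo")) // Set(a): "Foo" and "foo" are treated as the same key
  println(map.get("bar")) // Set(): unknown keys yield the empty set
}
```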

http://git-wip-us.apache.org/repos/asf/spark/blob/c5a4701a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
index e989083..7e8e7b8 100644
--- 

spark git commit: [SPARK-22754][DEPLOY] Check whether spark.executor.heartbeatInterval bigger…

2017-12-13 Thread vanzin
Repository: spark
Updated Branches:
  refs/heads/master f6bcd3e53 -> 8eb5609d8


[SPARK-22754][DEPLOY] Check whether spark.executor.heartbeatInterval is bigger than spark.network.timeout or not

## What changes were proposed in this pull request?

If spark.executor.heartbeatInterval is bigger than spark.network.timeout, it will almost always cause the exception below.
`Job aborted due to stage failure: Task 4763 in stage 3.0 failed 4 times, most recent failure: Lost task 4763.3 in stage 3.0 (TID 22383, executor id: 4761, host: xxx): ExecutorLostFailure (executor 4761 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 154022 ms`
Since many users do not get that point, they may set spark.executor.heartbeatInterval incorrectly. This patch checks for this case when applications are submitted.

## How was this patch tested?
Tested in a cluster.

Author: zhoukang 

Closes #19942 from caneGuy/zhoukang/check-heartbeat.
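
As a quick illustration of the new constraint, here is a minimal sketch (not part of this patch; the object name and values are made up) that mirrors the same check using only public SparkConf methods, since SparkConf.validateSettings is internal to Spark:

```scala
import org.apache.spark.SparkConf

// Minimal sketch: the timeout must stay above the heartbeat interval.
object HeartbeatConfigCheck extends App {
  val conf = new SparkConf(false)
    .set("spark.network.timeout", "120s")            // executor liveness timeout
    .set("spark.executor.heartbeatInterval", "10s")  // should stay well below the timeout

  val timeoutSec = conf.getTimeAsSeconds("spark.network.timeout", "120s")
  val heartbeatSec = conf.getTimeAsSeconds("spark.executor.heartbeatInterval", "10s")

  // Same condition this patch enforces during validation.
  require(timeoutSec > heartbeatSec,
    s"spark.network.timeout (${timeoutSec}s) must be larger than " +
      s"spark.executor.heartbeatInterval (${heartbeatSec}s)")

  println("Heartbeat configuration is consistent")
}
```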


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8eb5609d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8eb5609d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8eb5609d

Branch: refs/heads/master
Commit: 8eb5609d8d961e54aa1ed0632f15f5e570fa627a
Parents: f6bcd3e
Author: zhoukang 
Authored: Wed Dec 13 11:47:33 2017 -0800
Committer: Marcelo Vanzin 
Committed: Wed Dec 13 11:47:33 2017 -0800

--
 core/src/main/scala/org/apache/spark/SparkConf.scala  |  8 
 core/src/test/scala/org/apache/spark/SparkConfSuite.scala | 10 ++
 2 files changed, 18 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8eb5609d/core/src/main/scala/org/apache/spark/SparkConf.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkConf.scala b/core/src/main/scala/org/apache/spark/SparkConf.scala
index 4b1286d..d77303e 100644
--- a/core/src/main/scala/org/apache/spark/SparkConf.scala
+++ b/core/src/main/scala/org/apache/spark/SparkConf.scala
@@ -564,6 +564,14 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = getTimeAsSeconds("spark.network.timeout", "120s")
+val executorHeartbeatInterval = getTimeAsSeconds("spark.executor.heartbeatInterval", "10s")
+// If spark.executor.heartbeatInterval bigger than spark.network.timeout,
+// it will almost always cause ExecutorLostFailure. See SPARK-22754.
+require(executorTimeoutThreshold > executorHeartbeatInterval, "The value of " +
+  s"spark.network.timeout=${executorTimeoutThreshold}s must be no less than the value of " +
+  s"spark.executor.heartbeatInterval=${executorHeartbeatInterval}s.")
   }
 
   /**

http://git-wip-us.apache.org/repos/asf/spark/blob/8eb5609d/core/src/test/scala/org/apache/spark/SparkConfSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/SparkConfSuite.scala b/core/src/test/scala/org/apache/spark/SparkConfSuite.scala
index c771eb4..bff808e 100644
--- a/core/src/test/scala/org/apache/spark/SparkConfSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkConfSuite.scala
@@ -329,6 +329,16 @@ class SparkConfSuite extends SparkFunSuite with LocalSparkContext with ResetSyst
 conf.validateSettings()
   }
 
+  test("spark.network.timeout should bigger than 
spark.executor.heartbeatInterval") {
+val conf = new SparkConf()
+conf.validateSettings()
+
+conf.set("spark.network.timeout", "5s")
+intercept[IllegalArgumentException] {
+  conf.validateSettings()
+}
+  }
+
 }
 
 class Class1 {}


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ScalaUDF

2017-12-13 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master 58f7c825a -> f6bcd3e53


[SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ScalaUDF

## What changes were proposed in this pull request?

We should not operate on `references` directly in `Expression.doGenCode`; instead we should use the high-level API `addReferenceObj`.

## How was this patch tested?

existing tests

Author: Wenchen Fan 

Closes #19962 from cloud-fan/codegen.
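
To see why the helper is preferable, here is a toy model of the pattern (not Spark's real CodegenContext; class and method names below are stand-ins): the helper registers the object and hands back the accessor string, so generated code never hard-codes an index into the references array the way the removed genCodeForConverter did with `ctx.references.size - 1`.

```scala
import scala.collection.mutable

// Toy stand-in for a codegen context; the real API lives in
// org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.
class ToyCodegenContext {
  val references = mutable.ArrayBuffer.empty[Any]

  // High-level helper: registers the object and returns the accessor expression,
  // so callers never compute array indices themselves.
  def addReferenceObj(name: String, obj: Any, className: String = "Object"): String = {
    val idx = references.length
    references += obj
    s"(($className) references[$idx] /* $name */)"
  }
}

object AddReferenceObjDemo extends App {
  val ctx = new ToyCodegenContext

  // Fragile style (what this patch removes): manual index bookkeeping that breaks
  // as soon as something else appends to `references` in between.
  ctx.references += "a converter"
  val manualAccessor = s"references[${ctx.references.size - 1}]"

  // Preferred style (what this patch adopts): the context produces the accessor.
  val converters: Array[Any] = Array("toScala", "toCatalyst")
  val convertersTerm = ctx.addReferenceObj("converters", converters, "Object[]")

  println(manualAccessor)  // references[0]
  println(convertersTerm)  // ((Object[]) references[1] /* converters */)
}
```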


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f6bcd3e5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f6bcd3e5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f6bcd3e5

Branch: refs/heads/master
Commit: f6bcd3e53fe55bab38383cce8d0340f6c6191972
Parents: 58f7c82
Author: Wenchen Fan 
Authored: Thu Dec 14 01:16:44 2017 +0800
Committer: Wenchen Fan 
Committed: Thu Dec 14 01:16:44 2017 +0800

--
 .../sql/catalyst/expressions/ScalaUDF.scala | 76 ++--
 .../sql/catalyst/expressions/predicates.scala   | 20 ++
 .../catalyst/expressions/ScalaUDFSuite.scala|  3 +-
 .../catalyst/optimizer/OptimizeInSuite.scala|  2 +-
 4 files changed, 28 insertions(+), 73 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f6bcd3e5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
index a3cf761..388ef42 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
@@ -76,11 +76,6 @@ case class ScalaUDF(
 }.foreach(println)
 
   */
-
-  // Accessors used in genCode
-  def userDefinedFunc(): AnyRef = function
-  def getChildren(): Seq[Expression] = children
-
   private[this] val f = children.size match {
 case 0 =>
   val func = function.asInstanceOf[() => Any]
@@ -981,41 +976,19 @@ case class ScalaUDF(
   }
 
   // scalastyle:on line.size.limit
-
-  private val converterClassName = classOf[Any => Any].getName
-  private val scalaUDFClassName = classOf[ScalaUDF].getName
-  private val typeConvertersClassName = CatalystTypeConverters.getClass.getName + ".MODULE$"
-
-  // Generate codes used to convert the arguments to Scala type for user-defined functions
-  private[this] def genCodeForConverter(ctx: CodegenContext, index: Int): (String, String) = {
-val converterTerm = ctx.freshName("converter")
-val expressionIdx = ctx.references.size - 1
-(converterTerm,
-  s"$converterClassName $converterTerm = ($converterClassName)$typeConvertersClassName" +
-s".createToScalaConverter(((Expression)((($scalaUDFClassName)" +
-s"references[$expressionIdx]).getChildren().apply($index))).dataType());")
-  }
-
   override def doGenCode(
   ctx: CodegenContext,
   ev: ExprCode): ExprCode = {
-val scalaUDF = ctx.freshName("scalaUDF")
-val scalaUDFRef = ctx.addReferenceObj("scalaUDFRef", this, scalaUDFClassName)
-
-// Object to convert the returned value of user-defined functions to Catalyst type
-val catalystConverterTerm = ctx.freshName("catalystConverter")
+val converterClassName = classOf[Any => Any].getName
 
+// The type converters for inputs and the result.
+val converters: Array[Any => Any] = children.map { c =>
+  CatalystTypeConverters.createToScalaConverter(c.dataType)
+}.toArray :+ CatalystTypeConverters.createToCatalystConverter(dataType)
+val convertersTerm = ctx.addReferenceObj("converters", converters, s"$converterClassName[]")
+val errorMsgTerm = ctx.addReferenceObj("errMsg", udfErrorMessage)
 val resultTerm = ctx.freshName("result")
 
-// This must be called before children expressions' codegen
-// because ctx.references is used in genCodeForConverter
-val converterTerms = children.indices.map(genCodeForConverter(ctx, _))
-
-// Initialize user-defined function
-val funcClassName = s"scala.Function${children.size}"
-
-val funcTerm = ctx.freshName("udf")
-
 // codegen for children expressions
 val evals = children.map(_.genCode(ctx))
 
@@ -1023,38 +996,31 @@ case class ScalaUDF(
 // We need to get the boxedType of dataType's javaType here. Because for the dataType
 // such as IntegerType, its javaType is `int` and the returned type of user-defined
 // function is Object. Trying to convert an Object to `int` will cause casting exception.
-val evalCode = evals.map(_.code).mkString
-val 

svn commit: r23711 - in /dev/spark/2.3.0-SNAPSHOT-2017_12_13_08_02-58f7c82-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2017-12-13 Thread pwendell
Author: pwendell
Date: Wed Dec 13 16:22:46 2017
New Revision: 23711

Log:
Apache Spark 2.3.0-SNAPSHOT-2017_12_13_08_02-58f7c82 docs


[This commit notification would consist of 1407 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - Link Classification Example

2017-12-13 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 7453ab024 -> 58f7c825a


[SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - Link Classification Example

## What changes were proposed in this pull request?
In https://github.com/apache/spark/pull/18067, only the regression example was linked.

This PR links the decision tree classification example to the doc.

ping felixcheung

## How was this patch tested?
local build of docs

![default](https://user-images.githubusercontent.com/7322292/33922857-9b00fdd0-e008-11e7-92c2-85a3de52ea8f.png)

Author: Zheng RuiFeng 

Closes #19963 from zhengruifeng/r_examples.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/58f7c825
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/58f7c825
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/58f7c825

Branch: refs/heads/master
Commit: 58f7c825ae9dd2e3f548a9b7b4a9704f970dde5b
Parents: 7453ab0
Author: Zheng RuiFeng 
Authored: Wed Dec 13 07:52:21 2017 -0600
Committer: Sean Owen 
Committed: Wed Dec 13 07:52:21 2017 -0600

--
 docs/ml-classification-regression.md | 8 
 1 file changed, 8 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/58f7c825/docs/ml-classification-regression.md
--
diff --git a/docs/ml-classification-regression.md b/docs/ml-classification-regression.md
index 083df2e..bf979f3 100644
--- a/docs/ml-classification-regression.md
+++ b/docs/ml-classification-regression.md
@@ -223,6 +223,14 @@ More details on parameters can be found in the [Python API documentation](api/py
 
 
 
+
+
+Refer to the [R API docs](api/R/spark.decisionTree.html) for more details.
+
+{% include_example classification r/ml/decisionTree.R %}
+
+
+
 
 
 ## Random forest classifier


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r23709 - in /dev/spark/2.3.0-SNAPSHOT-2017_12_13_04_01-7453ab0-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2017-12-13 Thread pwendell
Author: pwendell
Date: Wed Dec 13 12:19:38 2017
New Revision: 23709

Log:
Apache Spark 2.3.0-SNAPSHOT-2017_12_13_04_01-7453ab0 docs


[This commit notification would consist of 1407 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-22745][SQL] read partition stats from Hive

2017-12-13 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master 682eb4f2e -> 7453ab024


[SPARK-22745][SQL] read partition stats from Hive

## What changes were proposed in this pull request?

Currently Spark can read table stats (e.g. `totalSize`, `numRows`) from Hive; we can also support reading partition stats from Hive using the same logic.

## How was this patch tested?

Added a new test case and modified an existing test case.

Author: Zhenhua Wang 
Author: Zhenhua Wang 

Closes #19932 from wzhfy/read_hive_partition_stats.
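
A rough usage sketch of what this enables (the table name, partition column, and values below are made up, and the exact DESCRIBE output may vary by version): partition-level statistics that Hive has already computed should become visible through Spark's Hive-backed catalog.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes a Hive metastore and a partitioned table `sales`
// with a `ds` partition column; all identifiers are illustrative.
object PartitionStatsCheck extends App {
  val spark = SparkSession.builder()
    .appName("partition-stats-check")
    .enableHiveSupport()
    .getOrCreate()

  // Statistics are typically produced on the Hive side, e.g.:
  //   ANALYZE TABLE sales PARTITION (ds='2017-12-13') COMPUTE STATISTICS;

  // With this change, stats such as totalSize/rawDataSize/numRows stored by Hive
  // for the partition are read back and should show up when describing it.
  spark.sql("DESCRIBE FORMATTED sales PARTITION (ds='2017-12-13')").show(100, false)

  spark.stop()
}
```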


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7453ab02
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7453ab02
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7453ab02

Branch: refs/heads/master
Commit: 7453ab0243dc04db1b586b0e5f588f9cdc9f72dd
Parents: 682eb4f
Author: Zhenhua Wang 
Authored: Wed Dec 13 16:27:29 2017 +0800
Committer: Wenchen Fan 
Committed: Wed Dec 13 16:27:29 2017 +0800

--
 .../spark/sql/hive/HiveExternalCatalog.scala|  6 +-
 .../spark/sql/hive/client/HiveClientImpl.scala  | 66 +++-
 .../apache/spark/sql/hive/StatisticsSuite.scala | 32 +++---
 3 files changed, 62 insertions(+), 42 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/7453ab02/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
--
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
index 44e680d..632e3e0 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
@@ -668,7 +668,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
 val schema = restoreTableMetadata(rawTable).schema
 
 // convert table statistics to properties so that we can persist them through hive client
-var statsProperties =
+val statsProperties =
   if (stats.isDefined) {
 statsToProperties(stats.get, schema)
   } else {
@@ -1098,14 +1098,14 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
 val schema = restoreTableMetadata(rawTable).schema
 
 // convert partition statistics to properties so that we can persist them through hive api
-val withStatsProps = lowerCasedParts.map(p => {
+val withStatsProps = lowerCasedParts.map { p =>
   if (p.stats.isDefined) {
 val statsProperties = statsToProperties(p.stats.get, schema)
 p.copy(parameters = p.parameters ++ statsProperties)
   } else {
 p
   }
-})
+}
 
 // Note: Before altering table partitions in Hive, you *must* set the current database
 // to the one that contains the table of interest. Otherwise you will end up with the

http://git-wip-us.apache.org/repos/asf/spark/blob/7453ab02/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
--
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
index 08eb5c7..7233944 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
@@ -414,32 +414,6 @@ private[hive] class HiveClientImpl(
   }
   val comment = properties.get("comment")
 
-  // Here we are reading statistics from Hive.
-  // Note that this statistics could be overridden by Spark's statistics if that's available.
-  val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
-  val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
-  val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_))
-  // TODO: check if this estimate is valid for tables after partition pruning.
-  // NOTE: getting `totalSize` directly from params is kind of hacky, but this should be
-  // relatively cheap if parameters for the table are populated into the metastore.
-  // Currently, only totalSize, rawDataSize, and rowCount are used to build the field `stats`
-  // TODO: stats should include all the other two fields (`numFiles` and `numPartitions`).
-  // (see StatsSetupConst in Hive)
-  val stats =
-// When table is external, `totalSize` is always zero, which will influence join strategy.
-// So when `totalSize` is zero, 

svn commit: r23703 - in /dev/spark/2.3.0-SNAPSHOT-2017_12_13_00_01-682eb4f-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2017-12-13 Thread pwendell
Author: pwendell
Date: Wed Dec 13 08:15:04 2017
New Revision: 23703

Log:
Apache Spark 2.3.0-SNAPSHOT-2017_12_13_00_01-682eb4f docs


[This commit notification would consist of 1407 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org