[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23127
  
**[Test build #99275 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99275/testReport)**
 for PR 23127 at commit 
[`170073e`](https://github.com/apache/spark/commit/170073efd4d752a436bc18c4c0d2c0fc755d2d37).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23124
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23123: [SPARK-26153][ML] GBT & RandomForest avoid unnece...

2018-11-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23123


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23138
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-26 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236236422
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count 
- 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): 
IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
--- End diff --

CC @vanzin for the proposed new method.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23138
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99272/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21412: [SPARK-18805][DStream] Avoid StackOverflowError while ge...

2018-11-26 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21412
  
Agree, this PR should be closed. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23144: [SPARK-26172][ML][WIP] Unify String Params' case-insensi...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23144
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...

2018-11-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r236275822
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -546,33 +546,29 @@ case class MapConcat(children: Seq[Expression]) 
extends ComplexTypeMergingExpres
 
   override def nullable: Boolean = children.exists(_.nullable)
 
+  private lazy val mapBuilder = new ArrayBasedMapBuilder(dataType.keyType, 
dataType.valueType)
+
   override def eval(input: InternalRow): Any = {
-val maps = children.map(_.eval(input))
+val maps = children.map(_.eval(input).asInstanceOf[MapData]).toArray
--- End diff --

`toArray` is O(N) but we do it only once. If accessing by index is O(N), 
the total time complexity is O(N ^ 2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21412: [SPARK-18805][DStream] Avoid StackOverflowError w...

2018-11-26 Thread chermenin
Github user chermenin closed the pull request at:

https://github.com/apache/spark/pull/21412


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21688
  
**[Test build #4442 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4442/testReport)**
 for PR 21688 at commit 
[`8f6efbd`](https://github.com/apache/spark/commit/8f6efbda4ad6b7fc4c86f4a2ae43e49847243826).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23126: [SPARK-26158] [MLLIB] fix covariance accuracy problem fo...

2018-11-26 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/23126
  
Is there a simple test case you can add to cover that too? that would 
really prove this change.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-26 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue:

https://github.com/apache/spark/pull/23127
  
@cloud-fan @rednaxelafx
Actually, the input to a codegen stage can be an internal row so I can't 
make the inputRDD be `RDD[UnsafeRow], but the output needs to be UnsafeRow.
Doing it like `InputAdapter` did actually make it just pass-through output 
the internal row.
For `InputAdapter`, there always is some parent operator to consume it, and 
create an unsafe projection in whatever it does, and then the output UnsafeRows.
But for an `RDDScanExec` or `RowDataSourceScanExec` could be alone in a 
WholeStageCodegenExec, and then just doing `${consume(ctx, null, row)}` made it 
pass-through output the InternalRow from input.

WDYT about how I patched it up?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23126: [SPARK-26158] [MLLIB] fix covariance accuracy problem fo...

2018-11-26 Thread KyleLi1985
Github user KyleLi1985 commented on the issue:

https://github.com/apache/spark/pull/23126
  
Ok, I will do it later


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23142
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99268/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-11-26 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23130#discussion_r236226380
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala 
---
@@ -388,7 +388,7 @@ case class FileSourceScanExec(
 logInfo(s"Planning with ${bucketSpec.numBuckets} buckets")
 val filesGroupedToBuckets =
   selectedPartitions.flatMap { p =>
-p.files.map { f =>
+p.files.filter(_.getLen > 0).map { f =>
--- End diff --

I personally prefer filter + map as it's shorter and clearer. I don't know 
if one is faster; two transformations vs having to return Some/None. For a 
Dataset operation I'd favor one operation, but this is just local Scala code.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23139
  
**[Test build #99270 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99270/testReport)**
 for PR 23139 at commit 
[`e416810`](https://github.com/apache/spark/commit/e41681096867cbc6d2556da83ce733092d6df841).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-11-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23130#discussion_r236233074
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala 
---
@@ -388,7 +388,7 @@ case class FileSourceScanExec(
 logInfo(s"Planning with ${bucketSpec.numBuckets} buckets")
 val filesGroupedToBuckets =
   selectedPartitions.flatMap { p =>
-p.files.map { f =>
+p.files.filter(_.getLen > 0).map { f =>
--- End diff --

It's non-critical path in terms of performance. Should be okay.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23143: [SPARK-24762][SQL][Followup] Enable Option of Pro...

2018-11-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23143#discussion_r236267692
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
 ---
@@ -251,19 +251,15 @@ case class ExpressionEncoder[T](
*/
   def isSerializedAsStruct: Boolean = 
objSerializer.dataType.isInstanceOf[StructType]
 
-  /**
-   * Returns true if the type `T` is an `Option` type.
-   */
-  def isOptionType: Boolean = 
classOf[Option[_]].isAssignableFrom(clsTag.runtimeClass)
-
   /**
* If the type `T` is serialized as a struct, when it is encoded to a 
Spark SQL row, fields in
* the struct are naturally mapped to top-level columns in a row. In 
other words, the serialized
* struct is flattened to row. But in case of the `T` is also an 
`Option` type, it can't be
* flattened to top-level row, because in Spark SQL top-level row can't 
be null. This method
* returns true if `T` is serialized as struct and is not `Option` type.
*/
-  def isSerializedAsStructForTopLevel: Boolean = isSerializedAsStruct && 
!isOptionType
+  def isSerializedAsStructForTopLevel: Boolean = isSerializedAsStruct &&
--- End diff --

nit: since it cross lines, I'd prefer
```
def xxx = {
  xxx
}
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-11-26 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21688
  
Test this please




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23126: [SPARK-26158] [MLLIB] fix covariance accuracy problem fo...

2018-11-26 Thread KyleLi1985
Github user KyleLi1985 commented on the issue:

https://github.com/apache/spark/pull/23126
  
Um, the unit test in spark indeed cover both case. But there is function 
closeToZero to handle accuracy problem, so..


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23127
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23123: [SPARK-26153][ML] GBT & RandomForest avoid unnecessary `...

2018-11-26 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/23123
  
Merged to master


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23138
  
**[Test build #99272 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99272/testReport)**
 for PR 23138 at commit 
[`471d114`](https://github.com/apache/spark/commit/471d1144d41f767b3227d78b663eaa79efef738c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23143: [SPARK-24762][SQL][Followup] Enable Option of Product en...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23143
  
**[Test build #99273 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99273/testReport)**
 for PR 23143 at commit 
[`f6bfe3d`](https://github.com/apache/spark/commit/f6bfe3d66774c5b8a61123dd79d4d3d49464df5f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23144: [SPARK-26172][ML][WIP] Unify String Params' case-insensi...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23144
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5357/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23144: [SPARK-26172][ML][WIP] Unify String Params' case-insensi...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23144
  
**[Test build #99274 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99274/testReport)**
 for PR 23144 at commit 
[`e55244a`](https://github.com/apache/spark/commit/e55244afa41e959f99c02f1afbe916fc7d1ffec3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23127
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5358/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23139
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23139
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99270/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...

2018-11-26 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22991#discussion_r236230624
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -209,6 +215,9 @@ final class OneVsRestModel private[ml] (
   newDataset.unpersist()
 }
 
+var outputColNames = Seq.empty[String]
--- End diff --

Maybe 'predictionColumns' ? These aren't the only output columns. You could 
make this a mutable val too, but whatever.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22163
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99271/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22163
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23126: [SPARK-26158] [MLLIB] fix covariance accuracy problem fo...

2018-11-26 Thread KyleLi1985
Github user KyleLi1985 commented on the issue:

https://github.com/apache/spark/pull/23126
  
It would be better, update the commit


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...

2018-11-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r236282401
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala
 ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types.{AtomicType, CalendarIntervalType, 
DataType, MapType}
+
+/**
+ * A builder of [[ArrayBasedMapData]], which fails if a null map key is 
detected, and removes
+ * duplicated map keys w.r.t. the last wins policy.
+ */
+class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends 
Serializable {
+  assert(!keyType.existsRecursively(_.isInstanceOf[MapType]), "key of map 
cannot be/contain map")
+
+  private lazy val keyToIndex = keyType match {
+case _: AtomicType | _: CalendarIntervalType => 
mutable.HashMap.empty[Any, Int]
+case _ =>
+  // for complex types, use interpreted ordering to be able to compare 
unsafe data with safe
+  // data, e.g. UnsafeRow vs GenericInternalRow.
+  mutable.TreeMap.empty[Any, 
Int](TypeUtils.getInterpretedOrdering(keyType))
+  }
+
+  // TODO: specialize it
+  private lazy val keys = mutable.ArrayBuffer.empty[Any]
+  private lazy val values = mutable.ArrayBuffer.empty[Any]
+
+  private lazy val keyGetter = InternalRow.getAccessor(keyType)
+  private lazy val valueGetter = InternalRow.getAccessor(valueType)
+
+  def reset(): Unit = {
+keyToIndex.clear()
+keys.clear()
+values.clear()
+  }
+
+  def put(key: Any, value: Any): Unit = {
+if (key == null) {
+  throw new RuntimeException("Cannot use null as map key.")
+}
+
+val maybeExistingIdx = keyToIndex.get(key)
+if (maybeExistingIdx.isDefined) {
+  // Overwrite the previous value, as the policy is last wins.
+  values(maybeExistingIdx.get) = value
+} else {
+  keyToIndex.put(key, values.length)
+  keys.append(key)
+  values.append(value)
+}
+  }
+
+  // write a 2-field row, the first field is key and the second field is 
value.
+  def put(entry: InternalRow): Unit = {
+if (entry.isNullAt(0)) {
+  throw new RuntimeException("Cannot use null as map key.")
+}
+put(keyGetter(entry, 0), valueGetter(entry, 1))
+  }
+
+  def putAll(keyArray: Array[Any], valueArray: Array[Any]): Unit = {
+if (keyArray.length != valueArray.length) {
+  throw new RuntimeException(
+"The key array and value array of MapData must have the same 
length.")
+}
+
+var i = 0
+while (i < keyArray.length) {
+  put(keyArray(i), valueArray(i))
+  i += 1
+}
+  }
+
+  def putAll(keyArray: ArrayData, valueArray: ArrayData): Unit = {
+if (keyArray.numElements() != valueArray.numElements()) {
+  throw new RuntimeException(
+"The key array and value array of MapData must have the same 
length.")
+}
+
+var i = 0
+while (i < keyArray.numElements()) {
+  put(keyGetter(keyArray, i), valueGetter(valueArray, i))
+  i += 1
+}
+  }
+
+  def build(): ArrayBasedMapData = {
+new ArrayBasedMapData(new GenericArrayData(keys.toArray), new 
GenericArrayData(values.toArray))
+  }
+
+  def from(keyArray: ArrayData, valueArray: ArrayData): ArrayBasedMapData 
= {
+assert(keyToIndex.isEmpty, "'from' can only be called with a fresh 
GenericMapBuilder.")
+putAll(keyArray, valueArray)
--- End diff --

no we can't, as we still need to detect duplicated keys.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: 

[GitHub] spark issue #23126: [SPARK-26158] [MLLIB] fix covariance accuracy problem fo...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23126
  
**[Test build #4441 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4441/testReport)**
 for PR 23126 at commit 
[`6fca901`](https://github.com/apache/spark/commit/6fca901dfdad230fa7a4e1079a3f48be826276a3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23143: [SPARK-24762][SQL][Followup] Enable Option of Pro...

2018-11-26 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/23143#discussion_r236295606
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
 ---
@@ -251,19 +251,15 @@ case class ExpressionEncoder[T](
*/
   def isSerializedAsStruct: Boolean = 
objSerializer.dataType.isInstanceOf[StructType]
 
-  /**
-   * Returns true if the type `T` is an `Option` type.
-   */
-  def isOptionType: Boolean = 
classOf[Option[_]].isAssignableFrom(clsTag.runtimeClass)
-
   /**
* If the type `T` is serialized as a struct, when it is encoded to a 
Spark SQL row, fields in
* the struct are naturally mapped to top-level columns in a row. In 
other words, the serialized
* struct is flattened to row. But in case of the `T` is also an 
`Option` type, it can't be
* flattened to top-level row, because in Spark SQL top-level row can't 
be null. This method
* returns true if `T` is serialized as struct and is not `Option` type.
*/
-  def isSerializedAsStructForTopLevel: Boolean = isSerializedAsStruct && 
!isOptionType
+  def isSerializedAsStructForTopLevel: Boolean = isSerializedAsStruct &&
--- End diff --

ok. fixed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23145: [MINOR][Docs] "a R interpreter" -> "an R interpreter"

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23145
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage ...

2018-11-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23117
  
Hey @shaneknapp, have you found some time to take a look for this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...

2018-11-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23055
  
Let me get this in in few days if there are no more comments.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23138: [SPARK-23356][SQL][TEST] add new test cases for a...

2018-11-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23138#discussion_r236334056
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SetOperationSuite.scala
 ---
@@ -196,4 +196,31 @@ class SetOperationSuite extends PlanTest {
   ))
 comparePlans(expectedPlan, rewrittenPlan)
   }
+
+  test("SPARK-23356 union: expressions with number in project list are 
pushed down") {
--- End diff --

`expressions with number` -> `expressions with literal`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23143: [SPARK-24762][SQL][Followup] Enable Option of Product en...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23143
  
**[Test build #99277 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99277/testReport)**
 for PR 23143 at commit 
[`f3f0507`](https://github.com/apache/spark/commit/f3f0507f6797c571ce66086d9e3b7fcbfd234c79).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23145: [MINOR][Docs] "a R interpreter" -> "an R interpreter"

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23145
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23145: [MINOR][Docs] "a R interpreter" -> "an R interpreter"

2018-11-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23145
  
It's okay but mind taking another look the documentation and fix other 
typos while we are here? I'm pretty sure there are more.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...

2018-11-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23055
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23055
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5361/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23145: [MINOR][Docs] "a R interpreter" -> "an R interpreter"

2018-11-26 Thread kjmrknsn
Github user kjmrknsn commented on the issue:

https://github.com/apache/spark/pull/23145
  
OK, I'll check the whole documentation later. Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23146: [SPARK-26173] [MLlib] Prior regularization for Logistic ...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23146
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23146: [SPARK-26173] [MLlib] Prior regularization for Logistic ...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23146
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23127: [SPARK-26159] Codegen for LocalTableScanExec and ...

2018-11-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23127#discussion_r236332511
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -406,14 +415,62 @@ trait BlockingOperatorWithCodegen extends 
CodegenSupport {
   override def limitNotReachedChecks: Seq[String] = Nil
 }
 
+/**
+ * Leaf codegen node reading from a single RDD.
+ */
+trait InputRDDCodegen extends CodegenSupport {
+
+  def inputRDD: RDD[InternalRow]
+
+  // If the input is an RDD of InternalRow which are potentially not 
UnsafeRow,
+  // and there is no parent to consume it, it needs an UnsafeProjection.
+  protected val createUnsafeProjection: Boolean = (parent == null)
+
+  override def inputRDDs(): Seq[RDD[InternalRow]] = {
+inputRDD :: Nil
+  }
+
+  override def doProduce(ctx: CodegenContext): String = {
--- End diff --

can you highlight the difference between this one and the previous 
`RowDataSourceScanExec.doProduce`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23141: [SPARK-26021][SQL][followup] add test for special floati...

2018-11-26 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23141
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23145: [MINOR][Docs] "a R interpreter" -> "an R interpre...

2018-11-26 Thread kjmrknsn
GitHub user kjmrknsn opened a pull request:

https://github.com/apache/spark/pull/23145

[MINOR][Docs] "a R interpreter" -> "an R interpreter"

## What changes were proposed in this pull request?
This PR changes the phrase of `a R interpreter` to `an R interpreter` on 
`docs/index.md`.

## How was this patch tested?
NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kjmrknsn/spark docUpdate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23145.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23145


commit 9f2f60b3b80b1ae63c9df38cbc3d5bc1010d4123
Author: Keiji Yoshida 
Date:   2018-11-26T15:29:16Z

[MINOR][Docs] "a R interpreter" -> "an R interpreter"




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23145: [MINOR][Docs] "a R interpreter" -> "an R interpreter"

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23145
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23055
  
**[Test build #99278 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99278/testReport)**
 for PR 23055 at commit 
[`fd92a4e`](https://github.com/apache/spark/commit/fd92a4e1cee9f666d7ee6f9c9fcb45367c8132a8).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22939: [SPARK-25446][R] Add schema_of_json() and schema_of_csv(...

2018-11-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22939
  
Hey @felixcheung, have you found some time to take a look for this please?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23055
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23146: [SPARK-26173] [MLlib] Prior regularization for Lo...

2018-11-26 Thread elfausto
GitHub user elfausto opened a pull request:

https://github.com/apache/spark/pull/23146

[SPARK-26173] [MLlib] Prior regularization for Logistic Regression

Implementation of 
[SPARK-26173](https://issues.apache.org/jira/browse/SPARK-26173).
Unit tests have been added to the modified classes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/affectv/spark SPARK-26173

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23146.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23146


commit d8c35bb897cc0e2248810bfc7208e642deb5c55d
Author: Facundo Bellosi 
Date:   2018-11-26T15:31:26Z

SPARK-26173. Prior regularization for Logistic Regression.

commit 52918b7bd30f8fdccce450d7f41e34f8bedf32ad
Author: Facundo Bellosi 
Date:   2018-11-26T15:50:11Z

SPARK-26173. Prior regularization for Logistic Regression.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23141: [SPARK-26021][SQL][followup] add test for special floati...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23141
  
**[Test build #99279 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99279/testReport)**
 for PR 23141 at commit 
[`7c590bc`](https://github.com/apache/spark/commit/7c590bcd239afb32537d607ee8b2190f4e4db5a3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23146: [SPARK-26173] [MLlib] Prior regularization for Logistic ...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23146
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23141: [SPARK-26021][SQL][followup] add test for special floati...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23141
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23127: [SPARK-26159] Codegen for LocalTableScanExec and ...

2018-11-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23127#discussion_r236332786
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -406,14 +415,62 @@ trait BlockingOperatorWithCodegen extends 
CodegenSupport {
   override def limitNotReachedChecks: Seq[String] = Nil
 }
 
+/**
+ * Leaf codegen node reading from a single RDD.
+ */
+trait InputRDDCodegen extends CodegenSupport {
+
+  def inputRDD: RDD[InternalRow]
+
+  // If the input is an RDD of InternalRow which are potentially not 
UnsafeRow,
+  // and there is no parent to consume it, it needs an UnsafeProjection.
+  protected val createUnsafeProjection: Boolean = (parent == null)
+
+  override def inputRDDs(): Seq[RDD[InternalRow]] = {
+inputRDD :: Nil
+  }
+
+  override def doProduce(ctx: CodegenContext): String = {
--- End diff --

the extra `if createUnsafeProjection` check?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23124
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5359/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23124
  
**[Test build #99276 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99276/testReport)**
 for PR 23124 at commit 
[`b2bfb33`](https://github.com/apache/spark/commit/b2bfb3353f89be03fc6b12e4e8dd0899e3510a09).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23143: [SPARK-24762][SQL][Followup] Enable Option of Product en...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23143
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23143: [SPARK-24762][SQL][Followup] Enable Option of Product en...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23143
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5360/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23141: [SPARK-26021][SQL][followup] add test for special floati...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23141
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5362/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23127: [SPARK-26159] Codegen for LocalTableScanExec and ...

2018-11-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23127#discussion_r236333530
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -406,14 +415,62 @@ trait BlockingOperatorWithCodegen extends 
CodegenSupport {
   override def limitNotReachedChecks: Seq[String] = Nil
 }
 
+/**
+ * Leaf codegen node reading from a single RDD.
+ */
+trait InputRDDCodegen extends CodegenSupport {
+
+  def inputRDD: RDD[InternalRow]
+
+  // If the input is an RDD of InternalRow which are potentially not 
UnsafeRow,
+  // and there is no parent to consume it, it needs an UnsafeProjection.
+  protected val createUnsafeProjection: Boolean = (parent == null)
+
+  override def inputRDDs(): Seq[RDD[InternalRow]] = {
+inputRDD :: Nil
+  }
+
+  override def doProduce(ctx: CodegenContext): String = {
+// Inline mutable state since an InputRDDCodegen is used once in a 
task for WholeStageCodegen
+val input = ctx.addMutableState("scala.collection.Iterator", "input", 
v => s"$v = inputs[0];",
+  forceInline = true)
+val row = ctx.freshName("row")
+
+val outputVars = if (createUnsafeProjection) {
+  // creating the vars will make the parent consume add an unsafe 
projection.
+  ctx.INPUT_ROW = row
+  ctx.currentVars = null
+  output.zipWithIndex.map { case (a, i) =>
+BoundReference(i, a.dataType, a.nullable).genCode(ctx)
+  }
+} else {
+  null
+}
+
+val numOutputRowsCode = if (metrics.contains("numOutputRows")) {
--- End diff --

how about `updateNumOutputRowsMetrics` instead of `numOutputRowsCode`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23102: [SPARK-26137][CORE] Use Java system property "fil...

2018-11-26 Thread markpavey
Github user markpavey commented on a diff in the pull request:

https://github.com/apache/spark/pull/23102#discussion_r236340228
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala 
---
@@ -65,7 +65,7 @@ private[deploy] object DependencyUtils extends Logging {
   .map {
 resolveGlobPaths(_, hadoopConf)
   .split(",")
-  .filterNot(_.contains(userJar.split("/").last))
+  
.filterNot(_.contains(userJar.split(System.getProperty("file.separator")).last))
--- End diff --

I have updated to use File.separator instead of 
System.getProperty("file.separator").


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23127
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23127
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5369/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23127
  
**[Test build #99286 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99286/testReport)**
 for PR 23127 at commit 
[`5d94c42`](https://github.com/apache/spark/commit/5d94c4274d468f3873db0827c88e1461f0125b58).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-11-26 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21688
  
+1 , going to merge to master

There are a few followup jiras on this.
1) make the timeline visualization better: 
https://issues.apache.org/jira/browse/SPARK-26130
2) improve search functionalit: 
https://issues.apache.org/jira/browse/SPARK-25719


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23058: [SPARK-25905][CORE] When getting a remote block, avoid f...

2018-11-26 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/23058
  
lgtm

I looked more into the lifecycle of the buffers and when they get 
`disposed`, and it looks fine to me.  (In fact I think there is no need for the 
`dispose` in the first place, as hinted at here: 
https://github.com/apache/spark/pull/22511#issuecomment-424429691)

I also checked about whether we should buffer the input stream, but 
`dataDeserializeStream` already does that.

@wypoon one thing, can you update the testing section of the pr description 
to mention the coverage you found in the existing unit tests?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23109: [SPARK-26069][TESTS][FOLLOWUP]Add another possible error...

2018-11-26 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/23109
  
lte review, but lgtm anyway


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-11-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21688


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-26 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236409557
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count 
- 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): 
IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
--- End diff --

Thank you @vanzin for the review. I will check the time with large number 
of tasks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5370/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23111: [SPARK-26148][PYTHON][TESTS] Increases default paralleli...

2018-11-26 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/23111
  
wow, thats great!  glad there is a big speedup.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-26 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236409661
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count 
- 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): 
IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
+.parent(stageKey)
+.index(index)
+.first(0L)
+.asScala
+.filter(_.status == "SUCCESS") // Filter "SUCCESS" tasks
+.zipWithIndex
+.filter(x => indices.contains(x._2))
--- End diff --

Thanks. I have modified


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23100: [SPARK-26133][ML] Remove deprecated OneHotEncoder...

2018-11-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/23100#discussion_r236410306
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala ---
@@ -17,126 +17,512 @@
 
 package org.apache.spark.ml.feature
 
+import org.apache.hadoop.fs.Path
--- End diff --

I guess once the commits of the history are squashed into one, it will 
still like this without better history.  


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23100: [SPARK-26133][ML] Remove deprecated OneHotEncoder...

2018-11-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/23100#discussion_r236410750
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala ---
@@ -17,126 +17,512 @@
 
 package org.apache.spark.ml.feature
 
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.SparkException
 import org.apache.spark.annotation.Since
-import org.apache.spark.ml.Transformer
+import org.apache.spark.ml.{Estimator, Model}
 import org.apache.spark.ml.attribute._
 import org.apache.spark.ml.linalg.Vectors
 import org.apache.spark.ml.param._
-import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}
+import org.apache.spark.ml.param.shared.{HasHandleInvalid, HasInputCols, 
HasOutputCols}
 import org.apache.spark.ml.util._
 import org.apache.spark.sql.{DataFrame, Dataset}
-import org.apache.spark.sql.functions.{col, udf}
-import org.apache.spark.sql.types.{DoubleType, NumericType, StructType}
+import org.apache.spark.sql.expressions.UserDefinedFunction
+import org.apache.spark.sql.functions.{col, lit, udf}
+import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
+
+/** Private trait for params and common methods for OneHotEncoder and 
OneHotEncoderModel */
+private[ml] trait OneHotEncoderBase extends Params with HasHandleInvalid
+with HasInputCols with HasOutputCols {
+
+  /**
+   * Param for how to handle invalid data during transform().
+   * Options are 'keep' (invalid data presented as an extra categorical 
feature) or
+   * 'error' (throw an error).
+   * Note that this Param is only used during transform; during fitting, 
invalid data
+   * will result in an error.
+   * Default: "error"
+   * @group param
+   */
+  @Since("2.3.0")
--- End diff --

As we discussed previously, it's a new class. Should we make it as `3.0`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-26 Thread koeninger
GitHub user koeninger opened a pull request:

https://github.com/apache/spark/pull/23148

[SPARK-26177] Automated formatting for Scala code

## What changes were proposed in this pull request?

Add a maven plugin and wrapper script at ./dev/scalafmt to use scalafmt to 
format files that differ from git master.

Intention is for contributors to be able to use this to automate fixing 
code style, not to include it in build pipeline yet.

If this PR is accepted, I'd make a different PR to update the code style 
section of https://spark.apache.org/contributing.html to mention the script

## How was this patch tested?

Manually tested by modifying a few files and running the script, then 
checking that scalastyle still passed.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/koeninger/spark-1 scalafmt

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23148.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23148


commit 5df854a340f5e9e2fe7ba60194b4891235420369
Author: cody koeninger 
Date:   2018-11-21T20:32:57Z

WIP on scalafmt config

commit 57684272b60ff1213c957541f1db3f9d7aba1543
Author: cody koeninger 
Date:   2018-11-21T21:22:15Z

mvn plugin, e.g. ./build/mvn mvn-scalafmt_2.12:format

commit bd7baeca577bf9b519fe028d1e831fb7193e7af9
Author: cody koeninger 
Date:   2018-11-22T17:07:06Z

shell wrapper for mvn scalafmt plugin




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23148
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99288/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23148
  
**[Test build #99288 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99288/testReport)**
 for PR 23148 at commit 
[`bd7baec`](https://github.com/apache/spark/commit/bd7baeca577bf9b519fe028d1e831fb7193e7af9).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23148
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23141: [SPARK-26021][SQL][followup] add test for special floati...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23141
  
**[Test build #99279 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99279/testReport)**
 for PR 23141 at commit 
[`7c590bc`](https://github.com/apache/spark/commit/7c590bcd239afb32537d607ee8b2190f4e4db5a3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23141: [SPARK-26021][SQL][followup] add test for special floati...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23141
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99279/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23141: [SPARK-26021][SQL][followup] add test for special floati...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23141
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-26 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236398604
  
--- Diff: 
core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala ---
@@ -95,10 +123,18 @@ class AppStatusStoreSuite extends SparkFunSuite {
 
   private def newTaskData(i: Int): TaskDataWrapper = {
 new TaskDataWrapper(
-  i, i, i, i, i, i, i.toString, i.toString, i.toString, i.toString, 
false, Nil, None,
+  i, i, i, i, i, i, i.toString, i.toString, "SUCCESS", i.toString, 
false, Nil, None,
   i, i, i, i, i, i, i, i, i, i,
   i, i, i, i, i, i, i, i, i, i,
   i, i, i, i, stageId, attemptId)
   }
 
+  private def failedTaskData(i: Int): TaskDataWrapper = {
--- End diff --

better to have the status be a parameter to the existing method (with a 
default value)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-26 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236405278
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count 
- 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): 
IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
+.parent(stageKey)
+.index(index)
+.first(0L)
+.asScala
+.filter(_.status == "SUCCESS") // Filter "SUCCESS" tasks
+.zipWithIndex
+.filter(x => indices.contains(x._2))
--- End diff --

I don't feel strongly about it but I have written it as above where a new 
block isn't needed. I don't think it makes any runtime difference.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23055
  
**[Test build #99278 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99278/testReport)**
 for PR 23055 at commit 
[`fd92a4e`](https://github.com/apache/spark/commit/fd92a4e1cee9f666d7ee6f9c9fcb45367c8132a8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-26 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236405732
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count 
- 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): 
IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
+.parent(stageKey)
+.index(index)
+.first(0L)
+.asScala
+.filter(_.status == "SUCCESS") // Filter "SUCCESS" tasks
+.zipWithIndex
+.filter(x => indices.contains(x._2))
--- End diff --

It doesn't make a runtime different. That's why this is a style nit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23055
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99278/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23055
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23088
  
**[Test build #99287 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99287/testReport)**
 for PR 23088 at commit 
[`b29a150`](https://github.com/apache/spark/commit/b29a150f6d28065f5fde2b4c40f66ea4683bac69).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23111: [SPARK-26148][PYTHON][TESTS] Increases default paralleli...

2018-11-26 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/23111
  
we might need to be careful that this doesn't un-intentionally overload the 
jenkins workers so that we end up hitting more timeouts from too many things 
running concurrently (I dunno how isolated the workers are)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-26 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236409746
  
--- Diff: 
core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala ---
@@ -95,10 +123,18 @@ class AppStatusStoreSuite extends SparkFunSuite {
 
   private def newTaskData(i: Int): TaskDataWrapper = {
 new TaskDataWrapper(
-  i, i, i, i, i, i, i.toString, i.toString, i.toString, i.toString, 
false, Nil, None,
+  i, i, i, i, i, i, i.toString, i.toString, "SUCCESS", i.toString, 
false, Nil, None,
   i, i, i, i, i, i, i, i, i, i,
   i, i, i, i, i, i, i, i, i, i,
   i, i, i, i, stageId, attemptId)
   }
 
+  private def failedTaskData(i: Int): TaskDataWrapper = {
--- End diff --

Thanks. done.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23088
  
**[Test build #99289 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99289/testReport)**
 for PR 23088 at commit 
[`c81fcf3`](https://github.com/apache/spark/commit/c81fcf3a18339852a681f118129d4d4234adae5a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5372/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



<    1   2   3   4   5   >