[GitHub] spark issue #21981: [SAPRK-25011][ML]add prefix to __all__ in fpm.py

2018-08-02 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/21981
  
Ah, this triggers the doc check. Updating.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21965: [SPARK-23909][SQL] Add filter function.

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21965
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21965: [SPARK-23909][SQL] Add filter function.

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21965
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94088/
Test FAILed.


---




[GitHub] spark issue #21965: [SPARK-23909][SQL] Add filter function.

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21965
  
**[Test build #94088 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94088/testReport)** for PR 21965 at commit [`ace19dd`](https://github.com/apache/spark/commit/ace19dd7230598350838aa60fc93b32a08642acd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class ArrayFilter(`


---




[GitHub] spark issue #21952: [SPARK-24993] [SQL] Make Avro Fast Again

2018-08-02 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21952
  
We can keep investigating the perf regression; this patch itself LGTM.


---




[GitHub] spark pull request #21963: [SPARK-24997][SQL] Enable support of MINUS ALL

2018-08-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21963


---




[GitHub] spark issue #21952: [SPARK-24993] [SQL] Make Avro Fast Again

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21952
  
**[Test build #94104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94104/testReport)** for PR 21952 at commit [`ec17d58`](https://github.com/apache/spark/commit/ec17d58ea674ffba6e2c07284a26f6b3a1e7357e).


---




[GitHub] spark issue #21981: [SAPRK-25011][ML]add prefix to __all__ in fpm.py

2018-08-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21981
  
Looks like we should fix the doc in PrefixSpan too. LGTM if the checks pass.


---




[GitHub] spark issue #21965: [SPARK-23909][SQL] Add filter function.

2018-08-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21965
  
cc @hvanhovell 


---




[GitHub] spark issue #21952: [SPARK-24993] [SQL] Make Avro Fast Again

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21952
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21898
  
Build finished. Test PASSed.


---




[GitHub] spark issue #21952: [SPARK-24993] [SQL] Make Avro Fast Again

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21952
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1734/
Test PASSed.


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21898
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94081/
Test PASSed.


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21898
  
**[Test build #94081 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94081/testReport)** for PR 21898 at commit [`33d4827`](https://github.com/apache/spark/commit/33d4827b398134b7afa22328931720b0881224f4).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21964: [SPARK-24788][SQL] RelationalGroupedDataset.toStr...

2018-08-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21964


---




[GitHub] spark pull request #21752: [SPARK-24788][SQL] fixed UnresolvedException when...

2018-08-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21752


---




[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-08-02 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/21941
  
Thanks a lot @gatorsmile 


---




[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r207444701
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala ---
@@ -39,6 +44,17 @@ class BarrierTaskContext(
     extends TaskContextImpl(stageId, stageAttemptNumber, partitionId, taskAttemptId, attemptNumber,
       taskMemoryManager, localProperties, metricsSystem, taskMetrics) {
 
+  private val barrierCoordinator: RpcEndpointRef = {
+    val env = SparkEnv.get
+    RpcUtils.makeDriverRef("barrierSync", env.conf, env.rpcEnv)
--- End diff --

It would be nice to define `"barrierSync"` as a constant.
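A minimal sketch of what this suggestion could look like. The companion object and the constant name `BARRIER_COORDINATOR_ENDPOINT_NAME` are invented for illustration; the actual Spark change may place or name the constant differently.

```scala
// Hypothetical sketch of the review suggestion: hoist the "barrierSync"
// endpoint name into one shared constant so the driver-side registration
// and the task-side lookup cannot silently drift apart. The object and
// constant names here are assumptions, not the actual Spark code.
object BarrierTaskContext {
  // Name under which the driver registers the barrier coordinator RPC endpoint.
  val BARRIER_COORDINATOR_ENDPOINT_NAME: String = "barrierSync"
}
```

The call site quoted above would then read `RpcUtils.makeDriverRef(BarrierTaskContext.BARRIER_COORDINATOR_ENDPOINT_NAME, env.conf, env.rpcEnv)`, and the driver would register the endpoint under the same constant.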


---




[GitHub] spark issue #21586: [SPARK-24586][SQL] Upcast should not allow casting from ...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21586
  
**[Test build #94103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94103/testReport)** for PR 21586 at commit [`c89d12e`](https://github.com/apache/spark/commit/c89d12e7b32987cbe4a081fc417fb38022061cc5).


---




[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r207442439
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.{Timer, TimerTask}
+import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
+import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, SparkListenerStageCompleted}
+
+/**
+ * A coordinator that handles all global sync requests from BarrierTaskContext. Each global sync
+ * request is generated by `BarrierTaskContext.barrier()`, and identified by
+ * stageId + stageAttemptId + barrierEpoch. Reply all the blocking global sync requests upon
+ * received all the requests for a group of `barrier()` calls. If the coordinator doesn't collect
+ * enough global sync requests within a configured time, fail all the requests due to timeout.
+ */
+private[spark] class BarrierCoordinator(
+    timeout: Int,
+    listenerBus: LiveListenerBus,
+    override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint with Logging {
+
+  private val timer = new Timer("BarrierCoordinator barrier epoch increment timer")
+
+  private val listener = new SparkListener {
+    override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
+      val stageInfo = stageCompleted.stageInfo
+      // Remove internal data from a finished stage attempt.
+      cleanupSyncRequests(stageInfo.stageId, stageInfo.attemptNumber)
+      barrierEpochByStageIdAndAttempt.remove((stageInfo.stageId, stageInfo.attemptNumber))
+    }
+  }
+
+  // Epoch counter for each barrier (stage, attempt).
+  private val barrierEpochByStageIdAndAttempt = new ConcurrentHashMap[(Int, Int), AtomicInteger]
+
+  // Remember all the blocking global sync requests for each barrier (stage, attempt).
+  private val syncRequestsByStageIdAndAttempt =
+    new ConcurrentHashMap[(Int, Int), ArrayBuffer[RpcCallContext]]
+
+  override def onStart(): Unit = {
+    super.onStart()
+    listenerBus.addToStatusQueue(listener)
+  }
+
+  /**
+   * Get the array of [[RpcCallContext]]s that correspond to a barrier sync request from a stage
+   * attempt.
+   */
+  private def getOrInitSyncRequests(
+      stageId: Int,
+      stageAttemptId: Int,
+      numTasks: Int = 0): ArrayBuffer[RpcCallContext] = {
+    val requests = syncRequestsByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+      new ArrayBuffer[RpcCallContext](numTasks))
+    if (requests == null) {
+      syncRequestsByStageIdAndAttempt.get((stageId, stageAttemptId))
+    } else {
+      requests
+    }
+  }
+
+  /**
+   * Clean up the array of [[RpcCallContext]]s that correspond to a barrier sync request from a
+   * stage attempt.
+   */
+  private def cleanupSyncRequests(stageId: Int, stageAttemptId: Int): Unit = {
+    val requests = syncRequestsByStageIdAndAttempt.remove((stageId, stageAttemptId))
+    if (requests != null) {
+      requests.clear()
--- End diff --

Agree with @cloud-fan that this is not necessary. It only explicitly clears the ArrayBuffer object instead of the contexts.
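For readers following the quoted `getOrInitSyncRequests`: `ConcurrentHashMap.putIfAbsent` returns `null` when the key was absent (and the new value was stored), and the existing value otherwise, which is why the quoted code re-reads the map on `null`. A standalone sketch of that contract (illustrative only, not Spark code; the key and value types are simplified):

```scala
import java.util.concurrent.ConcurrentHashMap

import scala.collection.mutable.ArrayBuffer

// putIfAbsent returns null when the key was absent and the new value was
// stored; otherwise it returns the existing value and stores nothing.
val requestsByStageAndAttempt = new ConcurrentHashMap[(Int, Int), ArrayBuffer[String]]

val first = requestsByStageAndAttempt.putIfAbsent((1, 0), new ArrayBuffer[String](4))
assert(first == null) // key was absent, so the caller must re-read the map

val stored = requestsByStageAndAttempt.get((1, 0))
stored += "task-0-sync-request"

val second = requestsByStageAndAttempt.putIfAbsent((1, 0), new ArrayBuffer[String](4))
assert(second eq stored) // key present: the existing buffer is returned, nothing replaced
```

This also illustrates the `requests.clear()` point above: clearing only empties the buffer object being discarded; it does nothing to the `RpcCallContext`s it held.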


---




[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r207444103
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.{Timer, TimerTask}
+import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
+import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, SparkListenerStageCompleted}
+
+/**
+ * A coordinator that handles all global sync requests from BarrierTaskContext. Each global sync
+ * request is generated by `BarrierTaskContext.barrier()`, and identified by
+ * stageId + stageAttemptId + barrierEpoch. Reply all the blocking global sync requests upon
+ * received all the requests for a group of `barrier()` calls. If the coordinator doesn't collect
+ * enough global sync requests within a configured time, fail all the requests due to timeout.
+ */
+private[spark] class BarrierCoordinator(
+    timeout: Int,
+    listenerBus: LiveListenerBus,
+    override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint with Logging {
+
+  private val timer = new Timer("BarrierCoordinator barrier epoch increment timer")
+
+  private val listener = new SparkListener {
+    override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
+      val stageInfo = stageCompleted.stageInfo
+      // Remove internal data from a finished stage attempt.
+      cleanupSyncRequests(stageInfo.stageId, stageInfo.attemptNumber)
+      barrierEpochByStageIdAndAttempt.remove((stageInfo.stageId, stageInfo.attemptNumber))
+      cancelTimerTask(stageInfo.stageId, stageInfo.attemptNumber)
+    }
+  }
+
+  // Epoch counter for each barrier (stage, attempt).
+  private val barrierEpochByStageIdAndAttempt = new ConcurrentHashMap[(Int, Int), Int]
+
+  // Remember all the blocking global sync requests for each barrier (stage, attempt).
+  private val syncRequestsByStageIdAndAttempt =
+    new ConcurrentHashMap[(Int, Int), ArrayBuffer[RpcCallContext]]
+
+  // Remember all the TimerTasks for each barrier (stage, attempt).
+  private val timerTaskByStageIdAndAttempt = new ConcurrentHashMap[(Int, Int), TimerTask]
+
+  // Number of tasks for each stage.
+  private val numTasksByStage = new ConcurrentHashMap[Int, Int]
+
+  override def onStart(): Unit = {
+    super.onStart()
+    listenerBus.addToStatusQueue(listener)
+  }
+
+  /**
+   * Get the array of [[RpcCallContext]]s that correspond to a barrier sync request from a stage
+   * attempt.
+   */
+  private def getOrInitSyncRequests(
+      stageId: Int,
+      stageAttemptId: Int,
+      numTasks: Int): ArrayBuffer[RpcCallContext] = {
+    syncRequestsByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+      new ArrayBuffer[RpcCallContext](numTasks))
+    syncRequestsByStageIdAndAttempt.get((stageId, stageAttemptId))
+  }
+
+  /**
+   * Clean up the array of [[RpcCallContext]]s that correspond to a barrier sync request from a
+   * stage attempt.
+   */
+  private def cleanupSyncRequests(stageId: Int, stageAttemptId: Int): Unit = {
+    val requests = syncRequestsByStageIdAndAttempt.remove((stageId, stageAttemptId))
+    if (requests != null) {
+      requests.clear()
+    }
+    logInfo(s"Removed all the pending barrier sync requests from Stage $stageId (Attempt " +
+      s"$stageAttemptId).")
+  }
+
+  /**
+   * Get the barrier epoch that correspond to a barrier sync request from a stage attempt.
+   */
+  private def getOrInitBarrierEpoch(stageId: Int, stageAttemptId: Int):
[message truncated in the archive]

[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r207443099
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
[quoted diff context omitted: verbatim repeat of the BarrierCoordinator.scala excerpt quoted in the preceding review comment; the comment body itself is truncated in the archive]

[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r207444041
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
[quoted diff context omitted: verbatim repeat of the BarrierCoordinator.scala excerpt quoted in the preceding review comment; the comment body itself is truncated in the archive]

[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r207442661
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
[quoted diff context omitted: verbatim repeat of the BarrierCoordinator.scala excerpt quoted in the preceding review comment; the comment body itself is truncated in the archive]

[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r20759
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.{Timer, TimerTask}
+import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
+import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, 
SparkListenerStageCompleted}
+
+/**
+ * A coordinator that handles all global sync requests from BarrierTaskContext. Each global sync
+ * request is generated by `BarrierTaskContext.barrier()` and identified by
+ * stageId + stageAttemptId + barrierEpoch. The coordinator replies to all the blocking global
+ * sync requests once it has received all the requests for a group of `barrier()` calls. If the
+ * coordinator doesn't collect enough global sync requests within the configured time, it fails
+ * all the requests due to timeout.
+ */
+private[spark] class BarrierCoordinator(
+timeout: Int,
+listenerBus: LiveListenerBus,
+override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint with 
Logging {
+
+  private val timer = new Timer("BarrierCoordinator barrier epoch 
increment timer")
+
+  private val listener = new SparkListener {
+override def onStageCompleted(stageCompleted: 
SparkListenerStageCompleted): Unit = {
+  val stageInfo = stageCompleted.stageInfo
+  // Remove internal data from a finished stage attempt.
+  cleanupSyncRequests(stageInfo.stageId, stageInfo.attemptNumber)
+  barrierEpochByStageIdAndAttempt.remove((stageInfo.stageId, 
stageInfo.attemptNumber))
+  cancelTimerTask(stageInfo.stageId, stageInfo.attemptNumber)
+}
+  }
+
+  // Epoch counter for each barrier (stage, attempt).
+  private val barrierEpochByStageIdAndAttempt = new 
ConcurrentHashMap[(Int, Int), Int]
+
+  // Remember all the blocking global sync requests for each barrier 
(stage, attempt).
+  private val syncRequestsByStageIdAndAttempt =
+new ConcurrentHashMap[(Int, Int), ArrayBuffer[RpcCallContext]]
+
+  // Remember all the TimerTasks for each barrier (stage, attempt).
+  private val timerTaskByStageIdAndAttempt = new ConcurrentHashMap[(Int, 
Int), TimerTask]
+
+  // Number of tasks for each stage.
+  private val numTasksByStage = new ConcurrentHashMap[Int, Int]
+
+  override def onStart(): Unit = {
+super.onStart()
+listenerBus.addToStatusQueue(listener)
+  }
+
+  /**
+   * Get the array of [[RpcCallContext]]s that correspond to a barrier 
sync request from a stage
+   * attempt.
+   */
+  private def getOrInitSyncRequests(
+  stageId: Int,
+  stageAttemptId: Int,
+  numTasks: Int): ArrayBuffer[RpcCallContext] = {
+syncRequestsByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+  new ArrayBuffer[RpcCallContext](numTasks))
+syncRequestsByStageIdAndAttempt.get((stageId, stageAttemptId))
+  }
+
+  /**
+   * Clean up the array of [[RpcCallContext]]s that correspond to a 
barrier sync request from a
+   * stage attempt.
+   */
+  private def cleanupSyncRequests(stageId: Int, stageAttemptId: Int): Unit 
= {
+val requests = syncRequestsByStageIdAndAttempt.remove((stageId, 
stageAttemptId))
+if (requests != null) {
+  requests.clear()
+}
+logInfo(s"Removed all the pending barrier sync requests from Stage 
$stageId (Attempt " +
+  s"$stageAttemptId).")
+  }
+
+  /**
+   * Get the barrier epoch that corresponds to a barrier sync request from a stage attempt.
+   */
+  private def getOrInitBarrierEpoch(stageId: Int, stageAttemptId: Int): 
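The bookkeeping in this patch can be sketched in isolation. The Java sketch below uses hypothetical names and a string key in place of the real coordinator's (stageId, stageAttemptId) tuple and RpcCallContexts: it collects one sync request per task and replies to all of them once the expected count has checked in. `ConcurrentHashMap.computeIfAbsent` stands in for the patch's putIfAbsent-then-get pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BarrierSketch {
    // Pending requests per barrier, keyed by "stageId:attemptId".
    private final Map<String, List<String>> requestsByStageAttempt = new ConcurrentHashMap<>();

    // Same effect as the patch's putIfAbsent-then-get pair, in one call.
    private List<String> getOrInitSyncRequests(int stageId, int attemptId) {
        return requestsByStageAttempt.computeIfAbsent(
            stageId + ":" + attemptId, k -> new ArrayList<>());
    }

    // Records one task's sync request; returns the callers replied to
    // (empty until the last expected task checks in).
    public List<String> request(int stageId, int attemptId, int numTasks, String caller) {
        List<String> requests = getOrInitSyncRequests(stageId, attemptId);
        synchronized (requests) {
            requests.add(caller);
            if (requests.size() == numTasks) {
                List<String> replies = new ArrayList<>(requests);
                // Mirrors cleanupSyncRequests: drop state for the finished barrier.
                requestsByStageAttempt.remove(stageId + ":" + attemptId);
                return replies;
            }
            return List.of();
        }
    }

    public static void main(String[] args) {
        BarrierSketch coordinator = new BarrierSketch();
        System.out.println(coordinator.request(0, 0, 2, "task0")); // []
        System.out.println(coordinator.request(0, 0, 2, "task1")); // [task0, task1]
    }
}
```

The sketch omits the epoch counter and the timeout TimerTask that the real coordinator uses to fail stragglers.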

[GitHub] spark pull request #21952: [SPARK-24993] [SQL] Make Avro Fast Again

2018-08-02 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21952#discussion_r207444698
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -151,11 +155,12 @@ class AvroSerializer(rootCatalystType: DataType, 
rootAvroType: Schema, nullable:
   case (f1, f2) => newConverter(f1.dataType, 
resolveNullableType(f2.schema(), f1.nullable))
 }
 val numFields = catalystStruct.length
+val containsNull = catalystStruct.exists(_.nullable)
--- End diff --

Let's remove it. We can fix the issue that Spark always turns the schema nullable later.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21974: [SPARK-25002][SQL] Avro: revise the output record...

2018-08-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21974


---




[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-08-02 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/21860
  
cc @kiszk, can you review it again if you have some time?


---




[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...

2018-08-02 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21895
  
Ping @mridulm , would you please also take a review, thanks!


---




[GitHub] spark pull request #21952: [SPARK-24993] [SQL] Make Avro Fast Again

2018-08-02 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21952#discussion_r207444381
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -151,11 +155,12 @@ class AvroSerializer(rootCatalystType: DataType, 
rootAvroType: Schema, nullable:
   case (f1, f2) => newConverter(f1.dataType, 
resolveNullableType(f2.schema(), f1.nullable))
 }
 val numFields = catalystStruct.length
+val containsNull = catalystStruct.exists(_.nullable)
--- End diff --

Was addressing the feedback from @gengliangwang. We can remove it, since the cases where all the fields are non-nullable will probably be fairly rare.


---




[GitHub] spark issue #21974: [SPARK-25002][SQL] Avro: revise the output record namesp...

2018-08-02 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21974
  
thanks, merging to master!


---




[GitHub] spark pull request #21952: [SPARK-24993] [SQL] Make Avro Fast Again

2018-08-02 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21952#discussion_r207444037
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -100,17 +100,20 @@ class AvroSerializer(rootCatalystType: DataType, 
rootAvroType: Schema, nullable:
   et, resolveNullableType(avroType.getElementType, containsNull))
 (getter, ordinal) => {
   val arrayData = getter.getArray(ordinal)
-  val result = new java.util.ArrayList[Any]
--- End diff --

My previous experience in the ml project taught me that `ArrayList` has slower setter performance due to one extra function call, so my preference is to use a plain array as much as possible and wrap it in the right container at the end.
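The trade-off under discussion can be illustrated with a small Java sketch (hypothetical helper names, not the patch's code): fill a plain array with raw stores and wrap it once at the end, versus preallocating an `ArrayList`'s capacity and paying an `add()` call per element. Both produce equal lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FillDemo {
    // Approach taken in the patch: fill a plain array, wrap it at the end.
    static List<Object> viaArray(int len) {
        Object[] buf = new Object[len];
        for (int i = 0; i < len; i++) {
            buf[i] = i * i;            // plain array store, no method call
        }
        return Arrays.asList(buf);     // O(1) fixed-size wrapper, no copy
    }

    // Alternative from the review: preallocate the ArrayList's capacity.
    static List<Object> viaArrayList(int len) {
        List<Object> out = new ArrayList<>(len);  // avoids resizing, but each add() is a call
        for (int i = 0; i < len; i++) {
            out.add(i * i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(viaArray(4));      // [0, 1, 4, 9]
        System.out.println(viaArrayList(4));  // [0, 1, 4, 9]
    }
}
```

Note that `Arrays.asList` returns a fixed-size view, which is fine when the result is only read afterwards.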


---




[GitHub] spark issue #21981: [SPARK-25011][ML]add prefix to __all__ in fpm.py

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21981
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1733/
Test PASSed.


---




[GitHub] spark issue #21981: [SPARK-25011][ML]add prefix to __all__ in fpm.py

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21981
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21981: [SPARK-25011][ML]add prefix to __all__ in fpm.py

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21981
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21981: [SPARK-25011][ML]add prefix to __all__ in fpm.py

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21981
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94102/
Test FAILed.


---




[GitHub] spark issue #21981: [SPARK-25011][ML]add prefix to __all__ in fpm.py

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21981
  
**[Test build #94102 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94102/testReport)**
 for PR 21981 at commit 
[`6af3644`](https://github.com/apache/spark/commit/6af364429263f1d9d427c23e7a84b4d57d30261d).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21952: [SPARK-24993] [SQL] Make Avro Fast Again

2018-08-02 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21952#discussion_r207443421
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -151,11 +155,12 @@ class AvroSerializer(rootCatalystType: DataType, 
rootAvroType: Schema, nullable:
   case (f1, f2) => newConverter(f1.dataType, 
resolveNullableType(f2.schema(), f1.nullable))
 }
 val numFields = catalystStruct.length
+val containsNull = catalystStruct.exists(_.nullable)
--- End diff --

This only works when none of the fields are nullable; I don't think it's very useful.


---




[GitHub] spark issue #21953: [SPARK-24992][Core] spark should randomize yarn local di...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21953
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21953: [SPARK-24992][Core] spark should randomize yarn local di...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21953
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94085/
Test FAILed.


---




[GitHub] spark issue #21953: [SPARK-24992][Core] spark should randomize yarn local di...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21953
  
**[Test build #94085 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94085/testReport)**
 for PR 21953 at commit 
[`a8c1654`](https://github.com/apache/spark/commit/a8c165475edc4dc8b4839aea3035ede380d316dc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21952: [SPARK-24993] [SQL] Make Avro Fast Again

2018-08-02 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21952#discussion_r207443196
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -100,17 +100,20 @@ class AvroSerializer(rootCatalystType: DataType, 
rootAvroType: Schema, nullable:
   et, resolveNullableType(avroType.getElementType, containsNull))
 (getter, ordinal) => {
   val arrayData = getter.getArray(ordinal)
-  val result = new java.util.ArrayList[Any]
--- End diff --

can we just `new java.util.ArrayList[Any](len)` here instead of creating an array and wrapping it in an array list?


---




[GitHub] spark issue #21981: [SPARK-25011][ML]add prefix to __all__ in fpm.py

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21981
  
**[Test build #94102 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94102/testReport)**
 for PR 21981 at commit 
[`6af3644`](https://github.com/apache/spark/commit/6af364429263f1d9d427c23e7a84b4d57d30261d).


---




[GitHub] spark pull request #21981: [SPARK-25011][ML]add prefix to __all__ in fpm.py

2018-08-02 Thread hhbyyh
GitHub user hhbyyh opened a pull request:

https://github.com/apache/spark/pull/21981

[SPARK-25011][ML]add prefix to __all__ in fpm.py

## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-25011

add prefix to __all__ in fpm.py

## How was this patch tested?

existing unit test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hhbyyh/spark prefixall

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21981.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21981


commit 6af364429263f1d9d427c23e7a84b4d57d30261d
Author: Yuhao Yang 
Date:   2018-08-03T05:04:03Z

add prefix to all




---




[GitHub] spark pull request #21980: [SPARK-25010][SQL] Rand/Randn should produce diff...

2018-08-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21980#discussion_r207442765
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala
 ---
@@ -75,14 +75,15 @@ class IncrementalExecution(
* with the desired literal
*/
   override lazy val optimizedPlan: LogicalPlan = {
-val random = new Random()
-
 sparkSession.sessionState.optimizer.execute(withCachedData) 
transformAllExpressions {
   case ts @ CurrentBatchTimestamp(timestamp, _, _) =>
 logInfo(s"Current batch timestamp = $timestamp")
 ts.toLiteral
   // SPARK-24896: Set the seed for random number generation in Uuid 
expressions.
-  case _: Uuid => Uuid(Some(random.nextLong()))
+  case _: Uuid => Uuid(Some(Utils.random.nextLong()))
--- End diff --

Sounds good. Let me update it accordingly.


---




[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...

2018-08-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21941


---




[GitHub] spark pull request #21980: [SPARK-25010][SQL] Rand/Randn should produce diff...

2018-08-02 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21980#discussion_r207441687
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala
 ---
@@ -75,14 +75,15 @@ class IncrementalExecution(
* with the desired literal
*/
   override lazy val optimizedPlan: LogicalPlan = {
-val random = new Random()
-
 sparkSession.sessionState.optimizer.execute(withCachedData) 
transformAllExpressions {
   case ts @ CurrentBatchTimestamp(timestamp, _, _) =>
 logInfo(s"Current batch timestamp = $timestamp")
 ts.toLiteral
   // SPARK-24896: Set the seed for random number generation in Uuid 
expressions.
-  case _: Uuid => Uuid(Some(random.nextLong()))
+  case _: Uuid => Uuid(Some(Utils.random.nextLong()))
--- End diff --

shall we create a trait `ExpressionWithRandomSeed` which has a 
`withNewSeed` method for these expressions? 
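A minimal Java sketch of the proposed idea (hypothetical names; the actual Spark expressions are Scala case classes): a shared interface exposing `withNewSeed` lets the planner re-seed any seeded expression uniformly instead of matching each expression type separately.

```java
import java.util.Optional;
import java.util.Random;

// Any expression that carries a random seed can be re-instantiated with a fresh one.
interface ExpressionWithRandomSeed {
    ExpressionWithRandomSeed withNewSeed(long seed);
}

// Immutable Uuid-like expression: withNewSeed returns a new copy, as the
// planner's transformAllExpressions rewrite would.
final class UuidExpr implements ExpressionWithRandomSeed {
    final Optional<Long> seed;
    UuidExpr(Optional<Long> seed) { this.seed = seed; }
    @Override public UuidExpr withNewSeed(long s) { return new UuidExpr(Optional.of(s)); }
}

public class SeedRewrite {
    public static void main(String[] args) {
        Random rng = new Random(42);
        // One case on the shared interface covers Uuid, Rand, Randn, etc.
        ExpressionWithRandomSeed e = new UuidExpr(Optional.empty());
        UuidExpr seeded = (UuidExpr) e.withNewSeed(rng.nextLong());
        System.out.println(seeded.seed.isPresent()); // true
    }
}
```

The original expression is left untouched, so each streaming batch can install a distinct seed.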


---




[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-08-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21941
  
LGTM 

Thanks! Merged to master.


---




[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21966
  
**[Test build #94099 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94099/testReport)**
 for PR 21966 at commit 
[`16b9949`](https://github.com/apache/spark/commit/16b9949285c7133b89b3e6624cd8f5684abd3df5).


---




[GitHub] spark issue #21933: [SPARK-24917][CORE] make chunk size configurable

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21933
  
**[Test build #94100 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94100/testReport)**
 for PR 21933 at commit 
[`0251bd5`](https://github.com/apache/spark/commit/0251bd517e7fd3e695cb8366ffa03de8c9e2900b).


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94101 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94101/testReport)**
 for PR 21889 at commit 
[`37e0a97`](https://github.com/apache/spark/commit/37e0a97c32f28006e9af1143549cbdae5319df49).


---




[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21966
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1732/
Test PASSed.


---




[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21966
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-02 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
> Anybody else able to reproduce this failure? It succeeded on my developer 
machine.

It worked for me, too. Let's see what a retest does.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
retest this please


---




[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21977
  
cc @BryanCutler and @icexelloss too since we recently discussed the memory issue.


---




[GitHub] spark issue #21933: [SPARK-24917][CORE] make chunk size configurable

2018-08-02 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21933
  
retest this please


---




[GitHub] spark issue #21933: [SPARK-24917][CORE] make chunk size configurable

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21933
  
**[Test build #94098 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94098/testReport)**
 for PR 21933 at commit 
[`0251bd5`](https://github.com/apache/spark/commit/0251bd517e7fd3e695cb8366ffa03de8c9e2900b).


---




[GitHub] spark issue #21933: [SPARK-24917][CORE] make chunk size configurable

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21933
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21933: [SPARK-24917][CORE] make chunk size configurable

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21933
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94080/
Test FAILed.


---




[GitHub] spark issue #21933: [SPARK-24917][CORE] make chunk size configurable

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21933
  
**[Test build #94080 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94080/testReport)**
 for PR 21933 at commit 
[`0251bd5`](https://github.com/apache/spark/commit/0251bd517e7fd3e695cb8366ffa03de8c9e2900b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21721
  
**[Test build #94097 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94097/testReport)**
 for PR 21721 at commit 
[`1775c2a`](https://github.com/apache/spark/commit/1775c2a1db2bf790ddf1cad0113c7ead2409ba65).


---




[GitHub] spark issue #21975: [WIP][SPARK-25001][BUILD] Fix miscellaneous build warnin...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21975
  
**[Test build #94096 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94096/testReport)**
 for PR 21975 at commit 
[`2354e10`](https://github.com/apache/spark/commit/2354e10bd82f4770fc58feb5ac2738dd0dd39070).


---




[GitHub] spark issue #21975: [WIP][SPARK-25001][BUILD] Fix miscellaneous build warnin...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21975
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21975: [WIP][SPARK-25001][BUILD] Fix miscellaneous build warnin...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21975
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1731/
Test PASSed.


---




[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21721
  
retest this please


---




[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21721
  
Yup, looks like the recent Kafka upgrade has an issue.


---




[GitHub] spark issue #21975: [WIP][SPARK-25001][BUILD] Fix miscellaneous build warnin...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21975
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21975: [WIP][SPARK-25001][BUILD] Fix miscellaneous build warnin...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21975
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94095/
Test FAILed.


---




[GitHub] spark issue #21975: [WIP][SPARK-25001][BUILD] Fix miscellaneous build warnin...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21975
  
**[Test build #94095 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94095/testReport)**
 for PR 21975 at commit 
[`9c5ba2b`](https://github.com/apache/spark/commit/9c5ba2bf89b220b66be8818f30cece9f095bcc3b).


---




[GitHub] spark issue #21975: [WIP][SPARK-25001][BUILD] Fix miscellaneous build warnin...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21975
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94091/
Test FAILed.


---




[GitHub] spark issue #21975: [WIP][SPARK-25001][BUILD] Fix miscellaneous build warnin...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21975
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1730/
Test PASSed.


---




[GitHub] spark issue #21975: [WIP][SPARK-25001][BUILD] Fix miscellaneous build warnin...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21975
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21975: [WIP][SPARK-25001][BUILD] Fix miscellaneous build warnin...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21975
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21957: [SPARK-24994][SQL] When the data type of the field is co...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21957
  
**[Test build #94094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94094/testReport)** for PR 21957 at commit [`24c061f`](https://github.com/apache/spark/commit/24c061fbf2e5c894729443171e16cbadfc004db3).


---




[GitHub] spark issue #21957: [SPARK-24994][SQL] When the data type of the field is co...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21957
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1729/
Test PASSed.


---




[GitHub] spark issue #21957: [SPARK-24994][SQL] When the data type of the field is co...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21957
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21957: [SPARK-24994][SQL] When the data type of the field is co...

2018-08-02 Thread 10110346
Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/21957
  
retest this please


---




[GitHub] spark pull request #21917: [SPARK-24720][STREAMING-KAFKA] add option to alig...

2018-08-02 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/21917#discussion_r207437645
  
--- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala ---
@@ -223,17 +240,46 @@ private[spark] class DirectKafkaInputDStream[K, V](
     }.getOrElse(offsets)
   }

-  override def compute(validTime: Time): Option[KafkaRDD[K, V]] = {
-    val untilOffsets = clamp(latestOffsets())
-    val offsetRanges = untilOffsets.map { case (tp, uo) =>
-      val fo = currentOffsets(tp)
-      OffsetRange(tp.topic, tp.partition, fo, uo)
+  /**
+   * Return the offset range. For non-consecutive offsets the last offset must have a record.
+   * If offsets have missing data (transaction marker or abort), increase the
+   * range until we get the requested number of records or run out of records.
+   * Because we have to iterate over all the records in this case,
+   * we also return the total number of records.
+   * @param offsets the target range we would like if offsets were continuous
+   * @return (totalNumberOfRecords, updated offset)
+   */
+  private def alignRanges(offsets: Map[TopicPartition, Long]): Iterable[OffsetRange] = {
+    if (nonConsecutive) {
+      val localRw = rewinder()
+      val localOffsets = currentOffsets
+      context.sparkContext.parallelize(offsets.toList).mapPartitions(tpos => {
+        tpos.map { case (tp, o) =>
+          val offsetAndCount = localRw.getLastOffsetAndCount(localOffsets(tp), tp, o)
+          (tp, offsetAndCount)
+        }
+      }).collect()
--- End diff --

What exactly is the benefit gained by doing a duplicate read of all the messages?
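
For readers skimming the thread, a minimal self-contained sketch of the idea under discussion — extending a requested offset range forward when some offsets carry no records (e.g. transaction markers), until enough real records are covered. All names here are hypothetical illustrations, not the PR's actual code; the record set stands in for what a real consumer poll would return.

```scala
// Hypothetical sketch of "range alignment" over non-consecutive Kafka offsets.
// `recordOffsets` fakes a partition where offsets 2 and 5 are transaction
// markers (no records); the real code would poll the broker instead.
object AlignSketch {
  val recordOffsets: Set[Long] = Set(0L, 1L, 3L, 4L, 6L)

  /** Count records in [from, until); if fewer than `wanted`, keep scanning
   *  forward (up to `logEnd`) until enough records are found.
   *  Returns (recordsCovered, alignedUntilOffset). */
  def align(from: Long, until: Long, wanted: Long, logEnd: Long): (Long, Long) = {
    var count = 0L
    var o = from
    while (o < until) { if (recordOffsets(o)) count += 1; o += 1 }
    // Offsets 2 and 5 hold no record, so the range may need to grow.
    while (count < wanted && o < logEnd) { if (recordOffsets(o)) count += 1; o += 1 }
    (count, o)
  }

  def main(args: Array[String]): Unit = {
    // Asking for 3 records with a naive until of 3 must extend the range
    // past the marker at offset 2.
    println(align(0L, 3L, 3L, 7L)) // (3,4)
  }
}
```

This makes the trade-off behind koeninger's question concrete: computing the aligned end offset requires iterating over the records once up front, which the executors then read again when the RDD is computed.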


---




[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20345
  
**[Test build #94093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94093/testReport)** for PR 20345 at commit [`39462fb`](https://github.com/apache/spark/commit/39462fbee952ec574b4c04d7718fd73bb5f56d9d).


---




[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1728/
Test PASSed.


---




[GitHub] spark issue #21668: [SPARK-24690][SQL] Add a new config to control plan stat...

2018-08-02 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21668
  
@cloud-fan ping


---




[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-08-02 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20345
  
retest this please


---




[GitHub] spark issue #21975: [WIP][SPARK-25001][BUILD] Fix miscellaneous build warnin...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21975
  
**[Test build #94091 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94091/testReport)** for PR 21975 at commit [`f9cc06c`](https://github.com/apache/spark/commit/f9cc06c6a7364b1ef1f03bbdf7bd3d9fef07bdac).


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21898
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1727/
Test PASSed.


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21898
  
**[Test build #94092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94092/testReport)** for PR 21898 at commit [`67dcf17`](https://github.com/apache/spark/commit/67dcf17a47333c030a877a4fade463747c7bcf38).


---




[GitHub] spark issue #21975: [WIP][SPARK-25001][BUILD] Fix miscellaneous build warnin...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21975
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21898
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21975: [WIP][SPARK-25001][BUILD] Fix miscellaneous build warnin...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21975
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1726/
Test PASSed.


---




[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21915
  
**[Test build #94090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94090/testReport)** for PR 21915 at commit [`0796f76`](https://github.com/apache/spark/commit/0796f760c60da9bb8b5cadeee2e751dd898cf8cf).


---




[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21915
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1725/
Test PASSed.


---




[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21915
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21915
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21915
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94089/
Test FAILed.


---




[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21915
  
**[Test build #94089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94089/testReport)** for PR 21915 at commit [`f3ea9c6`](https://github.com/apache/spark/commit/f3ea9c6625da255119e360a499e439128f989e1e).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21915
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21915
  
**[Test build #94089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94089/testReport)** for PR 21915 at commit [`f3ea9c6`](https://github.com/apache/spark/commit/f3ea9c6625da255119e360a499e439128f989e1e).


---




[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21915
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1724/
Test PASSed.


---




[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21980
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94078/
Test PASSed.


---



