[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-14 Thread GitBox


SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727529486


   **[Test build #131106 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131106/testReport)**
 for PR 30341 at commit 
[`77168fe`](https://github.com/apache/spark/commit/77168fe2dd687113ae1b9a2f086982861445ecda).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30368:
URL: https://github.com/apache/spark/pull/30368#issuecomment-727527653


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131104/
   Test PASSed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30368:
URL: https://github.com/apache/spark/pull/30368#issuecomment-727527648


   Merged build finished. Test PASSed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-14 Thread GitBox


AmplabJenkins commented on pull request #30368:
URL: https://github.com/apache/spark/pull/30368#issuecomment-727527648







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-14 Thread GitBox


SparkQA removed a comment on pull request #30368:
URL: https://github.com/apache/spark/pull/30368#issuecomment-727323154


   **[Test build #131104 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131104/testReport)**
 for PR 30368 at commit 
[`ff1ab01`](https://github.com/apache/spark/commit/ff1ab01888f13f829c78e6d4da53b94c2adc6d16).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-14 Thread GitBox


SparkQA commented on pull request #30368:
URL: https://github.com/apache/spark/pull/30368#issuecomment-727527476


   **[Test build #131104 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131104/testReport)**
 for PR 30368 at commit 
[`ff1ab01`](https://github.com/apache/spark/commit/ff1ab01888f13f829c78e6d4da53b94c2adc6d16).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhouyejoe commented on a change in pull request #30163: [SPARK-32918][SHUFFLE] RPC implementation to support control plane coordination for push-based shuffle

2020-11-14 Thread GitBox


zhouyejoe commented on a change in pull request #30163:
URL: https://github.com/apache/spark/pull/30163#discussion_r523716272



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java
##
@@ -181,7 +181,8 @@ public void onFailure(Throwable e) {
 }
   });
 } catch (Exception e) {
-  logger.error("Exception while sending finalizeShuffleMerge request", e);
+  logger.error(String.format("Exception while sending finalizeShuffleMerge 
request to %s:%s",

Review comment:
   Updated.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Victsm commented on a change in pull request #30163: [SPARK-32918][SHUFFLE] RPC implementation to support control plane coordination for push-based shuffle

2020-11-14 Thread GitBox


Victsm commented on a change in pull request #30163:
URL: https://github.com/apache/spark/pull/30163#discussion_r523715366



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java
##
@@ -181,7 +181,8 @@ public void onFailure(Throwable e) {
 }
   });
 } catch (Exception e) {
-  logger.error("Exception while sending finalizeShuffleMerge request", e);
+  logger.error(String.format("Exception while sending finalizeShuffleMerge 
request to %s:%s",

Review comment:
   Nit: use {} instead to be more concise.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Victsm commented on a change in pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external s

2020-11-14 Thread GitBox


Victsm commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r523714996



##
File path: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##
@@ -657,6 +688,38 @@ class BlockManagerMasterEndpoint(
 }
   }
 
+  private def getShufflePushMergerLocations(
+  numMergersNeeded: Int,
+  hostsToFilter: Set[String]): Seq[BlockManagerId] = {
+val blockManagersWithExecutors = 
blockManagerIdByExecutor.groupBy(_._2.host)
+  .mapValues(_.head).values.map(_._2).toSet
+val filteredBlockManagersWithExecutors = blockManagersWithExecutors
+  .filterNot(x => hostsToFilter.contains(x.host))
+val filteredMergersWithExecutors = filteredBlockManagersWithExecutors.map(
+  x => BlockManagerId(x.executorId, x.host, 
StorageUtils.externalShuffleServicePort(conf)))
+
+// Enough mergers are available as part of active executors list
+if (filteredMergersWithExecutors.size >= numMergersNeeded) {
+  filteredMergersWithExecutors.toSeq
+} else {
+  // Delta mergers added from inactive mergers list to the active mergers 
list
+  val filteredMergersWithExecutorsHosts = 
filteredMergersWithExecutors.map(_.host)
+  // Pick random hosts instead of preferring the top of the list
+  val randomizedShuffleMergerLocations = 
Utils.randomize(shuffleMergerLocations.values.toSeq)

Review comment:
   @mridulm shouldn't it be the following instead?
   ```
   val filteredMergersWithoutExecutors = shuffleMergerLocations.values.toSeq
 .filterNot()
 .filterNot()
   
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Victsm commented on a change in pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external s

2020-11-14 Thread GitBox


Victsm commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r523713075



##
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##
@@ -1252,6 +1254,28 @@ private[spark] class DAGScheduler(
 execCores.map(cores => 
properties.setProperty(EXECUTOR_CORES_LOCAL_PROPERTY, cores))
   }
 
+  /**
+   * If push based shuffle is enabled, set the shuffle services to be used for 
the given
+   * shuffle map stage. The list of shuffle services is determined based on 
the list of
+   * active executors tracked by block manager master at the start of the 
stage.
+   */
+  private def prepareShuffleServicesForShuffleMapStage(stage: ShuffleMapStage) 
{
+// TODO: Handle stage reuse/retry cases separately as without finalize 
changes we cannot
+// TODO: disable shuffle merge for the retry/reuse cases
+val mergerLocs = sc.schedulerBackend.getMergerLocations(
+  stage.shuffleDep.partitioner.numPartitions, stage.resourceProfileId)
+logDebug(s"${stage.shuffleDep.getMergerLocs.map(_.host).mkString(", ")}")
+
+if (mergerLocs.nonEmpty) {
+  stage.shuffleDep.setMergerLocs(mergerLocs)
+  logInfo("Shuffle merge enabled for %s (%s) with %d merger locations"
+.format(stage, stage.name, stage.shuffleDep.getMergerLocs.size))
+} else {
+  stage.shuffleDep.setShuffleMergeEnabled(false)
+  logInfo("Shuffle merge disabled for %s (%s)".format(stage, stage.name))
+}

Review comment:
   Should we expose the list of merger locations or just a count of number 
of merges for a given ShuffleMapStage?
   Having a long list of merge locations might increase the event size.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Victsm commented on a change in pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external s

2020-11-14 Thread GitBox


Victsm commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r523712631



##
File path: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##
@@ -360,6 +371,17 @@ class BlockManagerMasterEndpoint(
 
   }
 
+  private def addMergerLocation(blockManagerId: BlockManagerId): Unit = {
+if (!shuffleMergerLocations.contains(blockManagerId.host) && 
!blockManagerId.isDriver) {
+  val shuffleServerId = BlockManagerId(blockManagerId.executorId, 
blockManagerId.host,
+StorageUtils.externalShuffleServicePort(conf))
+  if (shuffleMergerLocations.size >= maxRetainedMergerLocations) {
+shuffleMergerLocations -= shuffleMergerLocations.head._1

Review comment:
   Just to clarify, removing a merger does not remove the merged shuffle 
data on that location. It only prevents using that location for future 
shuffles. Reducers will still be able to fetch merged shuffle data from the 
removed merger locations.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhouyejoe commented on a change in pull request #30163: [SPARK-32918][SHUFFLE] RPC implementation to support control plane coordination for push-based shuffle

2020-11-14 Thread GitBox


zhouyejoe commented on a change in pull request #30163:
URL: https://github.com/apache/spark/pull/30163#discussion_r522744016



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java
##
@@ -158,6 +158,42 @@ public void pushBlocks(
 }
   }
 
+  /**
+   * Invoked by Spark driver to notify external shuffle services to finalize 
the shuffle merge
+   * for a given shuffle. This allows the driver to start the shuffle reducer 
stage after properly
+   * finishing the shuffle merge process associated with the shuffle mapper 
stage.
+   *
+   * @param host host of shuffle server
+   * @param port port of shuffle server.
+   * @param shuffleId shuffle ID of the shuffle to be finalized
+   * @param listener the listener to receive MergeStatuses
+   */
+  public void finalizeShuffleMerge(

Review comment:
   Tried it out and it worked with our WIP branch. Updated the PR. Please 
review. Thanks.

##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/MergeFinalizerListener.java
##
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.util.EventListener;
+
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+
+public interface MergeFinalizerListener extends EventListener {

Review comment:
   Added.

##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/MergeFinalizerListener.java
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.util.EventListener;
+
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+
+/**
+ * :: DeveloperApi ::
+ *
+ * Listener providing a callback function to invoke when driver receives the 
response for the finalize
+ * shuffle merge request sent to remote shuffle service.
+ */

Review comment:
   Added

##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java
##
@@ -156,4 +156,22 @@ public void pushBlocks(
   BlockFetchingListener listener) {
 throw new UnsupportedOperationException();
   }
+
+  /**
+   * Invoked by Spark driver to notify external shuffle services to finalize 
the shuffle merge
+   * for a given shuffle. This allows the driver to start the shuffle reducer 
stage after properly
+   * finishing the shuffle merge process associated with the shuffle mapper 
stage.
+   *
+   * @param host host of shuffle server
+   * @param port port of shuffle server.
+   * @param shuffleId shuffle ID of the shuffle to be finalized
+   * @param listener the listener to receive MergeStatuses
+   */

Review comment:
   Added.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Victsm commented on a change in pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external s

2020-11-14 Thread GitBox


Victsm commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r523711623



##
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##
@@ -1252,6 +1254,32 @@ private[spark] class DAGScheduler(
 execCores.map(cores => 
properties.setProperty(EXECUTOR_CORES_LOCAL_PROPERTY, cores))
   }
 
+  /**
+   * If push based shuffle is enabled, set the shuffle services to be used for 
the given
+   * shuffle map stage for block push/merge.
+   *
+   * Even with dynamic resource allocation kicking in and significantly 
reducing the number
+   * of available active executors, we would still be able to get sufficient 
shuffle service
+   * locations for block push/merge by getting the historical locations of 
past executors.
+   */
+  private def prepareShuffleServicesForShuffleMapStage(stage: 
ShuffleMapStage): Unit = {
+// TODO SPARK-32920: Handle stage reuse/retry cases separately as without 
finalize
+// TODO changes we cannot disable shuffle merge for the retry/reuse cases
+val mergerLocs = sc.schedulerBackend.getShufflePushMergerLocations(
+  stage.shuffleDep.partitioner.numPartitions, stage.resourceProfileId)
+
+if (mergerLocs.nonEmpty) {
+  stage.shuffleDep.setMergerLocs(mergerLocs)
+  logInfo(s"Shuffle merge enabled for $stage (${stage.name}) with" +

Review comment:
   Nit: Change to "Push-based shuffle enabled for stage ..."





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Victsm commented on a change in pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external s

2020-11-14 Thread GitBox


Victsm commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r523711581



##
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##
@@ -1252,6 +1254,32 @@ private[spark] class DAGScheduler(
 execCores.map(cores => 
properties.setProperty(EXECUTOR_CORES_LOCAL_PROPERTY, cores))
   }
 
+  /**
+   * If push based shuffle is enabled, set the shuffle services to be used for 
the given
+   * shuffle map stage for block push/merge.
+   *
+   * Even with dynamic resource allocation kicking in and significantly 
reducing the number
+   * of available active executors, we would still be able to get sufficient 
shuffle service
+   * locations for block push/merge by getting the historical locations of 
past executors.
+   */
+  private def prepareShuffleServicesForShuffleMapStage(stage: 
ShuffleMapStage): Unit = {
+// TODO SPARK-32920: Handle stage reuse/retry cases separately as without 
finalize
+// TODO changes we cannot disable shuffle merge for the retry/reuse cases
+val mergerLocs = sc.schedulerBackend.getShufflePushMergerLocations(
+  stage.shuffleDep.partitioner.numPartitions, stage.resourceProfileId)
+
+if (mergerLocs.nonEmpty) {
+  stage.shuffleDep.setMergerLocs(mergerLocs)
+  logInfo(s"Shuffle merge enabled for $stage (${stage.name}) with" +
+s" ${stage.shuffleDep.getMergerLocs.size} merger locations")
+
+  logDebug("List of shuffle push merger locations " +
+s"${stage.shuffleDep.getMergerLocs.map(_.host).mkString(", ")}")
+} else {
+  logInfo(s"No available merger locations. Shuffle merge disabled for 
$stage (${stage.name})")

Review comment:
   Nit: Change to "Push-based shuffle disabled for stage..."





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Victsm commented on a change in pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external s

2020-11-14 Thread GitBox


Victsm commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r523711229



##
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##
@@ -1938,4 +1938,51 @@ package object config {
   .version("3.0.1")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val PUSH_BASED_SHUFFLE_ENABLED =
+ConfigBuilder("spark.shuffle.push.enabled")
+  .doc("Set to 'true' to enable push based shuffle on the client side and 
this works in " +
+"conjunction with the server side flag 
spark.shuffle.server.mergedShuffleFileManagerImpl " +
+"which needs to be set with the appropriate " +
+"org.apache.spark.network.shuffle.MergedShuffleFileManager 
implementation for push-based " +
+"shuffle to be enabled")
+  .version("3.1.0")
+  .booleanConf
+  .createWithDefault(false)
+
+  private[spark] val SHUFFLE_MERGERS_MAX_RETAINED_LOCATIONS =
+ConfigBuilder("spark.shuffle.push.retainedMergerLocations")

Review comment:
   Should the config name be 
"spark.shuffle.push.maxRetainedMergerLocations" to be more consistent?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Victsm commented on a change in pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external s

2020-11-14 Thread GitBox


Victsm commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r523710562



##
File path: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##
@@ -657,6 +681,13 @@ class BlockManagerMasterEndpoint(
 }
   }
 
+  private def getShufflePushMergerLocations(
+  numMergersNeeded: Int,
+  hostsToFilter: Set[String]): Seq[BlockManagerId] = {
+val mergers = shuffleMergerLocations.values.filterNot(x => 
hostsToFilter.contains(x.host)).toSeq
+mergers.take(numMergersNeeded)

Review comment:
   I thought the reason we didn't do that internally is because the merger 
locations is picked at the beginning of the map stage and is only relevant with 
executor placement of the following reduce stage.
   Since there is no guarantee on whether the active executors at the beginning 
of the map stage would still be active at the beginning of the reduce stage, we 
didn't do that.
   Is that reasoning not applicable any more?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Victsm commented on a change in pull request #30163: [SPARK-32918][SHUFFLE] RPC implementation to support control plane coordination for push-based shuffle

2020-11-14 Thread GitBox


Victsm commented on a change in pull request #30163:
URL: https://github.com/apache/spark/pull/30163#discussion_r523709402



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/MergeFinalizerListener.java
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.util.EventListener;
+
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+
+/**
+ * :: DeveloperApi ::
+ *
+ * Listener providing a callback function to invoke when driver receives the 
response for the finalize

Review comment:
   Please fix the line length issue reported by java linter.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727515493







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


AmplabJenkins commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727515493







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


SparkQA removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727287912


   **[Test build #131102 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131102/testReport)**
 for PR 30379 at commit 
[`f9ef4ea`](https://github.com/apache/spark/commit/f9ef4eadb84afd0802f0d3284e6109528aca269d).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727515354


   **[Test build #131102 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131102/testReport)**
 for PR 30379 at commit 
[`f9ef4ea`](https://github.com/apache/spark/commit/f9ef4eadb84afd0802f0d3284e6109528aca269d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


AmplabJenkins commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727515074







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727515074







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727514912


   **[Test build #131101 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131101/testReport)**
 for PR 30379 at commit 
[`8295ba2`](https://github.com/apache/spark/commit/8295ba2bee50d6aaff108194b1afb13fbfddafad).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727484748







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


SparkQA removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727284961







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


AmplabJenkins commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727484748







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727481924


   **[Test build #131100 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131100/testReport)**
 for PR 30379 at commit 
[`4f73ec4`](https://github.com/apache/spark/commit/4f73ec4759e649dfb48b0c69b479c0c680adb487).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r523705398



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala
##
@@ -77,9 +87,18 @@ private[k8s] class LoggingPodStatusWatcherImpl(
 }
   }
 
+  override def reset(): Unit = {
+resourceTooOldReceived = false
+  }
+
   override def onClose(e: KubernetesClientException): Unit = {
 logDebug(s"Stopping watching application $appId with last-observed phase 
$phase")
-closeWatch()
+if (e != null && e.getCode==HTTP_GONE) {

Review comment:
   `e.getCode == HTTP_GONE`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r523705352



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala
##
@@ -177,4 +190,34 @@ private[k8s] class LoggingPodStatusWatcherImpl(
   private def formatTime(time: String): String = {
 if (time != null ||  time != "") time else "N/A"
   }
+
+  override def watchOrStop(sId: String): Boolean = if (waitForCompletion) {
+logInfo(s"Waiting for application ${conf.appName} with submission ID $sId 
to finish...")

Review comment:
   Oh, it's a compilation error.
   ```scala
   [error] 
/home/jenkins/workspace/SparkPullRequestBuilder@2/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala:195:
 object appName is not a member of package conf
   [error] logInfo(s"Waiting for application ${conf.appName} with 
submission ID $sId to finish...")
   [error]  ^
   [error] one error found
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30283:
URL: https://github.com/apache/spark/pull/30283#issuecomment-727467281


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35708/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


AmplabJenkins commented on pull request #30283:
URL: https://github.com/apache/spark/pull/30283#issuecomment-727467260







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30283:
URL: https://github.com/apache/spark/pull/30283#issuecomment-727465923







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


SparkQA removed a comment on pull request #30283:
URL: https://github.com/apache/spark/pull/30283#issuecomment-727439357


   **[Test build #131105 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131105/testReport)**
 for PR 30283 at commit 
[`86f8ee8`](https://github.com/apache/spark/commit/86f8ee882aefe1216fa1617572915d4c824dda67).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30283:
URL: https://github.com/apache/spark/pull/30283#issuecomment-727465896


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


AmplabJenkins commented on pull request #30283:
URL: https://github.com/apache/spark/pull/30283#issuecomment-727465896







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


SparkQA commented on pull request #30283:
URL: https://github.com/apache/spark/pull/30283#issuecomment-727465800


   **[Test build #131105 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131105/testReport)**
 for PR 30283 at commit 
[`86f8ee8`](https://github.com/apache/spark/commit/86f8ee882aefe1216fa1617572915d4c824dda67).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r523704953



##
File path: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala
##
@@ -205,6 +207,6 @@ class ClientSuite extends SparkFunSuite with BeforeAndAfter 
{
   loggingPodStatusWatcher,
   KUBERNETES_RESOURCE_PREFIX)
 submissionClient.run()
-verify(loggingPodStatusWatcher).awaitCompletion()
+verify(loggingPodStatusWatcher).watchOrStop("default:driver")

Review comment:
   `default` -> `kubernetesConf.namespace()`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r523704833



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala
##
@@ -177,4 +190,34 @@ private[k8s] class LoggingPodStatusWatcherImpl(
   private def formatTime(time: String): String = {
 if (time != null ||  time != "") time else "N/A"
   }
+
+  override def watchOrStop(sId: String): Boolean = if (waitForCompletion) {
+logInfo(s"Waiting for application ${conf.appName} with submission ID $sId 
to finish...")
+val interval = maybeLoggingInterval
+
+synchronized {
+  while (!podCompleted && !resourceTooOldReceived) {
+wait(interval.get)
+logInfo(s"Application status for $appId (phase: $phase)")
+  }
+}
+
+if(podCompleted) {
+  logInfo(
+pod.map { p => s"Container final 
statuses:\n\n${containersDescription(p)}" }
+  .getOrElse("No containers were found in the driver pod."))
+  logInfo(s"Application ${appId} with submission ID $sId finished")
+} else {
+  logInfo(s"Got HTTP Gone code, resource version changed in k8s api. 
Creating a new watcher.")
+}
+
+logInfo(s"Watcher has stopped, pod completed status: ${podCompleted}")
+
+podCompleted
+  } else {
+logInfo(s"Deployed Spark application ${appId} with submission ID $sId into 
Kubernetes")

Review comment:
   Instead of `${appId}`, the other branches seem to use `${conf.appName}`, 
don't they?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r523704798



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala
##
@@ -177,4 +190,34 @@ private[k8s] class LoggingPodStatusWatcherImpl(
   private def formatTime(time: String): String = {
 if (time != null ||  time != "") time else "N/A"
   }
+
+  override def watchOrStop(sId: String): Boolean = if (waitForCompletion) {
+logInfo(s"Waiting for application ${conf.appName} with submission ID $sId 
to finish...")
+val interval = maybeLoggingInterval
+
+synchronized {
+  while (!podCompleted && !resourceTooOldReceived) {
+wait(interval.get)
+logInfo(s"Application status for $appId (phase: $phase)")
+  }
+}
+
+if(podCompleted) {
+  logInfo(
+pod.map { p => s"Container final 
statuses:\n\n${containersDescription(p)}" }
+  .getOrElse("No containers were found in the driver pod."))
+  logInfo(s"Application ${appId} with submission ID $sId finished")
+} else {
+  logInfo(s"Got HTTP Gone code, resource version changed in k8s api. 
Creating a new watcher.")
+}
+
+logInfo(s"Watcher has stopped, pod completed status: ${podCompleted}")
+
+podCompleted
+  } else {
+logInfo(s"Deployed Spark application ${appId} with submission ID $sId into 
Kubernetes")
+logInfo(s"It seems we end up here, because we never want to wait for 
completion...")

Review comment:
   Please remote this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r523704723



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala
##
@@ -177,4 +190,34 @@ private[k8s] class LoggingPodStatusWatcherImpl(
   private def formatTime(time: String): String = {
 if (time != null ||  time != "") time else "N/A"
   }
+
+  override def watchOrStop(sId: String): Boolean = if (waitForCompletion) {
+logInfo(s"Waiting for application ${conf.appName} with submission ID $sId 
to finish...")
+val interval = maybeLoggingInterval
+
+synchronized {
+  while (!podCompleted && !resourceTooOldReceived) {
+wait(interval.get)
+logInfo(s"Application status for $appId (phase: $phase)")
+  }
+}
+
+if(podCompleted) {
+  logInfo(
+pod.map { p => s"Container final 
statuses:\n\n${containersDescription(p)}" }
+  .getOrElse("No containers were found in the driver pod."))
+  logInfo(s"Application ${appId} with submission ID $sId finished")
+} else {
+  logInfo(s"Got HTTP Gone code, resource version changed in k8s api. 
Creating a new watcher.")

Review comment:
   It seems that we don't have this in the other branches. Please remove 
this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r523704750



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala
##
@@ -177,4 +190,34 @@ private[k8s] class LoggingPodStatusWatcherImpl(
   private def formatTime(time: String): String = {
 if (time != null ||  time != "") time else "N/A"
   }
+
+  override def watchOrStop(sId: String): Boolean = if (waitForCompletion) {
+logInfo(s"Waiting for application ${conf.appName} with submission ID $sId 
to finish...")
+val interval = maybeLoggingInterval
+
+synchronized {
+  while (!podCompleted && !resourceTooOldReceived) {
+wait(interval.get)
+logInfo(s"Application status for $appId (phase: $phase)")
+  }
+}
+
+if(podCompleted) {
+  logInfo(
+pod.map { p => s"Container final 
statuses:\n\n${containersDescription(p)}" }
+  .getOrElse("No containers were found in the driver pod."))
+  logInfo(s"Application ${appId} with submission ID $sId finished")
+} else {
+  logInfo(s"Got HTTP Gone code, resource version changed in k8s api. 
Creating a new watcher.")
+}
+
+logInfo(s"Watcher has stopped, pod completed status: ${podCompleted}")

Review comment:
   Please remove this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r523704501



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala
##
@@ -77,9 +87,18 @@ private[k8s] class LoggingPodStatusWatcherImpl(
 }
   }
 
+  override def reset(): Unit = {

Review comment:
   Shall we move this to line 69 (after `private def phase`) like the other 
branches?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30368:
URL: https://github.com/apache/spark/pull/30368#issuecomment-727439126


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35707/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r523704394



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala
##
@@ -28,8 +29,10 @@ import org.apache.spark.SparkException
 import org.apache.spark.internal.Logging
 import org.apache.spark.util.ThreadUtils
 
+

Review comment:
   Let's remove this to minimize diff and to be consistent.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-14 Thread GitBox


AmplabJenkins commented on pull request #30368:
URL: https://github.com/apache/spark/pull/30368#issuecomment-727439086







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30368:
URL: https://github.com/apache/spark/pull/30368#issuecomment-727439086


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r523704337



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
##
@@ -133,29 +136,37 @@ private[spark] class Client(
   .endVolume()
 .endSpec()
   .build()
-Utils.tryWithResource(
-  kubernetesClient
-.pods()
-.withName(resolvedDriverPod.getMetadata.getName)
-.watch(watcher)) { _ =>
-  val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
-  try {
-val otherKubernetesResources =
-  resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
-addDriverOwnerReference(createdDriverPod, otherKubernetesResources)
-kubernetesClient.resourceList(otherKubernetesResources: 
_*).createOrReplace()
-  } catch {
-case NonFatal(e) =>
-  kubernetesClient.pods().delete(createdDriverPod)
-  throw e
-  }
 
-  if (waitForAppCompletion) {
-logInfo(s"Waiting for application $appName to finish...")
-watcher.awaitCompletion()
-logInfo(s"Application $appName finished.")
-  } else {
-logInfo(s"Deployed Spark application $appName into Kubernetes.")
+val driverPodName = resolvedDriverPod.getMetadata.getName
+var watch: Watch = null
+val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
+try {
+  val otherKubernetesResources = 
resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
+  addDriverOwnerReference(createdDriverPod, otherKubernetesResources)
+  kubernetesClient.resourceList(otherKubernetesResources: 
_*).createOrReplace()
+} catch {
+  case NonFatal(e) =>
+kubernetesClient.pods().delete(createdDriverPod)
+throw e
+}
+val sId = Seq(kubernetesConf.namespace(), driverPodName).mkString(":")
+breakable {
+  while (true) {
+val podWithName = kubernetesClient
+  .pods()
+  .withName(driverPodName)
+
+watcher.reset()
+
+watch = podWithName.watch(watcher)
+
+watcher.eventReceived(Action.MODIFIED, podWithName.get())
+
+if(watcher.watchOrStop(sId)) {
+  logInfo(s"Stop watching as the pod has completed.")

Review comment:
   Let's remove this. Otherwise this exists only at `branch-2.4`. People 
can be confused and think as a regression at branch-3.0 and `master`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


SparkQA commented on pull request #30283:
URL: https://github.com/apache/spark/pull/30283#issuecomment-727439357


   **[Test build #131105 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131105/testReport)**
 for PR 30283 at commit 
[`86f8ee8`](https://github.com/apache/spark/commit/86f8ee882aefe1216fa1617572915d4c824dda67).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-14 Thread GitBox


SparkQA commented on pull request #30368:
URL: https://github.com/apache/spark/pull/30368#issuecomment-727438978


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35707/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r523704197



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
##
@@ -133,29 +136,37 @@ private[spark] class Client(
   .endVolume()
 .endSpec()
   .build()
-Utils.tryWithResource(
-  kubernetesClient
-.pods()
-.withName(resolvedDriverPod.getMetadata.getName)
-.watch(watcher)) { _ =>
-  val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
-  try {
-val otherKubernetesResources =
-  resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
-addDriverOwnerReference(createdDriverPod, otherKubernetesResources)
-kubernetesClient.resourceList(otherKubernetesResources: 
_*).createOrReplace()
-  } catch {
-case NonFatal(e) =>
-  kubernetesClient.pods().delete(createdDriverPod)
-  throw e
-  }
 
-  if (waitForAppCompletion) {
-logInfo(s"Waiting for application $appName to finish...")
-watcher.awaitCompletion()
-logInfo(s"Application $appName finished.")
-  } else {
-logInfo(s"Deployed Spark application $appName into Kubernetes.")
+val driverPodName = resolvedDriverPod.getMetadata.getName
+var watch: Watch = null
+val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
+try {
+  val otherKubernetesResources = 
resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
+  addDriverOwnerReference(createdDriverPod, otherKubernetesResources)
+  kubernetesClient.resourceList(otherKubernetesResources: 
_*).createOrReplace()
+} catch {
+  case NonFatal(e) =>
+kubernetesClient.pods().delete(createdDriverPod)
+throw e
+}
+val sId = Seq(kubernetesConf.namespace(), driverPodName).mkString(":")
+breakable {
+  while (true) {
+val podWithName = kubernetesClient
+  .pods()
+  .withName(driverPodName)
+
+watcher.reset()
+
+watch = podWithName.watch(watcher)
+
+watcher.eventReceived(Action.MODIFIED, podWithName.get())
+

Review comment:
   Please add the following comment like the other branches.
   ```
   // Break the while loop if the pod is completed or we don't want to wait
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r523704122



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
##
@@ -133,29 +136,37 @@ private[spark] class Client(
   .endVolume()
 .endSpec()
   .build()
-Utils.tryWithResource(
-  kubernetesClient
-.pods()
-.withName(resolvedDriverPod.getMetadata.getName)
-.watch(watcher)) { _ =>
-  val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
-  try {
-val otherKubernetesResources =
-  resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
-addDriverOwnerReference(createdDriverPod, otherKubernetesResources)
-kubernetesClient.resourceList(otherKubernetesResources: 
_*).createOrReplace()
-  } catch {
-case NonFatal(e) =>
-  kubernetesClient.pods().delete(createdDriverPod)
-  throw e
-  }
 
-  if (waitForAppCompletion) {
-logInfo(s"Waiting for application $appName to finish...")
-watcher.awaitCompletion()
-logInfo(s"Application $appName finished.")
-  } else {
-logInfo(s"Deployed Spark application $appName into Kubernetes.")
+val driverPodName = resolvedDriverPod.getMetadata.getName
+var watch: Watch = null
+val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
+try {
+  val otherKubernetesResources = 
resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
+  addDriverOwnerReference(createdDriverPod, otherKubernetesResources)
+  kubernetesClient.resourceList(otherKubernetesResources: 
_*).createOrReplace()
+} catch {
+  case NonFatal(e) =>
+kubernetesClient.pods().delete(createdDriverPod)
+throw e
+}
+val sId = Seq(kubernetesConf.namespace(), driverPodName).mkString(":")
+breakable {
+  while (true) {
+val podWithName = kubernetesClient
+  .pods()
+  .withName(driverPodName)
+
+watcher.reset()
+

Review comment:
   Let's remove this empty line like the other branch.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r523704157



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
##
@@ -133,29 +136,37 @@ private[spark] class Client(
   .endVolume()
 .endSpec()
   .build()
-Utils.tryWithResource(
-  kubernetesClient
-.pods()
-.withName(resolvedDriverPod.getMetadata.getName)
-.watch(watcher)) { _ =>
-  val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
-  try {
-val otherKubernetesResources =
-  resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
-addDriverOwnerReference(createdDriverPod, otherKubernetesResources)
-kubernetesClient.resourceList(otherKubernetesResources: 
_*).createOrReplace()
-  } catch {
-case NonFatal(e) =>
-  kubernetesClient.pods().delete(createdDriverPod)
-  throw e
-  }
 
-  if (waitForAppCompletion) {
-logInfo(s"Waiting for application $appName to finish...")
-watcher.awaitCompletion()
-logInfo(s"Application $appName finished.")
-  } else {
-logInfo(s"Deployed Spark application $appName into Kubernetes.")
+val driverPodName = resolvedDriverPod.getMetadata.getName
+var watch: Watch = null
+val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
+try {
+  val otherKubernetesResources = 
resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
+  addDriverOwnerReference(createdDriverPod, otherKubernetesResources)
+  kubernetesClient.resourceList(otherKubernetesResources: 
_*).createOrReplace()
+} catch {
+  case NonFatal(e) =>
+kubernetesClient.pods().delete(createdDriverPod)
+throw e
+}
+val sId = Seq(kubernetesConf.namespace(), driverPodName).mkString(":")
+breakable {
+  while (true) {
+val podWithName = kubernetesClient
+  .pods()
+  .withName(driverPodName)
+
+watcher.reset()
+
+watch = podWithName.watch(watcher)
+

Review comment:
   Shall we add the following comments like the other branches?
   ```
   // Send the latest pod state we know to the watcher to make sure we didn't 
miss anything
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r523704122



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
##
@@ -133,29 +136,37 @@ private[spark] class Client(
   .endVolume()
 .endSpec()
   .build()
-Utils.tryWithResource(
-  kubernetesClient
-.pods()
-.withName(resolvedDriverPod.getMetadata.getName)
-.watch(watcher)) { _ =>
-  val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
-  try {
-val otherKubernetesResources =
-  resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
-addDriverOwnerReference(createdDriverPod, otherKubernetesResources)
-kubernetesClient.resourceList(otherKubernetesResources: 
_*).createOrReplace()
-  } catch {
-case NonFatal(e) =>
-  kubernetesClient.pods().delete(createdDriverPod)
-  throw e
-  }
 
-  if (waitForAppCompletion) {
-logInfo(s"Waiting for application $appName to finish...")
-watcher.awaitCompletion()
-logInfo(s"Application $appName finished.")
-  } else {
-logInfo(s"Deployed Spark application $appName into Kubernetes.")
+val driverPodName = resolvedDriverPod.getMetadata.getName
+var watch: Watch = null
+val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
+try {
+  val otherKubernetesResources = 
resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
+  addDriverOwnerReference(createdDriverPod, otherKubernetesResources)
+  kubernetesClient.resourceList(otherKubernetesResources: 
_*).createOrReplace()
+} catch {
+  case NonFatal(e) =>
+kubernetesClient.pods().delete(createdDriverPod)
+throw e
+}
+val sId = Seq(kubernetesConf.namespace(), driverPodName).mkString(":")
+breakable {
+  while (true) {
+val podWithName = kubernetesClient
+  .pods()
+  .withName(driverPodName)
+
+watcher.reset()
+

Review comment:
   Let's remove this empty line.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r523704113



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
##
@@ -133,29 +136,37 @@ private[spark] class Client(
   .endVolume()
 .endSpec()
   .build()
-Utils.tryWithResource(
-  kubernetesClient
-.pods()
-.withName(resolvedDriverPod.getMetadata.getName)
-.watch(watcher)) { _ =>
-  val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
-  try {
-val otherKubernetesResources =
-  resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
-addDriverOwnerReference(createdDriverPod, otherKubernetesResources)
-kubernetesClient.resourceList(otherKubernetesResources: 
_*).createOrReplace()
-  } catch {
-case NonFatal(e) =>
-  kubernetesClient.pods().delete(createdDriverPod)
-  throw e
-  }
 
-  if (waitForAppCompletion) {
-logInfo(s"Waiting for application $appName to finish...")
-watcher.awaitCompletion()
-logInfo(s"Application $appName finished.")
-  } else {
-logInfo(s"Deployed Spark application $appName into Kubernetes.")
+val driverPodName = resolvedDriverPod.getMetadata.getName
+var watch: Watch = null
+val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
+try {
+  val otherKubernetesResources = 
resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
+  addDriverOwnerReference(createdDriverPod, otherKubernetesResources)
+  kubernetesClient.resourceList(otherKubernetesResources: 
_*).createOrReplace()
+} catch {
+  case NonFatal(e) =>
+kubernetesClient.pods().delete(createdDriverPod)
+throw e
+}
+val sId = Seq(kubernetesConf.namespace(), driverPodName).mkString(":")
+breakable {
+  while (true) {
+val podWithName = kubernetesClient
+  .pods()
+  .withName(driverPodName)
+

Review comment:
   Could you add the following comments like the other branches?
   ```
   // Reset resource to old before we start the watch, this is important for 
race conditions
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-14 Thread GitBox


dongjoon-hyun commented on pull request #30283:
URL: https://github.com/apache/spark/pull/30283#issuecomment-727425783


   Retest this please.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727420586


   Thanks. No need to hurry~ Take your time. I just want to give you feedback 
swiftly from my side. This PR will be here until next week.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-14 Thread GitBox


dongjoon-hyun commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727420586


   No need to hurry~ Take your time. I just want to give you feedback swiftly 
from my side.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-14 Thread GitBox


SparkQA commented on pull request #30368:
URL: https://github.com/apache/spark/pull/30368#issuecomment-727381008


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35707/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-14 Thread GitBox


viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727364039


   @dongjoon-hyun Thanks! I will update this tonight or tomorrow.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727348202







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

2020-11-14 Thread GitBox


SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727348157


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35706/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

2020-11-14 Thread GitBox


AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727348202







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-14 Thread GitBox


dongjoon-hyun commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727331477


   #30379 is merged. Could you rebased to the master and regenerate the result, 
@viirya .
   
   Sure, that's the exact reason I asked their opinion once more.  
   > Let's wait for @cloud-fan's comment for #30341 (comment) after this 
weekend.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


dongjoon-hyun closed pull request #30379:
URL: https://github.com/apache/spark/pull/30379


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


dongjoon-hyun commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727329518


   Since the last commit is only for commenting and this PR already passed, I 
merged this. Thanks~



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


viirya commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523676597



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SubExprEliminationBenchmark.scala
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * The benchmarks aims to measure performance of the queries where there are 
subexpression
+ * elimination or not.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class  --jars ,
+ * 
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result:
+ *  SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ *  Results will be written to 
"benchmarks/SubExprEliminationBenchmark-results.txt".
+ * }}}
+ */
+object SubExprEliminationBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  def withFromJson(rowsNum: Int, numIters: Int): Unit = {
+val benchmark = new Benchmark("from_json as subExpr", rowsNum, output = 
output)
+
+withTempPath { path =>
+  prepareDataInfo(benchmark)
+  val numCols = 1000
+  val schema = writeWideRow(path.getAbsolutePath, rowsNum, numCols)
+
+  val cols = (0 until numCols).map { idx =>
+from_json('value, schema).getField(s"col$idx")
+  }
+
+  benchmark.addCase("subexpressionElimination off, codegen on", numIters) 
{ _ =>
+withSQLConf(
+  SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+  SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
+  SQLConf.CODEGEN_FACTORY_MODE.key -> "CODEGEN_ONLY",
+  SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+  val df = spark.read
+.text(path.getAbsolutePath)
+.select(cols: _*)
+  df.collect()
+}
+  }
+
+  benchmark.addCase("subexpressionElimination off, codegen off", numIters) 
{ _ =>
+withSQLConf(
+  SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+  SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false",
+  SQLConf.CODEGEN_FACTORY_MODE.key -> "NO_CODEGEN",
+  SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+  val df = spark.read
+.text(path.getAbsolutePath)
+.select(cols: _*)
+  df.collect()
+}
+  }
+
+  // We only benchmark subexpression performance under 
codegen/non-codegen, so disabling
+  // json optimization.

Review comment:
   Ah right, forgot it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-14 Thread GitBox


viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727325324


   Thanks @dongjoon-hyun.
   
   Let's wait for @cloud-fan's comment for 
https://github.com/apache/spark/pull/30341#discussion_r522822589 after this 
weekend.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-14 Thread GitBox


SparkQA commented on pull request #30368:
URL: https://github.com/apache/spark/pull/30368#issuecomment-727323154


   **[Test build #131104 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131104/testReport)**
 for PR 30368 at commit 
[`ff1ab01`](https://github.com/apache/spark/commit/ff1ab01888f13f829c78e6d4da53b94c2adc6d16).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523672856



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SubExprEliminationBenchmark.scala
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * The benchmarks aims to measure performance of the queries where there are 
subexpression
+ * elimination or not.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class  --jars ,
+ * 
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result:
+ *  SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ *  Results will be written to 
"benchmarks/SubExprEliminationBenchmark-results.txt".
+ * }}}
+ */
+object SubExprEliminationBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  def withFromJson(rowsNum: Int, numIters: Int): Unit = {
+val benchmark = new Benchmark("from_json as subExpr", rowsNum, output = 
output)
+
+withTempPath { path =>
+  prepareDataInfo(benchmark)
+  val numCols = 1000
+  val schema = writeWideRow(path.getAbsolutePath, rowsNum, numCols)
+
+  val cols = (0 until numCols).map { idx =>
+from_json('value, schema).getField(s"col$idx")
+  }
+
+  benchmark.addCase("subexpressionElimination off, codegen on", numIters) 
{ _ =>
+withSQLConf(
+  SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+  SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
+  SQLConf.CODEGEN_FACTORY_MODE.key -> "CODEGEN_ONLY",
+  SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+  val df = spark.read
+.text(path.getAbsolutePath)
+.select(cols: _*)
+  df.collect()
+}
+  }
+
+  benchmark.addCase("subexpressionElimination off, codegen off", numIters) 
{ _ =>
+withSQLConf(
+  SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+  SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false",
+  SQLConf.CODEGEN_FACTORY_MODE.key -> "NO_CODEGEN",
+  SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+  val df = spark.read
+.text(path.getAbsolutePath)
+.select(cols: _*)
+  df.collect()
+}
+  }
+
+  // We only benchmark subexpression performance under 
codegen/non-codegen, so disabling
+  // json optimization.

Review comment:
   Oh, this seems to be moved together to line 52.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523672856



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SubExprEliminationBenchmark.scala
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * The benchmarks aims to measure performance of the queries where there are 
subexpression
+ * elimination or not.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class  --jars ,
+ * 
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result:
+ *  SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ *  Results will be written to 
"benchmarks/SubExprEliminationBenchmark-results.txt".
+ * }}}
+ */
+object SubExprEliminationBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  def withFromJson(rowsNum: Int, numIters: Int): Unit = {
+val benchmark = new Benchmark("from_json as subExpr", rowsNum, output = 
output)
+
+withTempPath { path =>
+  prepareDataInfo(benchmark)
+  val numCols = 1000
+  val schema = writeWideRow(path.getAbsolutePath, rowsNum, numCols)
+
+  val cols = (0 until numCols).map { idx =>
+from_json('value, schema).getField(s"col$idx")
+  }
+
+  benchmark.addCase("subexpressionElimination off, codegen on", numIters) 
{ _ =>
+withSQLConf(
+  SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+  SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
+  SQLConf.CODEGEN_FACTORY_MODE.key -> "CODEGEN_ONLY",
+  SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+  val df = spark.read
+.text(path.getAbsolutePath)
+.select(cols: _*)
+  df.collect()
+}
+  }
+
+  benchmark.addCase("subexpressionElimination off, codegen off", numIters) 
{ _ =>
+withSQLConf(
+  SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+  SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false",
+  SQLConf.CODEGEN_FACTORY_MODE.key -> "NO_CODEGEN",
+  SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+  val df = spark.read
+.text(path.getAbsolutePath)
+.select(cols: _*)
+  df.collect()
+}
+  }
+
+  // We only benchmark subexpression performance under 
codegen/non-codegen, so disabling
+  // json optimization.

Review comment:
   Oh, it seems that we need to move this comment block to line 52 because 
it's a global comment for all four run.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523672856



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SubExprEliminationBenchmark.scala
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * The benchmarks aims to measure performance of the queries where there are 
subexpression
+ * elimination or not.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class  --jars ,
+ * 
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result:
+ *  SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ *  Results will be written to 
"benchmarks/SubExprEliminationBenchmark-results.txt".
+ * }}}
+ */
+object SubExprEliminationBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  def withFromJson(rowsNum: Int, numIters: Int): Unit = {
+val benchmark = new Benchmark("from_json as subExpr", rowsNum, output = 
output)
+
+withTempPath { path =>
+  prepareDataInfo(benchmark)
+  val numCols = 1000
+  val schema = writeWideRow(path.getAbsolutePath, rowsNum, numCols)
+
+  val cols = (0 until numCols).map { idx =>
+from_json('value, schema).getField(s"col$idx")
+  }
+
+  benchmark.addCase("subexpressionElimination off, codegen on", numIters) 
{ _ =>
+withSQLConf(
+  SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+  SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
+  SQLConf.CODEGEN_FACTORY_MODE.key -> "CODEGEN_ONLY",
+  SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+  val df = spark.read
+.text(path.getAbsolutePath)
+.select(cols: _*)
+  df.collect()
+}
+  }
+
+  benchmark.addCase("subexpressionElimination off, codegen off", numIters) 
{ _ =>
+withSQLConf(
+  SQLConf.SUBEXPRESSION_ELIMINATION_ENABLED.key -> "false",
+  SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false",
+  SQLConf.CODEGEN_FACTORY_MODE.key -> "NO_CODEGEN",
+  SQLConf.JSON_EXPRESSION_OPTIMIZATION.key -> "false") {
+  val df = spark.read
+.text(path.getAbsolutePath)
+.select(cols: _*)
+  df.collect()
+}
+  }
+
+  // We only benchmark subexpression performance under 
codegen/non-codegen, so disabling
+  // json optimization.

Review comment:
   Oh, it seems that we need to move this comment block to line 52.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #30379:
URL: https://github.com/apache/spark/pull/30379#discussion_r523670660



##
File path: sql/core/benchmarks/SubExprEliminationBenchmark-jdk11-results.txt
##
@@ -0,0 +1,15 @@
+
+Benchmark for performance of subexpression elimination
+
+
+Preparing data for benchmarking ...
+OpenJDK 64-Bit Server VM 11.0.9+11 on Mac OS X 10.15.6
+Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
+from_json as subExpr:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
+-
+subexpressionElimination off, codegen on   26809  27731
 898  0.0   268094225.4   1.0X
+subexpressionElimination off, codegen off  25117  26612
1357  0.0   251166638.4   1.1X
+subexpressionElimination on, codegen on 2582   2906
 282  0.025819408.7  10.4X

Review comment:
   Wow. It's faster in Java11?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

2020-11-14 Thread GitBox


SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727308891


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35706/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ulysses-you commented on a change in pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-14 Thread GitBox


ulysses-you commented on a change in pull request #30368:
URL: https://github.com/apache/spark/pull/30368#discussion_r523666452



##
File path: 
sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q92/explain.txt
##
@@ -1,5 +1,5 @@
 == Physical Plan ==
-TakeOrderedAndProject (34)
+* Sort (34)

Review comment:
   Other thought, we can add this pattern
   ```
   case sort @ Sort(order, true, child)
 if sort.maxRow < conf.topKSortFallbackThreshold =>
   TakeOrderedAndProjectExec()
   ```
   
   In this way, we can infer the `TakeOrderedAndProjectExec` from `Sort` which 
has not `Limit` after.
   
   What do you think about this? @maropu @viirya @cloud-fan 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ulysses-you commented on a change in pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-14 Thread GitBox


ulysses-you commented on a change in pull request #30368:
URL: https://github.com/apache/spark/pull/30368#discussion_r523666452



##
File path: 
sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q92/explain.txt
##
@@ -1,5 +1,5 @@
 == Physical Plan ==
-TakeOrderedAndProject (34)
+* Sort (34)

Review comment:
   Other thought, we can add this pattern
   ```
   case sort @ Sort(order, true, child)
 if sort.maxRow < conf.topKSortFallbackThreshold =>
   TakeOrderedAndProjectExec()
   ```
   
   In this way, we can infer the `TakeOrderedAndProjectExec` from `Sort` which 
has not `Limit` after.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ulysses-you commented on a change in pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-14 Thread GitBox


ulysses-you commented on a change in pull request #30368:
URL: https://github.com/apache/spark/pull/30368#discussion_r523656874



##
File path: 
sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q92/explain.txt
##
@@ -1,5 +1,5 @@
 == Physical Plan ==
-TakeOrderedAndProject (34)
+* Sort (34)

Review comment:
   This q92 sql:
   ```
   SELECT sum(ws_ext_discount_amt) AS `Excess Discount Amount `
   FROM web_sales, item, date_dim
   WHERE i_manufact_id = 350
 AND i_item_sk = ws_item_sk
 AND d_date BETWEEN '2000-01-27' AND (cast('2000-01-27' AS DATE) + INTERVAL 
90 days)
 AND d_date_sk = ws_sold_date_sk
 AND ws_ext_discount_amt >
 (
   SELECT 1.3 * avg(ws_ext_discount_amt)
   FROM web_sales, date_dim
   WHERE ws_item_sk = i_item_sk
 AND d_date BETWEEN '2000-01-27' AND (cast('2000-01-27' AS DATE) + 
INTERVAL 90 days)
 AND d_date_sk = ws_sold_date_sk
 )
   ORDER BY sum(ws_ext_discount_amt)
   LIMIT 100
   ```
   
   yes, `Limit` after `Sort` is a special case, we will convert to 
`TakeOrderedAndProject`, but it seems not necessary to do both `sort` and 
`limit` if child maxRow == 1. Maybe we can do an another check seems like `if 
sort.child.maxRow <= 1 then remove sort` ?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727297309







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

2020-11-14 Thread GitBox


AmplabJenkins commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727297309







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

2020-11-14 Thread GitBox


SparkQA removed a comment on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727295568


   **[Test build #131103 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131103/testReport)**
 for PR 30346 at commit 
[`b36cfd4`](https://github.com/apache/spark/commit/b36cfd410e444501614f505380daf3d42926dbce).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

2020-11-14 Thread GitBox


SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727297266


   **[Test build #131103 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131103/testReport)**
 for PR 30346 at commit 
[`b36cfd4`](https://github.com/apache/spark/commit/b36cfd410e444501614f505380daf3d42926dbce).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30346: [SPARK-33253][PYTHON][DOCS] Migration to NumPy documentation style in Streaming (pyspark.streaming.*)

2020-11-14 Thread GitBox


SparkQA commented on pull request #30346:
URL: https://github.com/apache/spark/pull/30346#issuecomment-727295568


   **[Test build #131103 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131103/testReport)**
 for PR 30346 at commit 
[`b36cfd4`](https://github.com/apache/spark/commit/b36cfd410e444501614f505380daf3d42926dbce).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727294448







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


AmplabJenkins commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727294448







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727294445


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35705/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727293839


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131099/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727293087







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


SparkQA removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727271125


   **[Test build #131099 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131099/testReport)**
 for PR 30379 at commit 
[`90ded6b`](https://github.com/apache/spark/commit/90ded6b35a6db1232f389073789083834a335574).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


AmplabJenkins commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727293836







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727293715


   **[Test build #131099 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131099/testReport)**
 for PR 30379 at commit 
[`90ded6b`](https://github.com/apache/spark/commit/90ded6b35a6db1232f389073789083834a335574).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ulysses-you commented on a change in pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-14 Thread GitBox


ulysses-you commented on a change in pull request #30368:
URL: https://github.com/apache/spark/pull/30368#discussion_r523596769



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##
@@ -1452,11 +1452,21 @@ object PushPredicateThroughJoin extends 
Rule[LogicalPlan] with PredicateHelper {
 }
 
 /**
- * Combines two adjacent [[Limit]] operators into one, merging the
- * expressions into one single expression.
+ * 1. Eliminate [[Limit]] operators if it's child max row <= limit.
+ * 2. Combines two adjacent [[Limit]] operators into one, merging the
+ *expressions into one single expression.
  */
-object CombineLimits extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+object EliminateLimits extends Rule[LogicalPlan] {
+  private def canEliminate(limitExpr: Expression, childMaxRow: Option[Long]): 
Boolean = {
+limitExpr.foldable && childMaxRow.exists { _ <= 
limitExpr.eval().toString.toLong }

Review comment:
   ah, thanks for the night review.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


AmplabJenkins commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727293087







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727293084


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35704/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727292169


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35705/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727291002


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35704/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


AmplabJenkins removed a comment on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727290920







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


AmplabJenkins commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727290920







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727290918


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35703/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727288272


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35703/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30379: [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for benchmarking subexpression elimination

2020-11-14 Thread GitBox


SparkQA commented on pull request #30379:
URL: https://github.com/apache/spark/pull/30379#issuecomment-727287912


   **[Test build #131102 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131102/testReport)**
 for PR 30379 at commit 
[`f9ef4ea`](https://github.com/apache/spark/commit/f9ef4eadb84afd0802f0d3284e6109528aca269d).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   >