[GitHub] [spark] holdenk opened a new pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

2020-10-26 Thread GitBox


holdenk opened a new pull request #30155:
URL: https://github.com/apache/spark/pull/30155


   
   ### What changes were proposed in this pull request?
   
   Make pod allocation executor timeouts configurable. Keep all known pods in 
mind when allocating executors to avoid over allocating if the pending time is 
much higher than the allocation interval.
   
   ### Why are the changes needed?
   The current executor timeouts do not match that of all real world clusters 
especially under load. While this can be worked around by increasing the 
allocation batch delay, that will decrease the speed at which the total number 
of executors will be able to be requested.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes new configuration property
   
   
   ### How was this patch tested?
   
   Updated existing test to use the timeout from the new configuration 
property. Verified test failed without the update.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

2020-10-26 Thread GitBox


AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716870239







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

2020-10-26 Thread GitBox


AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716870239







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

2020-10-26 Thread GitBox


SparkQA removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716748700


   **[Test build #130296 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130296/testReport)**
 for PR 30093 at commit 
[`59f9cd4`](https://github.com/apache/spark/commit/59f9cd4be7f7c93f8cf01c66dfb5619af548e66b).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

2020-10-26 Thread GitBox


SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716869601


   **[Test build #130296 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130296/testReport)**
 for PR 30093 at commit 
[`59f9cd4`](https://github.com/apache/spark/commit/59f9cd4be7f7c93f8cf01c66dfb5619af548e66b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao opened a new pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog

2020-10-26 Thread GitBox


huaxingao opened a new pull request #30154:
URL: https://github.com/apache/spark/pull/30154


   
   ### What changes were proposed in this pull request?
   Currently in JDBCTableCatalog, we ignore the table options when creating 
table.
   ```
   // TODO (SPARK-32405): Apply table options while creating tables in JDBC 
Table Catalog
   if (!properties.isEmpty) {
 logWarning("Cannot create JDBC table with properties, these properties 
will be " +
   "ignored: " + properties.asScala.map { case (k, v) => s"$k=$v" 
}.mkString("[", ", ", "]"))
   }
   ```
   
   ### Why are the changes needed?
   need to apply the table options when we create table
   
   
   ### Does this PR introduce _any_ user-facing change?
   no
   
   
   ### How was this patch tested?
   add new test
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30130: [WIP][SPARK-32354][K8S][TESTS] Re-enable RTestsSuite

2020-10-26 Thread GitBox


SparkQA removed a comment on pull request #30130:
URL: https://github.com/apache/spark/pull/30130#issuecomment-716860126


   **[Test build #130299 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130299/testReport)**
 for PR 30130 at commit 
[`01666ef`](https://github.com/apache/spark/commit/01666ef027ff48e4af567ad29dd6fa8655cf5bea).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30130: [WIP][SPARK-32354][K8S][TESTS] Re-enable RTestsSuite

2020-10-26 Thread GitBox


AmplabJenkins removed a comment on pull request #30130:
URL: https://github.com/apache/spark/pull/30130#issuecomment-716865122







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30130: [WIP][SPARK-32354][K8S][TESTS] Re-enable RTestsSuite

2020-10-26 Thread GitBox


AmplabJenkins commented on pull request #30130:
URL: https://github.com/apache/spark/pull/30130#issuecomment-716865122







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30130: [WIP][SPARK-32354][K8S][TESTS] Re-enable RTestsSuite

2020-10-26 Thread GitBox


SparkQA commented on pull request #30130:
URL: https://github.com/apache/spark/pull/30130#issuecomment-716864965


   **[Test build #130299 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130299/testReport)**
 for PR 30130 at commit 
[`01666ef`](https://github.com/apache/spark/commit/01666ef027ff48e4af567ad29dd6fa8655cf5bea).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-26 Thread GitBox


HeartSaVioR commented on a change in pull request #26935:
URL: https://github.com/apache/spark/pull/26935#discussion_r512313839



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
##
@@ -89,16 +116,16 @@ trait StateStore {
   def commit(): Long
 
   /**
-   * Abort all the updates that have been made to the store. Implementations 
should ensure that
-   * no more updates (puts, removes) can be after an abort in order to avoid 
incorrect usage.
+   * Return an iterator containing all the key-value pairs in the StateStore. 
Implementations must
+   * ensure that updates (puts, removes) can be made while iterating over this 
iterator.
*/
-  def abort(): Unit
+  override def iterator(): Iterator[UnsafeRowPair]

Review comment:
   Yes. Unfortunately we've figured out the requirements of `iterator` 
implementation is a bit different between the two. That said, someone could 
leverage the fact that iterator in ReadOnlyState doesn't need to care about 
updates.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-26 Thread GitBox


HeartSaVioR commented on a change in pull request #26935:
URL: https://github.com/apache/spark/pull/26935#discussion_r512312263



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
##
@@ -81,6 +74,40 @@ trait StateStore {
 iterator()
   }
 
+  /** Return an iterator containing all the key-value pairs in the StateStore. 
*/
+  def iterator(): Iterator[UnsafeRowPair]
+
+  /**
+   * Clean up the resource.
+   *
+   * The method name is to respect backward compatibility on [[StateStore]].

Review comment:
   `abort` is only the one which is not properly named in point of 
ReadStateStore. It should be `close`, but we found it tricky to respect 
backward compatibility with having `close`, hence we decided to leave it as it 
is. 
   
   Please refer https://github.com/apache/spark/pull/26935#discussion_r493236101





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29970: [SPARK-33087][SQL] DataFrameWriterV2 should delegate table resolution to the analyzer

2020-10-26 Thread GitBox


SparkQA commented on pull request #29970:
URL: https://github.com/apache/spark/pull/29970#issuecomment-716862852


   **[Test build #130300 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130300/testReport)**
 for PR 29970 at commit 
[`ee3587a`](https://github.com/apache/spark/commit/ee3587ad1a1654ea89b1064d3407052b66cabe2e).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #29970: [SPARK-33087][SQL] DataFrameWriterV2 should delegate table resolution to the analyzer

2020-10-26 Thread GitBox


HeartSaVioR commented on pull request #29970:
URL: https://github.com/apache/spark/pull/29970#issuecomment-716861018


   retest this, please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog

2020-10-26 Thread GitBox


HeartSaVioR commented on pull request #30147:
URL: https://github.com/apache/spark/pull/30147#issuecomment-716860727


   cc. @cloud-fan @rdblue 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30130: [WIP][SPARK-32354][K8S][TESTS] Re-enable RTestsSuite

2020-10-26 Thread GitBox


SparkQA commented on pull request #30130:
URL: https://github.com/apache/spark/pull/30130#issuecomment-716860126


   **[Test build #130299 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130299/testReport)**
 for PR 30130 at commit 
[`01666ef`](https://github.com/apache/spark/commit/01666ef027ff48e4af567ad29dd6fa8655cf5bea).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30130: [WIP][SPARK-32354][K8S][TESTS] Re-enable RTestsSuite

2020-10-26 Thread GitBox


dongjoon-hyun commented on pull request #30130:
URL: https://github.com/apache/spark/pull/30130#issuecomment-716859380


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


dongjoon-hyun closed pull request #30153:
URL: https://github.com/apache/spark/pull/30153


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


dongjoon-hyun commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716858640


   Thank you, @viirya !



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-26 Thread GitBox


otterc commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r512300825



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,899 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the 
most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap 
partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String 
relativeMergeDirPathPattern) {
+this.conf = conf;
+this.partitions = Maps.newConcurrentMap();
+this.appsPathInfo = Maps.newConcurrentMap();
+this.directoryCleaner = Executors.newSingleThreadExecutor(
+  // Add `spark` prefix because it will run in NM in Yarn mode.
+  
NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+CacheLoader indexCacheLoader =
+  new CacheLoader() {
+public ShuffleIndexInformation load(File file) throws IOException {
+  return new ShuffleIndexInformation(file);
+}
+  };
+indexCache = CacheBuilder.newBuilder()
+  .maximumWeight(conf.mergedIndexCacheSize())
+  .weigher((Weigher) (file, indexInfo) -> 
indexInfo.getSize())
+  .build(indexCacheLoader);
+this.relativeMergeDirPathPattern = relativeMergeDirPathPattern;
+this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given an ID that uniquely identifies a given shuffle partition of an 
application, retrieves the
+   * associated metadata. If not present and the 

[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-26 Thread GitBox


otterc commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r512298080



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,905 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the 
most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap> partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String 
relativeMergeDirPathPattern) {
+this.conf = conf;
+this.partitions = Maps.newConcurrentMap();
+this.appsPathInfo = Maps.newConcurrentMap();
+this.directoryCleaner = Executors.newSingleThreadExecutor(
+  // Add `spark` prefix because it will run in NM in Yarn mode.
+  
NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+CacheLoader indexCacheLoader =
+  new CacheLoader() {
+public ShuffleIndexInformation load(File file) throws IOException {
+  return new ShuffleIndexInformation(file);
+}
+  };
+indexCache = CacheBuilder.newBuilder()
+  .maximumWeight(conf.mergedIndexCacheSize())
+  .weigher((Weigher) (file, indexInfo) -> 
indexInfo.getSize())
+  .build(indexCacheLoader);
+this.relativeMergeDirPathPattern = relativeMergeDirPathPattern;
+this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given the appShuffleId and reduceId that uniquely identifies a given 
shuffle partition of an
+   * application, retrieves the associated metadata. 

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


AmplabJenkins removed a comment on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716850189







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


dongjoon-hyun commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716850248


   Hi, @holdenk and @viirya and @srowen .
   Could you review this PR, please?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


AmplabJenkins commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716850189







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


SparkQA commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716850170


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34899/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #30140: [SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan

2020-10-26 Thread GitBox


maropu commented on pull request #30140:
URL: https://github.com/apache/spark/pull/30140#issuecomment-716846073


   okay, I will today.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-26 Thread GitBox


tgravescs commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r512281688



##
File path: 
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##
@@ -94,6 +95,9 @@
   static final String STOP_ON_FAILURE_KEY = "spark.yarn.shuffle.stopOnFailure";
   private static final boolean DEFAULT_STOP_ON_FAILURE = false;
 
+  // Used by shuffle merge manager to create merged shuffle files.
+  protected static final String APP_BASE_RELATIVE_PATH = 
"usercache/%s/appcache/%s/";

Review comment:
   yeah, it does feel a bit odd here to hard code this here.   the code 
inside RemoteBlockPushResolver assumes it has a specific format of with the 
user and appid as well. I don't have any other great solution to get this from 
yarn, but I think it should be documented what the %s values are and assumed to 
be when passed into the class.

##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,905 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the 
most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap> partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String 
relativeMergeDirPathPattern) {
+this.conf = conf;
+this.partitions = Maps.newConcurrentMap();
+this.appsPathInfo = Maps.newConcurrentMap();
+this.directoryCleaner = Executors.newSingleThreadExecutor(
+  // Add `spark` prefix because it will run in NM in Yarn mode.
+  

[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


SparkQA commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716841256


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34899/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-26 Thread GitBox


otterc commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r512285083



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,905 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the 
most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap> partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String 
relativeMergeDirPathPattern) {
+this.conf = conf;
+this.partitions = Maps.newConcurrentMap();
+this.appsPathInfo = Maps.newConcurrentMap();
+this.directoryCleaner = Executors.newSingleThreadExecutor(
+  // Add `spark` prefix because it will run in NM in Yarn mode.
+  
NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+CacheLoader indexCacheLoader =
+  new CacheLoader() {
+public ShuffleIndexInformation load(File file) throws IOException {
+  return new ShuffleIndexInformation(file);
+}
+  };
+indexCache = CacheBuilder.newBuilder()
+  .maximumWeight(conf.mergedIndexCacheSize())
+  .weigher((Weigher) (file, indexInfo) -> 
indexInfo.getSize())
+  .build(indexCacheLoader);
+this.relativeMergeDirPathPattern = relativeMergeDirPathPattern;
+this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given the appShuffleId and reduceId that uniquely identifies a given 
shuffle partition of an
+   * application, retrieves the associated metadata. 

[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-26 Thread GitBox


otterc commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r512280774



##
File path: 
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##
@@ -172,7 +178,9 @@ protected void serviceInit(Configuration conf) throws 
Exception {
   }
 
   TransportConf transportConf = new TransportConf("shuffle", new 
HadoopConfigProvider(conf));
-  blockHandler = new ExternalBlockHandler(transportConf, 
registeredExecutorFile);
+  shuffleMergeManager = new RemoteBlockPushResolver(transportConf, 
APP_BASE_RELATIVE_PATH);

Review comment:
   Yes, doesn't seem to be a big change to have a config that is a class. I 
will add that with this change. 
   @tgravescs 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-26 Thread GitBox


otterc commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r512279242



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,905 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the 
most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap> partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String 
relativeMergeDirPathPattern) {
+this.conf = conf;
+this.partitions = Maps.newConcurrentMap();
+this.appsPathInfo = Maps.newConcurrentMap();
+this.directoryCleaner = Executors.newSingleThreadExecutor(
+  // Add `spark` prefix because it will run in NM in Yarn mode.
+  
NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+CacheLoader indexCacheLoader =
+  new CacheLoader() {
+public ShuffleIndexInformation load(File file) throws IOException {
+  return new ShuffleIndexInformation(file);
+}
+  };
+indexCache = CacheBuilder.newBuilder()
+  .maximumWeight(conf.mergedIndexCacheSize())
+  .weigher((Weigher) (file, indexInfo) -> 
indexInfo.getSize())
+  .build(indexCacheLoader);
+this.relativeMergeDirPathPattern = relativeMergeDirPathPattern;
+this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given the appShuffleId and reduceId that uniquely identifies a given 
shuffle partition of an
+   * application, retrieves the associated metadata. 

[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


SparkQA commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716825697


   **[Test build #130298 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130298/testReport)**
 for PR 30153 at commit 
[`d48174e`](https://github.com/apache/spark/commit/d48174e21109be167b6fde80b8ee8826eccc000f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


SparkQA removed a comment on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716820189


   **[Test build #130298 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130298/testReport)**
 for PR 30153 at commit 
[`d48174e`](https://github.com/apache/spark/commit/d48174e21109be167b6fde80b8ee8826eccc000f).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


AmplabJenkins commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716825889







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


AmplabJenkins removed a comment on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716825889







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


SparkQA commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716820189


   **[Test build #130298 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130298/testReport)**
 for PR 30153 at commit 
[`d48174e`](https://github.com/apache/spark/commit/d48174e21109be167b6fde80b8ee8826eccc000f).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


dongjoon-hyun commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716817713


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Victsm commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-26 Thread GitBox


Victsm commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r512259560



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,905 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the 
most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap> partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String 
relativeMergeDirPathPattern) {
+this.conf = conf;
+this.partitions = Maps.newConcurrentMap();
+this.appsPathInfo = Maps.newConcurrentMap();
+this.directoryCleaner = Executors.newSingleThreadExecutor(
+  // Add `spark` prefix because it will run in NM in Yarn mode.
+  
NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+CacheLoader indexCacheLoader =
+  new CacheLoader() {
+public ShuffleIndexInformation load(File file) throws IOException {
+  return new ShuffleIndexInformation(file);
+}
+  };
+indexCache = CacheBuilder.newBuilder()
+  .maximumWeight(conf.mergedIndexCacheSize())
+  .weigher((Weigher) (file, indexInfo) -> 
indexInfo.getSize())
+  .build(indexCacheLoader);
+this.relativeMergeDirPathPattern = relativeMergeDirPathPattern;
+this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given the appShuffleId and reduceId that uniquely identifies a given 
shuffle partition of an
+   * application, retrieves the associated metadata. 

[GitHub] [spark] zero323 commented on pull request #30150: [SPARK-32188][PYTHON][DOCS][FOLLOW-UP] Document Column APIs in API reference

2020-10-26 Thread GitBox


zero323 commented on pull request #30150:
URL: https://github.com/apache/spark/pull/30150#issuecomment-716807149


   LGTM.
   
   On a side note, it highlights how sparse are `Column` docs (and that we miss 
even basic examples for dunder methods). Do we have any ticket to track that?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mridulm commented on pull request #30131: [SPARK-33220][CORE]Use `scheduleWithFixedDelay` to avoid repeated unnecessary scheduling for a short time

2020-10-26 Thread GitBox


mridulm commented on pull request #30131:
URL: https://github.com/apache/spark/pull/30131#issuecomment-716804124


   +CC @otterc 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


AmplabJenkins removed a comment on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716803034


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34898/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


AmplabJenkins removed a comment on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716803022


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


SparkQA commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716803000


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34898/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


AmplabJenkins commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716803022







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-26 Thread GitBox


tgravescs commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r512238046



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,905 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the 
most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap> partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String 
relativeMergeDirPathPattern) {
+this.conf = conf;
+this.partitions = Maps.newConcurrentMap();
+this.appsPathInfo = Maps.newConcurrentMap();
+this.directoryCleaner = Executors.newSingleThreadExecutor(
+  // Add `spark` prefix because it will run in NM in Yarn mode.
+  
NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+CacheLoader indexCacheLoader =
+  new CacheLoader() {
+public ShuffleIndexInformation load(File file) throws IOException {
+  return new ShuffleIndexInformation(file);
+}
+  };
+indexCache = CacheBuilder.newBuilder()
+  .maximumWeight(conf.mergedIndexCacheSize())
+  .weigher((Weigher) (file, indexInfo) -> 
indexInfo.getSize())
+  .build(indexCacheLoader);
+this.relativeMergeDirPathPattern = relativeMergeDirPathPattern;
+this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given the appShuffleId and reduceId that uniquely identifies a given 
shuffle partition of an
+   * application, retrieves the associated 

[GitHub] [spark] SparkQA commented on pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan

2020-10-26 Thread GitBox


SparkQA commented on pull request #30152:
URL: https://github.com/apache/spark/pull/30152#issuecomment-716787342


   **[Test build #130295 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130295/testReport)**
 for PR 30152 at commit 
[`0ae050b`](https://github.com/apache/spark/commit/0ae050b727572c2ba39b1269883bb177fc5e7838).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan

2020-10-26 Thread GitBox


AmplabJenkins commented on pull request #30152:
URL: https://github.com/apache/spark/pull/30152#issuecomment-716788005







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan

2020-10-26 Thread GitBox


SparkQA removed a comment on pull request #30152:
URL: https://github.com/apache/spark/pull/30152#issuecomment-716677497


   **[Test build #130295 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130295/testReport)**
 for PR 30152 at commit 
[`0ae050b`](https://github.com/apache/spark/commit/0ae050b727572c2ba39b1269883bb177fc5e7838).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-26 Thread GitBox


tgravescs commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r512232873



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,905 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the 
most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap> partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String 
relativeMergeDirPathPattern) {
+this.conf = conf;
+this.partitions = Maps.newConcurrentMap();
+this.appsPathInfo = Maps.newConcurrentMap();
+this.directoryCleaner = Executors.newSingleThreadExecutor(
+  // Add `spark` prefix because it will run in NM in Yarn mode.
+  
NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+CacheLoader indexCacheLoader =
+  new CacheLoader() {
+public ShuffleIndexInformation load(File file) throws IOException {
+  return new ShuffleIndexInformation(file);
+}
+  };
+indexCache = CacheBuilder.newBuilder()
+  .maximumWeight(conf.mergedIndexCacheSize())
+  .weigher((Weigher) (file, indexInfo) -> 
indexInfo.getSize())
+  .build(indexCacheLoader);
+this.relativeMergeDirPathPattern = relativeMergeDirPathPattern;
+this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given the appShuffleId and reduceId that uniquely identifies a given 
shuffle partition of an
+   * application, retrieves the associated 

[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


SparkQA commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716789761


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34898/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan

2020-10-26 Thread GitBox


AmplabJenkins removed a comment on pull request #30152:
URL: https://github.com/apache/spark/pull/30152#issuecomment-716788005







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-26 Thread GitBox


otterc commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r512229942



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,899 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the 
most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap 
partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String 
relativeMergeDirPathPattern) {
+this.conf = conf;
+this.partitions = Maps.newConcurrentMap();
+this.appsPathInfo = Maps.newConcurrentMap();
+this.directoryCleaner = Executors.newSingleThreadExecutor(
+  // Add `spark` prefix because it will run in NM in Yarn mode.
+  
NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+CacheLoader indexCacheLoader =
+  new CacheLoader() {
+public ShuffleIndexInformation load(File file) throws IOException {
+  return new ShuffleIndexInformation(file);
+}
+  };
+indexCache = CacheBuilder.newBuilder()
+  .maximumWeight(conf.mergedIndexCacheSize())
+  .weigher((Weigher) (file, indexInfo) -> 
indexInfo.getSize())
+  .build(indexCacheLoader);
+this.relativeMergeDirPathPattern = relativeMergeDirPathPattern;
+this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given an ID that uniquely identifies a given shuffle partition of an 
application, retrieves the
+   * associated metadata. If not present and the 

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

2020-10-26 Thread GitBox


AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716784916







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

2020-10-26 Thread GitBox


SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716784902


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34897/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

2020-10-26 Thread GitBox


AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716784916







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan

2020-10-26 Thread GitBox


viirya commented on pull request #30152:
URL: https://github.com/apache/spark/pull/30152#issuecomment-716782854


   Thanks @dongjoon-hyun 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-26 Thread GitBox


tgravescs commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r512225133



##
File path: 
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##
@@ -172,7 +178,9 @@ protected void serviceInit(Configuration conf) throws 
Exception {
   }
 
   TransportConf transportConf = new TransportConf("shuffle", new 
HadoopConfigProvider(conf));
-  blockHandler = new ExternalBlockHandler(transportConf, 
registeredExecutorFile);
+  shuffleMergeManager = new RemoteBlockPushResolver(transportConf, 
APP_BASE_RELATIVE_PATH);

Review comment:
   I was just thinking it would be easier to create the config now that is 
a class that is either the push based or the Noop one. Then a user could also 
just specify a different class that is their custom one.  I'm fine with doing 
that separate.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-26 Thread GitBox


tgravescs commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r512221375



##
File path: 
common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java
##
@@ -363,4 +363,26 @@ public boolean useOldFetchProtocol() {
 return conf.getBoolean("spark.shuffle.useOldFetchProtocol", false);
   }
 
+  /**
+   * The minimum size of a chunk when dividing a merged shuffle file into 
multiple chunks during
+   * push-based shuffle.
+   * A merged shuffle file consists of multiple small shuffle blocks. Fetching 
the
+   * complete merged shuffle file in a single response increases the memory 
requirements for the
+   * clients. Instead of serving the entire merged file, the shuffle service 
serves the
+   * merged file in `chunks`. A `chunk` constitutes few shuffle blocks in 
entirety and this
+   * configuration controls how big a chunk can get. A corresponding index 
file for each merged
+   * shuffle file will be generated indicating chunk boundaries.
+   */
+  public int minChunkSizeInMergedShuffleFile() {
+return Ints.checkedCast(JavaUtils.byteStringAsBytes(
+  conf.get("spark.shuffle.server.minChunkSizeInMergedShuffleFile", "2m")));

Review comment:
   yeah agree, I think leave on server side.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-26 Thread GitBox


tgravescs commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r512220482



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,899 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the 
most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap 
partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String 
relativeMergeDirPathPattern) {
+this.conf = conf;
+this.partitions = Maps.newConcurrentMap();
+this.appsPathInfo = Maps.newConcurrentMap();
+this.directoryCleaner = Executors.newSingleThreadExecutor(
+  // Add `spark` prefix because it will run in NM in Yarn mode.
+  
NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+CacheLoader indexCacheLoader =
+  new CacheLoader() {
+public ShuffleIndexInformation load(File file) throws IOException {
+  return new ShuffleIndexInformation(file);
+}
+  };
+indexCache = CacheBuilder.newBuilder()
+  .maximumWeight(conf.mergedIndexCacheSize())
+  .weigher((Weigher) (file, indexInfo) -> 
indexInfo.getSize())
+  .build(indexCacheLoader);
+this.relativeMergeDirPathPattern = relativeMergeDirPathPattern;
+this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given an ID that uniquely identifies a given shuffle partition of an 
application, retrieves the
+   * associated metadata. If not present and 

[GitHub] [spark] dongjoon-hyun commented on pull request #30140: [SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan

2020-10-26 Thread GitBox


dongjoon-hyun commented on pull request #30140:
URL: https://github.com/apache/spark/pull/30140#issuecomment-716778229


   @maropu . Could you make a backporting PR to branch-2.4?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan

2020-10-26 Thread GitBox


dongjoon-hyun closed pull request #30152:
URL: https://github.com/apache/spark/pull/30152


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan

2020-10-26 Thread GitBox


dongjoon-hyun commented on pull request #30152:
URL: https://github.com/apache/spark/pull/30152#issuecomment-71680


   Merged to branch-2.4.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30152: Revert "[SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan

2020-10-26 Thread GitBox


dongjoon-hyun commented on pull request #30152:
URL: https://github.com/apache/spark/pull/30152#issuecomment-716777364


   In Jenkins, Python tests passed.
   ```
   ...
   Finished test(python3.6): pyspark.ml.tests (187s)
   Tests passed in 951 seconds
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30141: [SPARK-33230][SQL] Hadoop committers to get unique job ID in "spark.sql.sources.writeJobUUID"

2020-10-26 Thread GitBox


dongjoon-hyun commented on pull request #30141:
URL: https://github.com/apache/spark/pull/30141#issuecomment-716775382


   cc @cloud-fan 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #30141: [SPARK-33230][SQL] Hadoop committers to get unique job ID in "spark.sql.sources.writeJobUUID"

2020-10-26 Thread GitBox


dongjoon-hyun closed pull request #30141:
URL: https://github.com/apache/spark/pull/30141


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


SparkQA removed a comment on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716768684


   **[Test build #130297 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130297/testReport)**
 for PR 30153 at commit 
[`d48174e`](https://github.com/apache/spark/commit/d48174e21109be167b6fde80b8ee8826eccc000f).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


AmplabJenkins removed a comment on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716774481







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


AmplabJenkins commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716774481







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


SparkQA commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716774302


   **[Test build #130297 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130297/testReport)**
 for PR 30153 at commit 
[`d48174e`](https://github.com/apache/spark/commit/d48174e21109be167b6fde80b8ee8826eccc000f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

2020-10-26 Thread GitBox


SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716770165


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34897/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


SparkQA commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716768684


   **[Test build #130297 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130297/testReport)**
 for PR 30153 at commit 
[`d48174e`](https://github.com/apache/spark/commit/d48174e21109be167b6fde80b8ee8826eccc000f).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


dongjoon-hyun commented on pull request #30153:
URL: https://github.com/apache/spark/pull/30153#issuecomment-716766241


   Hi, @shaneknapp . I made a PR from Spark side first.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun opened a new pull request #30153: [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job

2020-10-26 Thread GitBox


dongjoon-hyun opened a new pull request #30153:
URL: https://github.com/apache/spark/pull/30153


   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2020-10-26 Thread GitBox


AmplabJenkins commented on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-716762323







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2020-10-26 Thread GitBox


AmplabJenkins removed a comment on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-716762323







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2020-10-26 Thread GitBox


SparkQA removed a comment on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-716583128


   **[Test build #130293 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130293/testReport)**
 for PR 30145 at commit 
[`82e901f`](https://github.com/apache/spark/commit/82e901f95316efa31d5d55867bf0152e99d94b70).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2020-10-26 Thread GitBox


SparkQA commented on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-716761264


   **[Test build #130293 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130293/testReport)**
 for PR 30145 at commit 
[`82e901f`](https://github.com/apache/spark/commit/82e901f95316efa31d5d55867bf0152e99d94b70).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30089: [SPARK-33137][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (Postgres dialect)

2020-10-26 Thread GitBox


AmplabJenkins removed a comment on pull request #30089:
URL: https://github.com/apache/spark/pull/30089#issuecomment-716753175







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30089: [SPARK-33137][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (Postgres dialect)

2020-10-26 Thread GitBox


AmplabJenkins commented on pull request #30089:
URL: https://github.com/apache/spark/pull/30089#issuecomment-716753175







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30089: [SPARK-33137][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (Postgres dialect)

2020-10-26 Thread GitBox


SparkQA removed a comment on pull request #30089:
URL: https://github.com/apache/spark/pull/30089#issuecomment-716640452


   **[Test build #130294 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130294/testReport)**
 for PR 30089 at commit 
[`c9f264a`](https://github.com/apache/spark/commit/c9f264a7a5d5e568f74b4147646432f98536df1f).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30089: [SPARK-33137][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (Postgres dialect)

2020-10-26 Thread GitBox


SparkQA commented on pull request #30089:
URL: https://github.com/apache/spark/pull/30089#issuecomment-716752086


   **[Test build #130294 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130294/testReport)**
 for PR 30089 at commit 
[`c9f264a`](https://github.com/apache/spark/commit/c9f264a7a5d5e568f74b4147646432f98536df1f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-10-26 Thread GitBox


AmplabJenkins commented on pull request #29414:
URL: https://github.com/apache/spark/pull/29414#issuecomment-716749723







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2020-10-26 Thread GitBox


AmplabJenkins commented on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-716749652







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2020-10-26 Thread GitBox


AmplabJenkins removed a comment on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-716749652







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-10-26 Thread GitBox


AmplabJenkins removed a comment on pull request #29414:
URL: https://github.com/apache/spark/pull/29414#issuecomment-716749723







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2020-10-26 Thread GitBox


SparkQA removed a comment on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-716546948


   **[Test build #130290 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130290/testReport)**
 for PR 30145 at commit 
[`feeae94`](https://github.com/apache/spark/commit/feeae94b1660ad8fd086a9d37ac2719e87db97a0).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

2020-10-26 Thread GitBox


SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716748700


   **[Test build #130296 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130296/testReport)**
 for PR 30093 at commit 
[`59f9cd4`](https://github.com/apache/spark/commit/59f9cd4be7f7c93f8cf01c66dfb5619af548e66b).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-10-26 Thread GitBox


SparkQA removed a comment on pull request #29414:
URL: https://github.com/apache/spark/pull/29414#issuecomment-716546987


   **[Test build #130291 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130291/testReport)**
 for PR 29414 at commit 
[`d68066e`](https://github.com/apache/spark/commit/d68066eddbfde21c00e33a32763976242371166d).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2020-10-26 Thread GitBox


SparkQA commented on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-716748379


   **[Test build #130290 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130290/testReport)**
 for PR 30145 at commit 
[`feeae94`](https://github.com/apache/spark/commit/feeae94b1660ad8fd086a9d37ac2719e87db97a0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-10-26 Thread GitBox


SparkQA commented on pull request #29414:
URL: https://github.com/apache/spark/pull/29414#issuecomment-716748439


   **[Test build #130291 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130291/testReport)**
 for PR 29414 at commit 
[`d68066e`](https://github.com/apache/spark/commit/d68066eddbfde21c00e33a32763976242371166d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-26 Thread GitBox


AmplabJenkins removed a comment on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-716744287







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-26 Thread GitBox


AmplabJenkins commented on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-716744287







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30134: [SPARK-33225][SQL] Extract AliasHelper trait

2020-10-26 Thread GitBox


AmplabJenkins removed a comment on pull request #30134:
URL: https://github.com/apache/spark/pull/30134#issuecomment-716743923







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30134: [SPARK-33225][SQL] Extract AliasHelper trait

2020-10-26 Thread GitBox


AmplabJenkins commented on pull request #30134:
URL: https://github.com/apache/spark/pull/30134#issuecomment-716743923







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-26 Thread GitBox


SparkQA removed a comment on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-716538184


   **[Test build #130289 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130289/testReport)**
 for PR 30097 at commit 
[`cd3329f`](https://github.com/apache/spark/commit/cd3329fadfb7bfcfae97b945d4ac6b8b346d38db).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-26 Thread GitBox


SparkQA commented on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-716743114


   **[Test build #130289 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130289/testReport)**
 for PR 30097 at commit 
[`cd3329f`](https://github.com/apache/spark/commit/cd3329fadfb7bfcfae97b945d4ac6b8b346d38db).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30134: [SPARK-33225][SQL] Extract AliasHelper trait

2020-10-26 Thread GitBox


SparkQA commented on pull request #30134:
URL: https://github.com/apache/spark/pull/30134#issuecomment-716742913


   **[Test build #130292 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130292/testReport)**
 for PR 30134 at commit 
[`bd2e8bc`](https://github.com/apache/spark/commit/bd2e8bcb351c723a6be51d38b81959bdc794e45a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30134: [SPARK-33225][SQL] Extract AliasHelper trait

2020-10-26 Thread GitBox


SparkQA removed a comment on pull request #30134:
URL: https://github.com/apache/spark/pull/30134#issuecomment-716551877


   **[Test build #130292 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130292/testReport)**
 for PR 30134 at commit 
[`bd2e8bc`](https://github.com/apache/spark/commit/bd2e8bcb351c723a6be51d38b81959bdc794e45a).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] allisonwang-db commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

2020-10-26 Thread GitBox


allisonwang-db commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r512162304



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -1242,6 +1242,13 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val REMOVE_REDUNDANT_SORTS_ENABLED = 
buildConf("spark.sql.execution.removeRedundantSorts")
+.internal()
+.doc("Whether to remove redundant physical sort node")
+.version("3.1.0")

Review comment:
   I am not exactly sure if it's better to change it in this PR or to 
change it when this PR is backported to 2.4.8 (in case the current change does 
not work in 2.4.8)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] allisonwang-db commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

2020-10-26 Thread GitBox


allisonwang-db commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r512162304



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -1242,6 +1242,13 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val REMOVE_REDUNDANT_SORTS_ENABLED = 
buildConf("spark.sql.execution.removeRedundantSorts")
+.internal()
+.doc("Whether to remove redundant physical sort node")
+.version("3.1.0")

Review comment:
   I am not exactly sure if it's better to change it in this PR or to 
change it when this PR is backported to 2.4.8.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



<    1   2   3   4   5   6   7   8   >