[GitHub] [spark] AmplabJenkins removed a comment on pull request #30412: [SPARK-33480][SQL] Support char/varchar type

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30412:
URL: https://github.com/apache/spark/pull/30412#issuecomment-733202278










[GitHub] [spark] AmplabJenkins commented on pull request #30465: [SPARK-33045][SQL][FOLLOWUP] Support built-in function like_any and fix StackOverflowError issue.

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30465:
URL: https://github.com/apache/spark/pull/30465#issuecomment-733202287










[GitHub] [spark] AmplabJenkins commented on pull request #30440: [SPARK-33496][SQL]Improve error message of ANSI explicit cast

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30440:
URL: https://github.com/apache/spark/pull/30440#issuecomment-733202292










[GitHub] [spark] AmplabJenkins commented on pull request #30412: [SPARK-33480][SQL] Support char/varchar type

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30412:
URL: https://github.com/apache/spark/pull/30412#issuecomment-733202282










[GitHub] [spark] AmplabJenkins commented on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-733202296










[GitHub] [spark] AmplabJenkins commented on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30403:
URL: https://github.com/apache/spark/pull/30403#issuecomment-733202277










[GitHub] [spark] AmplabJenkins commented on pull request #30478: [SPARK-33525][SQL] Update hive-service-rpc to 3.1.2

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30478:
URL: https://github.com/apache/spark/pull/30478#issuecomment-733202279










[GitHub] [spark] xkrogen commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-24 Thread GitBox


xkrogen commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r529838764



##
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##
@@ -2980,6 +2980,75 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
 
+  /**
+   * Download the jars that an Ivy URI depends on.
+   *
+   * @param uri Ivy URI to be downloaded.
+   * @return Comma-separated list of URIs of the downloaded jars
+   */
+  def resolveMavenDependencies(uri: URI): String = {
+    val Seq(repositories, ivyRepoPath, ivySettingsPath) =
+      Seq(
+        "spark.jars.repositories",
+        "spark.jars.ivy",
+        "spark.jars.ivySettings"
+      ).map(sys.props.get(_).orNull)
+    // Create the IvySettings, either load from file or build defaults
+    val ivySettings = Option(ivySettingsPath) match {
+      case Some(path) =>
+        SparkSubmitUtils.loadIvySettings(path, Option(repositories), Option(ivyRepoPath))

Review comment:
   Let's not use default Ivy settings... In my experience with some custom
logic we have, it's very valuable to ensure that all of the Ivy resolution
obeys the settings.
   
   Maybe we can pull this out into a common utility that can be leveraged here
and in `DriverWrapper`? Then there is no need for testing twice. But I don't
think saying that the logic is tested elsewhere and then copy-pasted is
sufficient -- if there is drift in the two code paths, we will lose the
validation.
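
   One minimal sketch of such a shared utility (the object name and its
placement are illustrative, not from the PR):

   ```scala
   import org.apache.ivy.core.settings.IvySettings
   import org.apache.spark.deploy.SparkSubmitUtils

   // Hypothetical shared helper: both Utils.resolveMavenDependencies and
   // DriverWrapper could delegate here, so the settings-loading logic lives
   // (and is tested) in exactly one place.
   private[spark] object IvySettingsHelper {
     def createIvySettings(
         repositories: Option[String],
         ivyRepoPath: Option[String],
         ivySettingsPath: Option[String]): IvySettings = {
       ivySettingsPath match {
         case Some(path) =>
           SparkSubmitUtils.loadIvySettings(path, repositories, ivyRepoPath)
         case None =>
           SparkSubmitUtils.buildIvySettings(repositories, ivyRepoPath)
       }
     }
   }
   ```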








[GitHub] [spark] xkrogen commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-24 Thread GitBox


xkrogen commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r529837613



##
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##
@@ -2980,6 +2980,75 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
 
+  /**
+   * Download the jars that an Ivy URI depends on.
+   *
+   * @param uri Ivy URI to be downloaded.
+   * @return Comma-separated list of URIs of the downloaded jars
+   */
+  def resolveMavenDependencies(uri: URI): String = {
+    val Seq(repositories, ivyRepoPath, ivySettingsPath) =
+      Seq(
+        "spark.jars.repositories",
+        "spark.jars.ivy",
+        "spark.jars.ivySettings"
+      ).map(sys.props.get(_).orNull)
+    // Create the IvySettings, either load from file or build defaults
+    val ivySettings = Option(ivySettingsPath) match {
+      case Some(path) =>
+        SparkSubmitUtils.loadIvySettings(path, Option(repositories), Option(ivyRepoPath))

Review comment:
   Let's not use default Ivy settings... In my experience with some custom
logic we have, it's very valuable to ensure that all of the Ivy resolution
obeys the settings.
   
   Maybe we can pull this out into a common utility that can be leveraged here
and in `DriverWrapper`? Then there is no need for testing twice. But I don't
think saying that the logic is tested elsewhere and then copy-pasted is
sufficient -- if there is drift in the two code paths, we will lose the
validation.

##
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##
@@ -1348,6 +1348,7 @@ private[spark] object SparkSubmitUtils {
       coordinates: String,
       ivySettings: IvySettings,
       exclusions: Seq[String] = Nil,
+      transitive: Boolean = true,

Review comment:
   Nit: Add Scaladoc for this new parameter

##
File path: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala
##
@@ -159,6 +161,13 @@ class SessionResourceLoader(session: SparkSession) extends FunctionResourceLoader
     }
   }
 
+  protected def resolveJars(path: String): List[String] = {
+    new Path(path).toUri.getScheme match {

Review comment:
   Shouldn't we use `URI.create(path)` a single time, then re-use the URI 
in this line and the one below? I also think I remember you mentioning 
elsewhere that `new Path(path).toUri` can lose information.
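
   A tiny sketch of that suggestion (assuming the `resolveMavenDependencies`
helper from the diff above; the surrounding method shape is illustrative):

   ```scala
   import java.net.URI

   // Parse once with URI.create (which keeps the query string intact) and
   // reuse the same URI for both the scheme check and the resolution call.
   def resolveJarsSketch(path: String): List[String] = {
     val uri = URI.create(path)
     uri.getScheme match {
       case "ivy" => Utils.resolveMavenDependencies(uri).split(",").toList
       case _     => List(path)
     }
   }
   ```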

##
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##
@@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
 
+  /**
+   * Download the jars that an Ivy URI depends on.
+   *
+   * @param uri Ivy URI to be downloaded.
+   * @return Comma-separated list of URIs of the downloaded jars
+   */
+  def resolveMavenDependencies(uri: URI): String = {
+    val Seq(repositories, ivyRepoPath, ivySettingsPath) =
+      Seq(
+        "spark.jars.repositories",
+        "spark.jars.ivy",
+        "spark.jars.ivySettings"
+      ).map(sys.props.get(_).orNull)
+    // Create the IvySettings, either load from file or build defaults
+    val ivySettings = Option(ivySettingsPath) match {
+      case Some(path) =>
+        SparkSubmitUtils.loadIvySettings(path, Option(repositories), Option(ivyRepoPath))
+
+      case None =>
+        SparkSubmitUtils.buildIvySettings(Option(repositories), Option(ivyRepoPath))
+    }
+    SparkSubmitUtils.resolveMavenCoordinates(uri.getAuthority, ivySettings,
+      parseExcludeList(uri.getQuery), parseTransitive(uri.getQuery))
+  }
+
+  private def parseURLQueryParameter(queryString: String, queryTag: String): Array[String] = {
+    if (queryString == null || queryString.isEmpty) {
+      Array.empty[String]
+    } else {
+      val mapTokens = queryString.split("&")
+      assert(mapTokens.forall(_.split("=").length == 2), "Invalid query string: " + queryString)

Review comment:
   IIRC this will accept URLs that look like `?=foo`, `?foo=`, or
`?bar=&baz=foo`. It would be good to add tests for this to confirm, and adjust
as necessary; a possible stricter variant is sketched below.
   
   Same comment for `parseExcludeList` below.
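
   One way the stricter validation could look (a sketch, not the PR's code --
it rejects empty keys and values such as `?=foo`, `?foo=`, and `?bar=&baz=foo`):

   ```scala
   // Hypothetical stricter parser: every token must be a non-empty key=value pair.
   private def parseURLQueryParameter(queryString: String, queryTag: String): Array[String] = {
     if (queryString == null || queryString.isEmpty) {
       Array.empty[String]
     } else {
       queryString.split("&").map { token =>
         val kv = token.split("=")
         require(kv.length == 2 && kv.forall(_.nonEmpty),
           s"Invalid query string: $queryString")
         kv
       }.collect { case Array(k, v) if k == queryTag => v }
     }
   }
   ```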

##
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##
@@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
 
+  /**
+   * Download the jars that an Ivy URI depends on.
+   *
+   * @param uri Ivy URI to be downloaded.
+   * @return Comma-separated list of URIs of the downloaded jars

Review comment:
   Should we be returning a `List[String]` instead of `String` (here and in 
`SparkSubmitUtils`)? It seems odd to have `SparkSubmitUtils` do a `mkString` to 
convert a list to string, then re-convert back to a list later.
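
   A sketch of the suggested shape (illustrative only; `doResolve` stands in
for the real Ivy resolution):

   ```scala
   import java.net.URI

   object ResolutionSketch {
     // Hypothetical stand-in for the actual Ivy resolution.
     private def doResolve(uri: URI): Seq[String] =
       Seq("file:/tmp/a.jar", "file:/tmp/b.jar")

     // Return the collection directly: no mkString(",") here, so callers
     // never have to re-split what was just joined.
     def resolveMavenDependencies(uri: URI): Seq[String] = doResolve(uri)
   }
   ```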






[GitHub] [spark] otterc commented on a change in pull request #30433: [SPARK-32916][SHUFFLE][test-maven][test-hadoop2.7] Ensure the number of chunks in meta file and index file are equal

2020-11-24 Thread GitBox


otterc commented on a change in pull request #30433:
URL: https://github.com/apache/spark/pull/30433#discussion_r529837473



##
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -827,13 +833,16 @@ void resetChunkTracker() {
     void updateChunkInfo(long chunkOffset, int mapIndex) throws IOException {
       long idxStartPos = -1;
       try {
-        // update the chunk tracker to meta file before index file
-        writeChunkTracker(mapIndex);
         idxStartPos = indexFile.getFilePointer();
         logger.trace("{} shuffleId {} reduceId {} updated index current {} updated {}",
           appShuffleId.appId, appShuffleId.shuffleId, reduceId, this.lastChunkOffset,
           chunkOffset);
-        indexFile.writeLong(chunkOffset);
+        indexFile.write(Longs.toByteArray(chunkOffset));
+        // Chunk bitmap should be written to the meta file after the index file because if there are
+        // any exceptions during writing the offset to the index file, meta file should not be
+        // updated. If the update to the index file is successful but the update to meta file isn't
+        // then the index file position is reset in the catch clause.
+        writeChunkTracker(mapIndex);

Review comment:
   > Besides, I'm thinking it would be good if we could dynamically change 
the merger location when such IO exception happens.
   
   Which one of these do you mean?
   1.  The executor should start pushing to a different server for the current
shuffle that is going on.
   2.  For a future shuffle, the driver should decide the merger locations
taking these IOExceptions into account?
   
   (1) is currently difficult. The driver decides the specific merger locations
before a shuffle stage, and all the executors work on the assumption that these
are all the shuffle servers they push data to. This list of shuffle servers
must be consistent across the executors, because if it's not, they will push
the shuffle data belonging to a single reducer to different shuffle servers.
This is the code I am talking about-
[code](https://github.com/linkedin/spark/blob/002cb69e8ddf49bbe114744b84c46e0fd452d852/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala#L360).
   
   (2) This could be a future improvement where we also include merge metrics
reported back to the driver when it triggers the finalize. The driver could use
these metrics when deciding which shuffle server to use for the next push.








[GitHub] [spark] imback82 commented on a change in pull request #30490: [SPARK-33543][SQL] Migrate SHOW COLUMNS command to use UnresolvedTableOrView to resolve the identifier

2020-11-24 Thread GitBox


imback82 commented on a change in pull request #30490:
URL: https://github.com/apache/spark/pull/30490#discussion_r529836248



##
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -467,25 +467,13 @@ class ResolveSessionCatalog(
         v1TableName.asTableIdentifier,
         partitionSpec)
 
-    case ShowColumnsStatement(tbl, ns) =>
-      if (ns.isDefined && ns.get.length > 1) {
-        throw new AnalysisException(
-          s"Namespace name should have only one part if specified: ${ns.get.quoted}")
-      }

Review comment:
   Note that this is not hit anymore, now that the namespace is always added to
the table name in `AstBuilder` so that v2 tables can be resolved.
   
   If there are multiple parts in the namespace name, it will fail while looking
up in the session catalog: `The namespace in session catalog must have exactly
one name part`. I added a test for this below.
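
   A test-style sketch of that failure mode (assuming a ScalaTest context; the
PR's actual test may differ):

   ```scala
   spark.sql("CREATE TABLE t(col INT) USING parquet")
   // A namespace with more than one part now fails during the session catalog
   // lookup rather than in a dedicated ShowColumnsStatement check.
   val e = intercept[AnalysisException] {
     spark.sql("SHOW COLUMNS IN t IN ns1.ns2")
   }
   assert(e.getMessage.contains(
     "The namespace in session catalog must have exactly one name part"))
   ```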








[GitHub] [spark] imback82 opened a new pull request #30490: [SPARK-33543][SQL] Migrate SHOW COLUMNS command to use UnresolvedTableOrView to resolve the identifier

2020-11-24 Thread GitBox


imback82 opened a new pull request #30490:
URL: https://github.com/apache/spark/pull/30490


   
   
   ### What changes were proposed in this pull request?
   
   This PR proposes to migrate `SHOW COLUMNS` to use `UnresolvedTableOrView` to 
resolve the table/view identifier. This allows consistent resolution rules 
(temp view first, etc.) to be applied for both v1/v2 commands. More info about 
the consistent resolution rule proposal can be found in 
[JIRA](https://issues.apache.org/jira/browse/SPARK-29900) or [proposal 
doc](https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing).
   
   Note that `SHOW COLUMNS` is not yet supported for v2 tables.
   
   ### Why are the changes needed?
   
   To use `UnresolvedTableOrView` for table/view resolution. Note that 
`ShowColumnsCommand` internally resolves to a temp view first, so there is no 
resolution behavior change with this PR.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Updated existing tests.
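
   As a quick illustration of the "temp view first" resolution order this
migration standardizes (an illustrative session, not from the PR):

   ```scala
   spark.sql("CREATE TABLE t(id INT) USING parquet")
   spark.sql("CREATE TEMPORARY VIEW t AS SELECT 'a' AS name")

   // With UnresolvedTableOrView, SHOW COLUMNS resolves the temp view first,
   // so this lists `name` rather than `id`.
   spark.sql("SHOW COLUMNS IN t").show()
   ```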






[GitHub] [spark] otterc commented on a change in pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion

2020-11-24 Thread GitBox


otterc commented on a change in pull request #30312:
URL: https://github.com/apache/spark/pull/30312#discussion_r529831878



##
File path: core/src/main/scala/org/apache/spark/shuffle/ShuffleWriter.scala
##
@@ -17,18 +17,466 @@
 
 package org.apache.spark.shuffle
 
-import java.io.IOException
+import java.io.{File, IOException}
+import java.net.ConnectException
+import java.nio.ByteBuffer
+import java.util.concurrent.ExecutorService
 
+import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, Queue}
+
+import com.google.common.base.Throwables
+
+import org.apache.spark.{ShuffleDependency, SparkConf, SparkEnv}
+import org.apache.spark.annotation.Since
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.network.buffer.{FileSegmentManagedBuffer, ManagedBuffer, NioManagedBuffer}
+import org.apache.spark.network.netty.SparkTransportConf
+import org.apache.spark.network.shuffle.{BlockFetchingListener, BlockStoreClient}
+import org.apache.spark.network.shuffle.ErrorHandler.BlockPushErrorHandler
+import org.apache.spark.network.util.TransportConf
 import org.apache.spark.scheduler.MapStatus
+import org.apache.spark.shuffle.ShuffleWriter._
+import org.apache.spark.storage.{BlockId, BlockManagerId, ShufflePushBlockId}
+import org.apache.spark.util.{ThreadUtils, Utils}
 
 /**
- * Obtained inside a map task to write out records to the shuffle system.
+ * Obtained inside a map task to write out records to the shuffle system, and optionally
+ * initiate the block push process to remote shuffle services if push based shuffle is enabled.
  */
-private[spark] abstract class ShuffleWriter[K, V] {
+private[spark] abstract class ShuffleWriter[K, V] extends Logging {
+  private[this] var maxBytesInFlight = 0L
+  private[this] var maxReqsInFlight = 0
+  private[this] var maxBlocksInFlightPerAddress = 0
+  private[this] var bytesInFlight = 0L
+  private[this] var reqsInFlight = 0
+  private[this] val numBlocksInFlightPerAddress = new HashMap[BlockManagerId, Int]()
+  private[this] val deferredPushRequests = new HashMap[BlockManagerId, Queue[PushRequest]]()
+  private[this] val pushRequests = new Queue[PushRequest]
+  private[this] val errorHandler = createErrorHandler()
+  private[this] val unreachableBlockMgrs = new HashSet[BlockManagerId]()
+
   /** Write a sequence of records to this task's output */
   @throws[IOException]
   def write(records: Iterator[Product2[K, V]]): Unit
 
   /** Close this writer, passing along whether the map completed */
   def stop(success: Boolean): Option[MapStatus]
+
+  def getPartitionLengths(): Array[Long]
+
+  // VisibleForTesting
+  private[shuffle] def createErrorHandler(): BlockPushErrorHandler = {
+    new BlockPushErrorHandler() {
+      // For a connection exception against a particular host, we will stop pushing any
+      // blocks to just that host and continue push blocks to other hosts. So, here push of
+      // all blocks will only stop when it is "Too Late". Also see updateStateAndCheckIfPushMore.
+      override def shouldRetryError(t: Throwable): Boolean = {
+        // If the block is too late, there is no need to retry it
+        !Throwables.getStackTraceAsString(t).contains(BlockPushErrorHandler.TOO_LATE_MESSAGE_SUFFIX)
+      }
+    }
+  }
+
+  /**
+   * Initiate the block push process. This will be invoked after the shuffle writer
+   * finishes writing the shuffle file if push based shuffle is enabled.
+   *
+   * @param resolver         block resolver used to locate mapper generated shuffle file
+   * @param partitionLengths array of shuffle block size so we can tell shuffle block
+   *                         boundaries within the shuffle file
+   * @param dep              shuffle dependency to get shuffle ID and the location of remote
+   *                         shuffle services to push local shuffle blocks
+   * @param partitionId      map index of the shuffle map task
+   * @param mapId            mapId of the shuffle map task
+   * @param conf             spark configuration
+   */
+  def initiateBlockPush(
+      resolver: IndexShuffleBlockResolver,
+      partitionLengths: Array[Long],
+      dep: ShuffleDependency[_, _, _],
+      partitionId: Int,
+      mapId: Long,
+      conf: SparkConf): Unit = {
+    val numPartitions = dep.partitioner.numPartitions
+    val dataFile = resolver.getDataFile(dep.shuffleId, mapId)
+    val transportConf = SparkTransportConf.fromSparkConf(conf, "shuffle")
+
+    val maxBlockSizeToPush = conf.get(PUSH_SHUFFLE_MAX_BLOCK_SIZE_TO_PUSH) * 1024
+    val maxBlockBatchSize = conf.get(PUSH_SHUFFLE_MAX_BLOCK_BATCH_SIZE) * 1024 * 1024
+    val mergerLocs = dep.getMergerLocs.map(loc => BlockManagerId("", loc.host, loc.port))
+
+    maxBytesInFlight = conf.getSizeAsMb("spark.reducer.maxSizeInFlight", "48m") * 1024 * 1024
+    maxReqsInFlight = conf.getInt("spark.reducer.maxReqsInFlight

[GitHub] [spark] otterc commented on a change in pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion

2020-11-24 Thread GitBox


otterc commented on a change in pull request #30312:
URL: https://github.com/apache/spark/pull/30312#discussion_r529831293



##
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##
@@ -1992,4 +1992,32 @@ package object config {
       .version("3.1.0")
       .doubleConf
       .createWithDefault(5)
+
+  private[spark] val SHUFFLE_NUM_PUSH_THREADS =
+    ConfigBuilder("spark.shuffle.push.numPushThreads")
+      .doc("Specify the number of threads in the block pusher pool. These threads assist " +
+        "in creating connections and pushing blocks to remote shuffle services when push based " +
+        "shuffle is enabled. By default, the threadpool size is equal to the number of cores.")
+      .version("3.1.0")
+      .intConf
+      .createOptional
+
+  private[spark] val SHUFFLE_MAX_BLOCK_SIZE_TO_PUSH =
+    ConfigBuilder("spark.shuffle.push.maxBlockSizeToPush")
+      .doc("The max size of an individual block to push to the remote shuffle services when push " +
+        "based shuffle is enabled. Blocks larger than this threshold are not pushed.")
+      .version("3.1.0")
+      .bytesConf(ByteUnit.KiB)
+      .createWithDefaultString("800k")
+
+  private[spark] val SHUFFLE_MAX_BLOCK_BATCH_SIZE_FOR_PUSH =
+    ConfigBuilder("spark.shuffle.push.maxBlockBatchSize")
+      .doc("The max size of a batch of shuffle blocks to be grouped into a single push request " +
+        "when push based shuffle is enabled.")

Review comment:
   done

##
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##
@@ -1992,4 +1992,32 @@ package object config {
       .version("3.1.0")
       .doubleConf
       .createWithDefault(5)
+
+  private[spark] val SHUFFLE_NUM_PUSH_THREADS =
+    ConfigBuilder("spark.shuffle.push.numPushThreads")
+      .doc("Specify the number of threads in the block pusher pool. These threads assist " +
+        "in creating connections and pushing blocks to remote shuffle services when push based " +
+        "shuffle is enabled. By default, the threadpool size is equal to the number of cores.")
+      .version("3.1.0")
+      .intConf
+      .createOptional
+
+  private[spark] val SHUFFLE_MAX_BLOCK_SIZE_TO_PUSH =
+    ConfigBuilder("spark.shuffle.push.maxBlockSizeToPush")
+      .doc("The max size of an individual block to push to the remote shuffle services when push " +
+        "based shuffle is enabled. Blocks larger than this threshold are not pushed.")

Review comment:
   done








[GitHub] [spark] otterc commented on a change in pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion

2020-11-24 Thread GitBox


otterc commented on a change in pull request #30312:
URL: https://github.com/apache/spark/pull/30312#discussion_r529830847



##
File path: core/src/main/scala/org/apache/spark/shuffle/PushShuffleWriterComponent.scala
##
@@ -0,0 +1,464 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle
+
+import java.io.File
+import java.net.ConnectException
+import java.nio.ByteBuffer
+import java.util.concurrent.ExecutorService
+
+import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, Queue}
+
+import com.google.common.base.Throwables
+
+import org.apache.spark.{ShuffleDependency, SparkConf, SparkEnv}
+import org.apache.spark.annotation.Since
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.network.buffer.{FileSegmentManagedBuffer, ManagedBuffer, NioManagedBuffer}
+import org.apache.spark.network.netty.SparkTransportConf
+import org.apache.spark.network.shuffle.BlockFetchingListener
+import org.apache.spark.network.shuffle.ErrorHandler.BlockPushErrorHandler
+import org.apache.spark.network.util.TransportConf
+import org.apache.spark.shuffle.PushShuffleWriterComponent._
+import org.apache.spark.storage.{BlockId, BlockManagerId, ShufflePushBlockId}
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * Used for pushing shuffle blocks to remote shuffle services when push shuffle is enabled.
+ * When push shuffle is enabled, it is created after the shuffle writer finishes writing the
+ * shuffle file and initiates the block push process.
+ *
+ * @param dataFile         mapper generated shuffle data file
+ * @param partitionLengths array of shuffle block size so we can tell shuffle block
+ *                         boundaries within the shuffle file
+ * @param dep              shuffle dependency to get shuffle ID and the location of remote
+ *                         shuffle services to push local shuffle blocks
+ * @param partitionId      map index of the shuffle map task
+ * @param conf             spark configuration
+ */
+private[spark] class PushShuffleWriterComponent(

Review comment:
   I have renamed it to `ShuffleBlockPusher`.








[GitHub] [spark] otterc commented on a change in pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion

2020-11-24 Thread GitBox


otterc commented on a change in pull request #30312:
URL: https://github.com/apache/spark/pull/30312#discussion_r529830296



##
File path: core/src/main/scala/org/apache/spark/shuffle/PushShuffleWriterComponent.scala
##
@@ -0,0 +1,464 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle
+
+import java.io.File
+import java.net.ConnectException
+import java.nio.ByteBuffer
+import java.util.concurrent.ExecutorService
+
+import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, Queue}
+
+import com.google.common.base.Throwables
+
+import org.apache.spark.{ShuffleDependency, SparkConf, SparkEnv}
+import org.apache.spark.annotation.Since
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.network.buffer.{FileSegmentManagedBuffer, ManagedBuffer, NioManagedBuffer}
+import org.apache.spark.network.netty.SparkTransportConf
+import org.apache.spark.network.shuffle.BlockFetchingListener
+import org.apache.spark.network.shuffle.ErrorHandler.BlockPushErrorHandler
+import org.apache.spark.network.util.TransportConf
+import org.apache.spark.shuffle.PushShuffleWriterComponent._
+import org.apache.spark.storage.{BlockId, BlockManagerId, ShufflePushBlockId}
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * Used for pushing shuffle blocks to remote shuffle services when push shuffle is enabled.
+ * When push shuffle is enabled, it is created after the shuffle writer finishes writing the
+ * shuffle file and initiates the block push process.
+ *
+ * @param dataFile         mapper generated shuffle data file
+ * @param partitionLengths array of shuffle block size so we can tell shuffle block
+ *                         boundaries within the shuffle file
+ * @param dep              shuffle dependency to get shuffle ID and the location of remote
+ *                         shuffle services to push local shuffle blocks
+ * @param partitionId      map index of the shuffle map task
+ * @param conf             spark configuration
+ */
+private[spark] class PushShuffleWriterComponent(
+    dataFile: File,
+    partitionLengths: Array[Long],
+    dep: ShuffleDependency[_, _, _],
+    partitionId: Int,
+    conf: SparkConf) extends Logging {
+  private[this] var maxBytesInFlight = 0L
+  private[this] var maxReqsInFlight = 0
+  private[this] var maxBlocksInFlightPerAddress = 0
+  private[this] var bytesInFlight = 0L
+  private[this] var reqsInFlight = 0
+  private[this] val numBlocksInFlightPerAddress = new HashMap[BlockManagerId, Int]()
+  private[this] val deferredPushRequests = new HashMap[BlockManagerId, Queue[PushRequest]]()
+  private[this] val pushRequests = new Queue[PushRequest]
+  private[this] val errorHandler = createErrorHandler()
+  private[this] val unreachableBlockMgrs = new HashSet[BlockManagerId]()
+
+  // VisibleForTesting
+  private[shuffle] def createErrorHandler(): BlockPushErrorHandler = {
+    new BlockPushErrorHandler() {
+      // For a connection exception against a particular host, we will stop pushing any
+      // blocks to just that host and continue push blocks to other hosts. So, here push of
+      // all blocks will only stop when it is "Too Late". Also see updateStateAndCheckIfPushMore.
+      override def shouldRetryError(t: Throwable): Boolean = {
+        // If the block is too late, there is no need to retry it
+        !Throwables.getStackTraceAsString(t).contains(BlockPushErrorHandler.TOO_LATE_MESSAGE_SUFFIX)
+      }
+    }
+  }
+
+  private[shuffle] def initiateBlockPush(): Unit = {
+    val numPartitions = dep.partitioner.numPartitions
+    val transportConf = SparkTransportConf.fromSparkConf(conf, "shuffle")
+
+    val maxBlockSizeToPush = conf.get(SHUFFLE_MAX_BLOCK_SIZE_TO_PUSH) * 1024
+    val maxBlockBatchSize = conf.get(SHUFFLE_MAX_BLOCK_BATCH_SIZE_FOR_PUSH) * 1024 * 1024
+    val mergerLocs = dep.getMergerLocs.map(loc =>
+      BlockManagerId("", loc.host, loc.port))
+
+    maxBytesInFlight = conf.getSizeAsMb("spark.reducer.maxSizeInFlight", "48m") * 1024 * 1024
+    maxReqsInFlight = conf.getInt("spark.reducer.maxReqsInFlight", Int.MaxValue)
+ 

[GitHub] [spark] SparkQA removed a comment on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-733113801


   **[Test build #131682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131682/testReport)** for PR 29000 at commit [`85aa12a`](https://github.com/apache/spark/commit/85aa12a618ceadfe510a4f9fc3718a746a1bc357).






[GitHub] [spark] SparkQA removed a comment on pull request #30478: [SPARK-33525][SQL] Update hive-service-rpc to 3.1.2

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30478:
URL: https://github.com/apache/spark/pull/30478#issuecomment-733112768


   **[Test build #131679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131679/testReport)** for PR 30478 at commit [`43d90ca`](https://github.com/apache/spark/commit/43d90cafaf0aa4c8c4a355070d5c71008f6f3ea9).






[GitHub] [spark] otterc commented on a change in pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion

2020-11-24 Thread GitBox


otterc commented on a change in pull request #30312:
URL: https://github.com/apache/spark/pull/30312#discussion_r529830075



##
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##
@@ -1992,4 +1992,32 @@ package object config {
       .version("3.1.0")
       .doubleConf
       .createWithDefault(5)
+
+  private[spark] val SHUFFLE_NUM_PUSH_THREADS =
+    ConfigBuilder("spark.shuffle.push.numPushThreads")
+      .doc("Specify the number of threads in the block pusher pool. These threads assist " +
+        "in creating connections and pushing blocks to remote shuffle services when push based " +
+        "shuffle is enabled. By default, the threadpool size is equal to the number of cores.")
+      .version("3.1.0")
+      .intConf
+      .createOptional
+
+  private[spark] val SHUFFLE_MAX_BLOCK_SIZE_TO_PUSH =
+    ConfigBuilder("spark.shuffle.push.maxBlockSizeToPush")
+      .doc("The max size of an individual block to push to the remote shuffle services when push " +
+        "based shuffle is enabled. Blocks larger than this threshold are not pushed.")
+      .version("3.1.0")
+      .bytesConf(ByteUnit.KiB)
+      .createWithDefaultString("800k")

Review comment:
   @Victsm could you please address this one?








[GitHub] [spark] SparkQA commented on pull request #30478: [SPARK-33525][SQL] Update hive-service-rpc to 3.1.2

2020-11-24 Thread GitBox


SparkQA commented on pull request #30478:
URL: https://github.com/apache/spark/pull/30478#issuecomment-733189691


   **[Test build #131679 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131679/testReport)** for PR 30478 at commit [`43d90ca`](https://github.com/apache/spark/commit/43d90cafaf0aa4c8c4a355070d5c71008f6f3ea9).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] otterc commented on a change in pull request #30312: [SPARK-32917][SHUFFLE][CORE] Adds support for executors to push shuffle blocks after successful map task completion

2020-11-24 Thread GitBox


otterc commented on a change in pull request #30312:
URL: https://github.com/apache/spark/pull/30312#discussion_r529829788



##
File path: core/src/main/scala/org/apache/spark/shuffle/PushShuffleWriterComponent.scala
##
@@ -0,0 +1,464 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle
+
+import java.io.File
+import java.net.ConnectException
+import java.nio.ByteBuffer
+import java.util.concurrent.ExecutorService
+
+import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, Queue}
+
+import com.google.common.base.Throwables
+
+import org.apache.spark.{ShuffleDependency, SparkConf, SparkEnv}
+import org.apache.spark.annotation.Since
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.network.buffer.{FileSegmentManagedBuffer, ManagedBuffer, NioManagedBuffer}
+import org.apache.spark.network.netty.SparkTransportConf
+import org.apache.spark.network.shuffle.BlockFetchingListener
+import org.apache.spark.network.shuffle.ErrorHandler.BlockPushErrorHandler
+import org.apache.spark.network.util.TransportConf
+import org.apache.spark.shuffle.PushShuffleWriterComponent._
+import org.apache.spark.storage.{BlockId, BlockManagerId, ShufflePushBlockId}
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * Used for pushing shuffle blocks to remote shuffle services when push shuffle is enabled.
+ * When push shuffle is enabled, it is created after the shuffle writer finishes writing the
+ * shuffle file and initiates the block push process.
+ *
+ * @param dataFile         mapper generated shuffle data file
+ * @param partitionLengths array of shuffle block size so we can tell shuffle block
+ *                         boundaries within the shuffle file
+ * @param dep              shuffle dependency to get shuffle ID and the location of remote
+ *                         shuffle services to push local shuffle blocks
+ * @param partitionId      map index of the shuffle map task
+ * @param conf             spark configuration
+ */
+private[spark] class PushShuffleWriterComponent(
+    dataFile: File,
+    partitionLengths: Array[Long],
+    dep: ShuffleDependency[_, _, _],
+    partitionId: Int,
+    conf: SparkConf) extends Logging {
+  private[this] var maxBytesInFlight = 0L
+  private[this] var maxReqsInFlight = 0
+  private[this] var maxBlocksInFlightPerAddress = 0
+  private[this] var bytesInFlight = 0L
+  private[this] var reqsInFlight = 0
+  private[this] val numBlocksInFlightPerAddress = new HashMap[BlockManagerId, Int]()
+  private[this] val deferredPushRequests = new HashMap[BlockManagerId, Queue[PushRequest]]()
+  private[this] val pushRequests = new Queue[PushRequest]
+  private[this] val errorHandler = createErrorHandler()
+  private[this] val unreachableBlockMgrs = new HashSet[BlockManagerId]()
+
+  // VisibleForTesting
+  private[shuffle] def createErrorHandler(): BlockPushErrorHandler = {
+    new BlockPushErrorHandler() {
+      // For a connection exception against a particular host, we will stop pushing any
+      // blocks to just that host and continue push blocks to other hosts. So, here push of
+      // all blocks will only stop when it is "Too Late". Also see updateStateAndCheckIfPushMore.
+      override def shouldRetryError(t: Throwable): Boolean = {
+        // If the block is too late, there is no need to retry it
+        !Throwables.getStackTraceAsString(t).contains(BlockPushErrorHandler.TOO_LATE_MESSAGE_SUFFIX)
+      }
+    }
+  }
+
+  private[shuffle] def initiateBlockPush(): Unit = {
+    val numPartitions = dep.partitioner.numPartitions
+    val transportConf = SparkTransportConf.fromSparkConf(conf, "shuffle")
+
+    val maxBlockSizeToPush = conf.get(SHUFFLE_MAX_BLOCK_SIZE_TO_PUSH) * 1024
+    val maxBlockBatchSize = conf.get(SHUFFLE_MAX_BLOCK_BATCH_SIZE_FOR_PUSH) * 1024 * 1024
+    val mergerLocs = dep.getMergerLocs.map(loc =>
+      BlockManagerId("", loc.host, loc.port))

Review comment:
   done






[GitHub] [spark] SparkQA commented on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-11-24 Thread GitBox


SparkQA commented on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-733189434


   **[Test build #131682 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131682/testReport)** for PR 29000 at commit [`85aa12a`](https://github.com/apache/spark/commit/85aa12a618ceadfe510a4f9fc3718a746a1bc357).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] zero323 commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

2020-11-24 Thread GitBox


zero323 commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529829185



##
File path: python/pyspark/mllib/util.py
##
@@ -273,24 +317,30 @@ def convertVectorColumnsFromML(dataset, *cols):
 return callMLlibFunc("convertVectorColumnsFromML", dataset, list(cols))
 
 @staticmethod
-@since("2.0.0")
 def convertMatrixColumnsToML(dataset, *cols):
 """
 Converts matrix columns in an input DataFrame from the
 :py:class:`pyspark.mllib.linalg.Matrix` type to the new
 :py:class:`pyspark.ml.linalg.Matrix` type under the `spark.ml`
 package.
 
-:param dataset:
-  input dataset
-:param cols:
-  a list of matrix columns to be converted.
-  New matrix columns will be ignored. If unspecified, all old
-  matrix columns will be converted excepted nested ones.
-:return:
-  the input dataset with old matrix columns converted to the
-  new matrix type
+.. versionadded:: 2.0.0
 
+dataset : :py:class:`pyspark.sql.DataFrame`
+    input dataset
+cols : str

Review comment:
   That's a good question. Looking at
[`numpy.ndarray.item`](https://numpy.org/devdocs/reference/generated/numpy.ndarray.item.html),
we should make the varargs explicit:
   
   *cols : ...
   
   As for the type, I was thinking `str`, since we support a variable number of
`str` values; but looking at the code, a single `List[str]` would do as well.








[GitHub] [spark] rf972 commented on a change in pull request #29695: [SPARK-22390][SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down

2020-11-24 Thread GitBox


rf972 commented on a change in pull request #29695:
URL: https://github.com/apache/spark/pull/29695#discussion_r529825209



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##
@@ -73,33 +77,25 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] {
           postScanFilters)
         Aggregate(groupingExpressions, resultExpressions, plan)
       } else {
-        val resultAttributes = resultExpressions.map(_.toAttribute)
-          .map ( e => e match { case a: AttributeReference => a })
-        var index = 0
         val aggOutputBuilder = ArrayBuilder.make[AttributeReference]
-        for (a <- resultAttributes) {
-          val newName = if (a.name.contains("FILTER")) {
-            a.name.substring(0, a.name.indexOf("FILTER") - 1)
-          } else if (a.name.contains("DISTINCT")) {
-            a.name.replace("DISTINCT ", "")
-          } else {
-            a.name
-          }
-
-          aggOutputBuilder +=
-            a.copy(name = newName,
-              dataType = aggregates(index).dataType)(exprId = NamedExpression.newExprId,
-              qualifier = a.qualifier)
-          index += 1
+        for (a <- aggregates) {
+          aggOutputBuilder += AttributeReference(toPrettySQL(a), a.dataType)()
         }
         val aggOutput = aggOutputBuilder.result
 
-        var newOutput = aggOutput
-        for (col <- output) {
-          if (!aggOutput.exists(_.name.contains(col.name))) {
-            newOutput = col +: newOutput
+        val newOutputBuilder = ArrayBuilder.make[AttributeReference]
+        for (col1 <- output) {
+          var found = false
+          for (col2 <- aggOutput) {
+            if (contains(col2.name, col1.name)) {

Review comment:
   Thanks @huaxingao  for incorporating our suggestions into the patch 
regarding aggregates containing expressions!  
   With your changes, TPCH Q6 test shows a substantial reduction in data 
transfer.  
   With just filter and projection pushdown, Q6 transfer size is about 1.5 MB.  
But with aggregate pushdown this now gets reduced to 17 bytes.
   
   In our testing we found a case that throws an error: grouping by more than
one column. Here is an example that fails for us:
   
   ```
   val df = sparkSession.sql("select BONUS, SUM(SALARY+BONUS), SALARY FROM h2.test.employee" +
     " GROUP BY SALARY, BONUS")
   df.show()
   ```
   
   If you have any thoughts or suggestions on a solution for this, we would
appreciate it. Thanks!








[GitHub] [spark] viirya commented on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables

2020-11-24 Thread GitBox


viirya commented on pull request #30403:
URL: https://github.com/apache/spark/pull/30403#issuecomment-733185335


   retest this please






[GitHub] [spark] zero323 commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

2020-11-24 Thread GitBox


zero323 commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529824599



##
File path: python/pyspark/mllib/stat/_statistics.py
##
@@ -65,11 +65,19 @@ def colStats(rdd):
 """
 Computes column-wise summary statistics for the input RDD[Vector].
 
-:param rdd: an RDD[Vector] for which column-wise summary statistics
-    are to be computed.
-:return: :class:`MultivariateStatisticalSummary` object containing
-    column-wise summary statistics.
-
+Parameters
+----------
+rdd : :py:class:`pyspark.RDD`
+    an RDD[Vector] for which column-wise summary statistics

Review comment:
   Conveniently, that's syntax we use both for Scala and Python, with 
corresponding type hints looking like this:
   
   
https://github.com/apache/spark/blob/048a9821c788b6796d52d1e2a0cd174377ebd0f0/python/pyspark/mllib/stat/_statistics.pyi#L44








[GitHub] [spark] MaxGekk commented on pull request #30482: [SPARK-33529][SQL] Handle '__HIVE_DEFAULT_PARTITION__' while resolving V2 partition specs

2020-11-24 Thread GitBox


MaxGekk commented on pull request #30482:
URL: https://github.com/apache/spark/pull/30482#issuecomment-733184440


   jenkins, retest this, please






[GitHub] [spark] otterc commented on a change in pull request #30433: [SPARK-32916][SHUFFLE][test-maven][test-hadoop2.7] Ensure the number of chunks in meta file and index file are equal

2020-11-24 Thread GitBox


otterc commented on a change in pull request #30433:
URL: https://github.com/apache/spark/pull/30433#discussion_r529823126



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -827,13 +833,16 @@ void resetChunkTracker() {
     void updateChunkInfo(long chunkOffset, int mapIndex) throws IOException {
       long idxStartPos = -1;
      try {
-        // update the chunk tracker to meta file before index file
-        writeChunkTracker(mapIndex);
        idxStartPos = indexFile.getFilePointer();
        logger.trace("{} shuffleId {} reduceId {} updated index current {} updated {}",
          appShuffleId.appId, appShuffleId.shuffleId, reduceId, this.lastChunkOffset,
          chunkOffset);
-        indexFile.writeLong(chunkOffset);
+        indexFile.write(Longs.toByteArray(chunkOffset));
+        // Chunk bitmap should be written to the meta file after the index file because if there are
+        // any exceptions during writing the offset to the index file, meta file should not be
+        // updated. If the update to the index file is successful but the update to meta file isn't
+        // then the index file position is reset in the catch clause.
+        writeChunkTracker(mapIndex);

Review comment:
   Had a discussion with @Victsm @mridulm yesterday. This is the approach we
are currently thinking of:
   1. Verify whether `seek` is recoverable. If it is not, let the clients know
to stop pushing, and stop merging this partition.
   2. Have a threshold on the number of `IOExceptions` from writes; when this
threshold is reached for a single partition, inform the client to stop pushing,
and stop merging this partition (see the sketch below).
   3. When the update to metadata fails, do not propagate the exception back to
the client, so that it does not push the block again. The size of the current
chunk may grow, but with (2) in place it will still be of a manageable size.
   
   These changes will not be that complex.
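
   For (2), a minimal sketch of the per-partition accounting (names and wiring
are hypothetical, not from the PR):

   ```scala
   // Hypothetical per-partition IOException counter for point (2) above.
   class PartitionWriteFailureTracker(maxIOExceptions: Int) {
     private var ioExceptions = 0

     /** Record a failed write; true once this partition should stop merging. */
     def recordFailure(): Boolean = {
       ioExceptions += 1
       ioExceptions >= maxIOExceptions
     }
   }

   // Usage sketch: on an IOException while updating the index/meta files,
   // crossing the threshold means telling the client to stop pushing this
   // partition, e.g.:
   // if (tracker.recordFailure()) respondStopMerging(reduceId)
   ```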
   








[GitHub] [spark] SparkQA removed a comment on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30403:
URL: https://github.com/apache/spark/pull/30403#issuecomment-733112980


   **[Test build #131681 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131681/testReport)** for PR 30403 at commit [`f22159c`](https://github.com/apache/spark/commit/f22159c1e5c4ea6f6d2838da77d43f6be18a00a5).






[GitHub] [spark] SparkQA commented on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables

2020-11-24 Thread GitBox


SparkQA commented on pull request #30403:
URL: https://github.com/apache/spark/pull/30403#issuecomment-733183154


   **[Test build #131681 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131681/testReport)** for PR 30403 at commit [`f22159c`](https://github.com/apache/spark/commit/f22159c1e5c4ea6f6d2838da77d43f6be18a00a5).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class TruncateTable(`






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30421:
URL: https://github.com/apache/spark/pull/30421#issuecomment-733180683







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30421:
URL: https://github.com/apache/spark/pull/30421#issuecomment-733180683







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30482: [SPARK-33529][SQL] Handle '__HIVE_DEFAULT_PARTITION__' while resolving V2 partition specs

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30482:
URL: https://github.com/apache/spark/pull/30482#issuecomment-733179070







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30482: [SPARK-33529][SQL] Handle '__HIVE_DEFAULT_PARTITION__' while resolving V2 partition specs

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30482:
URL: https://github.com/apache/spark/pull/30482#issuecomment-733179070







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30480: [SPARK-32921][SHUFFLE][test-maven][test-hadoop2.7] MapOutputTracker extensions to support push-based shuffle

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-733173446


   **[Test build #131691 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131691/testReport)**
 for PR 30480 at commit 
[`3723389`](https://github.com/apache/spark/commit/3723389c6d9a3bd1a101cb97bf5e98406dbf6fd6).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30480: [SPARK-32921][SHUFFLE][test-maven][test-hadoop2.7] MapOutputTracker extensions to support push-based shuffle

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-733175455







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jkleckner commented on pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-24 Thread GitBox


jkleckner commented on pull request #30283:
URL: https://github.com/apache/spark/pull/30283#issuecomment-733175795


   Thank you, @dongjoon-hyun!
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30480: [SPARK-32921][SHUFFLE][test-maven][test-hadoop2.7] MapOutputTracker extensions to support push-based shuffle

2020-11-24 Thread GitBox


SparkQA commented on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-733175437


   **[Test build #131691 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131691/testReport)**
 for PR 30480 at commit 
[`3723389`](https://github.com/apache/spark/commit/3723389c6d9a3bd1a101cb97bf5e98406dbf6fd6).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30480: [SPARK-32921][SHUFFLE][test-maven][test-hadoop2.7] MapOutputTracker extensions to support push-based shuffle

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-733175455







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30421:
URL: https://github.com/apache/spark/pull/30421#issuecomment-733174299







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30421: [SPARK-33474][SQL] Support TypeConstructed partition spec value

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30421:
URL: https://github.com/apache/spark/pull/30421#issuecomment-733174299







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

2020-11-24 Thread GitBox


SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733173573


   **[Test build #131692 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131692/testReport)**
 for PR 30427 at commit 
[`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30475: [SPARK-33522][SQL] Improve exception messages while handling UnresolvedTableOrView

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30475:
URL: https://github.com/apache/spark/pull/30475#issuecomment-733173015







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30468: [SPARK-33518][ML][WIP] Improve performance of ML ALS recommendForAll by GEMV

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30468:
URL: https://github.com/apache/spark/pull/30468#issuecomment-733173014







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30480: [SPARK-32921][SHUFFLE][test-maven][test-hadoop2.7] MapOutputTracker extensions to support push-based shuffle

2020-11-24 Thread GitBox


SparkQA commented on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-733173446


   **[Test build #131691 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131691/testReport)**
 for PR 30480 at commit 
[`3723389`](https://github.com/apache/spark/commit/3723389c6d9a3bd1a101cb97bf5e98406dbf6fd6).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30470: [SPARK-33495][BUILD] Remove commons-logging.jar's dependency

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30470:
URL: https://github.com/apache/spark/pull/30470#issuecomment-733173016







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733173017







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733173017







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30475: [SPARK-33522][SQL] Improve exception messages while handling UnresolvedTableOrView

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30475:
URL: https://github.com/apache/spark/pull/30475#issuecomment-733173015







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30468: [SPARK-33518][ML][WIP] Improve performance of ML ALS recommendForAll by GEMV

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30468:
URL: https://github.com/apache/spark/pull/30468#issuecomment-733173014







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30470: [SPARK-33495][BUILD] Remove commons-logging.jar's dependency

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30470:
URL: https://github.com/apache/spark/pull/30470#issuecomment-733173016







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30468: [SPARK-33518][ML][WIP] Improve performance of ML ALS recommendForAll by GEMV

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30468:
URL: https://github.com/apache/spark/pull/30468#issuecomment-733115921


   **[Test build #131684 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131684/testReport)**
 for PR 30468 at commit 
[`8ca7d56`](https://github.com/apache/spark/commit/8ca7d562c20812062e11e8f6961034157cc08ea8).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30468: [SPARK-33518][ML][WIP] Improve performance of ML ALS recommendForAll by GEMV

2020-11-24 Thread GitBox


SparkQA commented on pull request #30468:
URL: https://github.com/apache/spark/pull/30468#issuecomment-733163795


   **[Test build #131684 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131684/testReport)**
 for PR 30468 at commit 
[`8ca7d56`](https://github.com/apache/spark/commit/8ca7d562c20812062e11e8f6961034157cc08ea8).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

2020-11-24 Thread GitBox


dongjoon-hyun commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733156769







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30475: [SPARK-33522][SQL] Improve exception messages while handling UnresolvedTableOrView

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30475:
URL: https://github.com/apache/spark/pull/30475#issuecomment-733123537


   **[Test build #131686 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131686/testReport)**
 for PR 30475 at commit 
[`68ee277`](https://github.com/apache/spark/commit/68ee277cbde9ecb466b1480af676a2f831e11236).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30475: [SPARK-33522][SQL] Improve exception messages while handling UnresolvedTableOrView

2020-11-24 Thread GitBox


SparkQA commented on pull request #30475:
URL: https://github.com/apache/spark/pull/30475#issuecomment-733156509


   **[Test build #131686 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131686/testReport)**
 for PR 30475 at commit 
[`68ee277`](https://github.com/apache/spark/commit/68ee277cbde9ecb466b1480af676a2f831e11236).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class TruncateTable(`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #28647:
URL: https://github.com/apache/spark/pull/28647#issuecomment-733150165







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30479: [WIP][SPARK-33527][SQL] Extend the function of decode so as consistent with mainstream databases

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30479:
URL: https://github.com/apache/spark/pull/30479#issuecomment-733149740







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30487: [SPARK-33535][INFRA][TESTS] Export LANG to en_US.UTF-8 in run-tests-jenkins script

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30487:
URL: https://github.com/apache/spark/pull/30487#issuecomment-733149187







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-24 Thread GitBox


dongjoon-hyun closed pull request #30283:
URL: https://github.com/apache/spark/pull/30283


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-24 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r529783873



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala
##
@@ -42,13 +44,20 @@ private[k8s] trait LoggingPodStatusWatcher extends Watcher[Pod] {
   */
 private[k8s] class LoggingPodStatusWatcherImpl(
     appId: String,
-    maybeLoggingInterval: Option[Long])
+    maybeLoggingInterval: Option[Long],
+    waitForCompletion: Boolean)

Review comment:
   ~Please revert this change. This is inconsistent with Apache Spark 3.1 
and 3.0.~

##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
##
@@ -230,7 +242,9 @@ private[spark] class KubernetesClientApplication extends SparkApplication {
     val master = KubernetesUtils.parseMasterUrl(sparkConf.get("spark.master"))
     val loggingInterval = if (waitForAppCompletion) Some(sparkConf.get(REPORT_INTERVAL)) else None
 
-    val watcher = new LoggingPodStatusWatcherImpl(kubernetesAppId, loggingInterval)
+    val watcher = new LoggingPodStatusWatcherImpl(kubernetesAppId,
+      loggingInterval,
+      waitForAppCompletion)

Review comment:
   ~Please revert this change. This is inconsistent with Apache Spark 3.1 
and 3.0.~





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30283: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-24 Thread GitBox


dongjoon-hyun commented on a change in pull request #30283:
URL: https://github.com/apache/spark/pull/30283#discussion_r529783812



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
##
@@ -230,7 +242,9 @@ private[spark] class KubernetesClientApplication extends SparkApplication {
     val master = KubernetesUtils.parseMasterUrl(sparkConf.get("spark.master"))
     val loggingInterval = if (waitForAppCompletion) Some(sparkConf.get(REPORT_INTERVAL)) else None
 
-    val watcher = new LoggingPodStatusWatcherImpl(kubernetesAppId, loggingInterval)
+    val watcher = new LoggingPodStatusWatcherImpl(kubernetesAppId,
+      loggingInterval,
+      waitForAppCompletion)

Review comment:
   Please revert this change. This is inconsistent with Apache Spark 3.1 
and 3.0.

##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala
##
@@ -42,13 +44,20 @@ private[k8s] trait LoggingPodStatusWatcher extends Watcher[Pod] {
   */
 private[k8s] class LoggingPodStatusWatcherImpl(
     appId: String,
-    maybeLoggingInterval: Option[Long])
+    maybeLoggingInterval: Option[Long],
+    waitForCompletion: Boolean)

Review comment:
   Please revert this change. This is inconsistent with Apache Spark 3.1 
and 3.0.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #28647:
URL: https://github.com/apache/spark/pull/28647#issuecomment-733150165







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30479: [WIP][SPARK-33527][SQL] Extend the function of decode so as consistent with mainstream databases

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30479:
URL: https://github.com/apache/spark/pull/30479#issuecomment-733149740







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30487: [SPARK-33535][INFRA][TESTS] Export LANG to en_US.UTF-8 in run-tests-jenkins script

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30487:
URL: https://github.com/apache/spark/pull/30487#issuecomment-733149187







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30336: [SPARK-33287][SS][UI]Expose state custom metrics information on SS UI

2020-11-24 Thread GitBox


SparkQA commented on pull request #30336:
URL: https://github.com/apache/spark/pull/30336#issuecomment-733148850


   **[Test build #131690 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131690/testReport)**
 for PR 30336 at commit 
[`e2f1594`](https://github.com/apache/spark/commit/e2f15947ef6e97fbcee5f9f328e6e8315a2eea66).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

2020-11-24 Thread GitBox


SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733148701


   **[Test build #131689 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131689/testReport)**
 for PR 30408 at commit 
[`752eb8d`](https://github.com/apache/spark/commit/752eb8dab286aa78925851bb2ebd0b0ce816def6).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30480: [SPARK-32921][SHUFFLE][test-maven][test-hadoop2.7] MapOutputTracker extensions to support push-based shuffle

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-733146296


   **[Test build #131687 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131687/testReport)**
 for PR 30480 at commit 
[`b9c43c5`](https://github.com/apache/spark/commit/b9c43c5550724970ebc6446cbcbb96d4ab9772ff).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30480: [SPARK-32921][SHUFFLE][test-maven][test-hadoop2.7] MapOutputTracker extensions to support push-based shuffle

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-733147893







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30487: [SPARK-33535][INFRA][TESTS] Export LANG to en_US.UTF-8 in run-tests-jenkins script

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30487:
URL: https://github.com/apache/spark/pull/30487#issuecomment-733031613


   **[Test build #131669 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131669/testReport)**
 for PR 30487 at commit 
[`5c90411`](https://github.com/apache/spark/commit/5c9041103fff89089ca136e97d8181c359e76b7a).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30480: [SPARK-32921][SHUFFLE][test-maven][test-hadoop2.7] MapOutputTracker extensions to support push-based shuffle

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-733147893







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30480: [SPARK-32921][SHUFFLE][test-maven][test-hadoop2.7] MapOutputTracker extensions to support push-based shuffle

2020-11-24 Thread GitBox


SparkQA commented on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-733147874


   **[Test build #131687 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131687/testReport)**
 for PR 30480 at commit 
[`b9c43c5`](https://github.com/apache/spark/commit/b9c43c5550724970ebc6446cbcbb96d4ab9772ff).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30487: [SPARK-33535][INFRA][TESTS] Export LANG to en_US.UTF-8 in run-tests-jenkins script

2020-11-24 Thread GitBox


SparkQA commented on pull request #30487:
URL: https://github.com/apache/spark/pull/30487#issuecomment-733147816


   **[Test build #131669 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131669/testReport)**
 for PR 30487 at commit 
[`5c90411`](https://github.com/apache/spark/commit/5c9041103fff89089ca136e97d8181c359e76b7a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30440: [SPARK-33496][SQL]Improve error message of ANSI explicit cast

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30440:
URL: https://github.com/apache/spark/pull/30440#issuecomment-733146882







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30483: [WIP][SPARK-33449][SQL] Add File Metadata cache support for Parquet and Orc

2020-11-24 Thread GitBox


dongjoon-hyun commented on a change in pull request #30483:
URL: https://github.com/apache/spark/pull/30483#discussion_r529778176



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -765,6 +765,11 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val PARQUET_META_CACHE_ENABLED = buildConf("spark.sql.parquet.metadataCache.enabled")

Review comment:
   Please add `.doc("...")`.
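   
   A hedged sketch of what that could look like; the doc wording below is illustrative, not the text that actually landed:
   
   ```scala
   val PARQUET_META_CACHE_ENABLED =
     buildConf("spark.sql.parquet.metadataCache.enabled")
       .doc("When true, cache Parquet file footers so that repeated scans of " +
         "the same files can skip re-reading the footer from storage.")
       .booleanConf
       .createWithDefault(false)
   ```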





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30440: [SPARK-33496][SQL]Improve error message of ANSI explicit cast

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30440:
URL: https://github.com/apache/spark/pull/30440#issuecomment-733146882







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30440: [SPARK-33496][SQL]Improve error message of ANSI explicit cast

2020-11-24 Thread GitBox


SparkQA removed a comment on pull request #30440:
URL: https://github.com/apache/spark/pull/30440#issuecomment-732993024


   **[Test build #131664 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131664/testReport)**
 for PR 30440 at commit 
[`e762162`](https://github.com/apache/spark/commit/e762162311e04c20bb06f9a4735514547050b832).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30440: [SPARK-33496][SQL]Improve error message of ANSI explicit cast

2020-11-24 Thread GitBox


SparkQA commented on pull request #30440:
URL: https://github.com/apache/spark/pull/30440#issuecomment-733146375


   **[Test build #131664 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131664/testReport)**
 for PR 30440 at commit 
[`e762162`](https://github.com/apache/spark/commit/e762162311e04c20bb06f9a4735514547050b832).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30480: [SPARK-32921][SHUFFLE][test-maven][test-hadoop2.7] MapOutputTracker extensions to support push-based shuffle

2020-11-24 Thread GitBox


SparkQA commented on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-733146296


   **[Test build #131687 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131687/testReport)**
 for PR 30480 at commit 
[`b9c43c5`](https://github.com/apache/spark/commit/b9c43c5550724970ebc6446cbcbb96d4ab9772ff).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30480: [SPARK-32921][SHUFFLE][test-maven][test-hadoop2.7] MapOutputTracker extensions to support push-based shuffle

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-732756039


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30478: [SPARK-33525][SQL] Update hive-service-rpc to 3.1.2

2020-11-24 Thread GitBox


SparkQA commented on pull request #30478:
URL: https://github.com/apache/spark/pull/30478#issuecomment-733146309


   **[Test build #131688 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131688/testReport)**
 for PR 30478 at commit 
[`43d90ca`](https://github.com/apache/spark/commit/43d90cafaf0aa4c8c4a355070d5c71008f6f3ea9).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

2020-11-24 Thread GitBox


wangyum commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733146199


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30480: [SPARK-32921][SHUFFLE][test-maven][test-hadoop2.7] MapOutputTracker extensions to support push-based shuffle

2020-11-24 Thread GitBox


dongjoon-hyun commented on pull request #30480:
URL: https://github.com/apache/spark/pull/30480#issuecomment-733145845


   ok to test



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30480: [SPARK-32921][SHUFFLE][test-maven][test-hadoop2.7] MapOutputTracker extensions to support push-based shuffle

2020-11-24 Thread GitBox


dongjoon-hyun commented on a change in pull request #30480:
URL: https://github.com/apache/spark/pull/30480#discussion_r529776436



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -1011,4 +1333,47 @@ private[spark] object MapOutputTracker extends Logging {
 
 splitsByAddress.mapValues(_.toSeq).iterator
   }
+
+  /**
+   * Given a shuffle ID, a partition ID, an array of map statuses, and bitmap corresponding
+   * to either a merged shuffle partition or a merged shuffle partition chunk, identify
+   * the metadata about the shuffle partition blocks that are merged into the merged shuffle
+   * partition or partition chunk represented by the bitmap.
+   *
+   * @param shuffleId Identifier for the shuffle
+   * @param partitionId The partition ID of the MergeStatus for which we look for the metadata
+   *                    of the merged shuffle partition blocks
+   * @param mapStatuses List of map statuses, indexed by map ID
+   * @param tracker bitmap containing mapIndexes that belong to the merged block or merged
+   *                block chunk.
+   * @return A sequence of 2-item tuples, where the first item in the tuple is a BlockManagerId,
+   *         and the second item is a sequence of (shuffle block ID, shuffle block size) tuples
+   *         describing the shuffle blocks that are stored at that block manager.
+   */
+  def getMapStatusesForMergeStatus(
+      shuffleId: Int,
+      partitionId: Int,
+      mapStatuses: Array[MapStatus],
+      tracker: RoaringBitmap): Seq[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
+    assert (mapStatuses != null && tracker != null)
+    val splitsByAddress = new HashMap[BlockManagerId, ListBuffer[(BlockId, Long, Int)]]
+    for ((status, mapIndex) <- mapStatuses.zipWithIndex) {
+      // Only add blocks that are merged
+      if (tracker.contains(mapIndex)) {
+        MapOutputTracker.validateStatus(status, shuffleId, partitionId)
+        splitsByAddress.getOrElseUpdate(status.location, ListBuffer()) +=
+          ((ShuffleBlockId(shuffleId, status.mapId, partitionId),
+            status.getSizeForBlock(partitionId), mapIndex))
+      }
+    }
+    splitsByAddress.toSeq

Review comment:
   @Victsm . First of all, could you check this place? This seems to break 
Scala 2.13 compilation.
   ```
   [error] /home/runner/work/spark/spark/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:1369:21: type mismatch;
   [error]  found   : Seq[(org.apache.spark.storage.BlockManagerId, scala.collection.mutable.ListBuffer[(org.apache.spark.storage.BlockId, Long, Int)])]
   [error]  required: Seq[(org.apache.spark.storage.BlockManagerId, Seq[(org.apache.spark.storage.BlockId, Long, Int)])]
   [error] splitsByAddress.toSeq
   ```
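   
   For context, in Scala 2.13 the default `Seq` alias points at `scala.collection.immutable.Seq`, so the `ListBuffer` values no longer conform. One way to satisfy the signature, as a sketch rather than the fix that actually landed:
   
   ```scala
   // Convert each per-address ListBuffer to an immutable Seq before returning.
   splitsByAddress.iterator.map { case (bmId, blocks) =>
     (bmId, blocks.toSeq)
   }.toSeq
   ```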





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

2020-11-24 Thread GitBox


viirya commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529776310



##
File path: python/pyspark/mllib/util.py
##
@@ -313,24 +363,30 @@ def convertMatrixColumnsToML(dataset, *cols):
 return callMLlibFunc("convertMatrixColumnsToML", dataset, list(cols))
 
 @staticmethod
-@since("2.0.0")
 def convertMatrixColumnsFromML(dataset, *cols):
 """
 Converts matrix columns in an input DataFrame to the
 :py:class:`pyspark.mllib.linalg.Matrix` type from the new
 :py:class:`pyspark.ml.linalg.Matrix` type under the `spark.ml`
 package.
 
-:param dataset:
-  input dataset
-:param cols:
-  a list of matrix columns to be converted.
-  Old matrix columns will be ignored. If unspecified, all new
-  matrix columns will be converted except nested ones.
-:return:
-  the input dataset with new matrix columns converted to the
-  old matrix type
+.. versionadded:: 2.0.0
+
+dataset : :py:class:`pyspark.sql.DataFrame`
+input dataset
+cols : str

Review comment:
   ditto.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

2020-11-24 Thread GitBox


viirya commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529776085



##
File path: python/pyspark/mllib/util.py
##
@@ -273,24 +317,30 @@ def convertVectorColumnsFromML(dataset, *cols):
 return callMLlibFunc("convertVectorColumnsFromML", dataset, list(cols))
 
 @staticmethod
-@since("2.0.0")
 def convertMatrixColumnsToML(dataset, *cols):
 """
 Converts matrix columns in an input DataFrame from the
 :py:class:`pyspark.mllib.linalg.Matrix` type to the new
 :py:class:`pyspark.ml.linalg.Matrix` type under the `spark.ml`
 package.
 
-:param dataset:
-  input dataset
-:param cols:
-  a list of matrix columns to be converted.
-  New matrix columns will be ignored. If unspecified, all old
-  matrix columns will be converted excepted nested ones.
-:return:
-  the input dataset with old matrix columns converted to the
-  new matrix type
+.. versionadded:: 2.0.0
 
+dataset : :py:class:`pyspark.sql.DataFrame`
+input dataset
+cols : str

Review comment:
   Is it a str or list of str?
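   
   For reference, since `cols` is a varargs parameter, NumPy-style docs usually spell it as `*cols : str`. A hedged sketch assembled from the surrounding diff (the exact wording is illustrative):
   
   ```python
   def convertMatrixColumnsToML(dataset, *cols):
       """
       Converts matrix columns in an input DataFrame from the
       :py:class:`pyspark.mllib.linalg.Matrix` type to the new
       :py:class:`pyspark.ml.linalg.Matrix` type under the `spark.ml` package.
   
       .. versionadded:: 2.0.0
   
       Parameters
       ----------
       dataset : :py:class:`pyspark.sql.DataFrame`
           input dataset
       *cols : str
           matrix columns to be converted. New matrix columns will be ignored.
           If unspecified, all old matrix columns will be converted except
           nested ones.
   
       Returns
       -------
       :py:class:`pyspark.sql.DataFrame`
           the input dataset with old matrix columns converted to the new type
       """
   ```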





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30465: [SPARK-33045][SQL][FOLLOWUP] Support built-in function like_any and fix StackOverflowError issue.

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30465:
URL: https://github.com/apache/spark/pull/30465#issuecomment-733143722







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30482: [SPARK-33529][SQL] Handle '__HIVE_DEFAULT_PARTITION__' while resolving V2 partition specs

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30482:
URL: https://github.com/apache/spark/pull/30482#issuecomment-733143719







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30481: [SPARK-33526][SQL] Add config to control if cancel invoke interrupt task on thriftserver

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30481:
URL: https://github.com/apache/spark/pull/30481#issuecomment-733143723







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30467: [SPARK-32002][SQL]Support ExtractValue from nested ArrayStruct

2020-11-24 Thread GitBox


AmplabJenkins removed a comment on pull request #30467:
URL: https://github.com/apache/spark/pull/30467#issuecomment-733143724







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30465: [SPARK-33045][SQL][FOLLOWUP] Support built-in function like_any and fix StackOverflowError issue.

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30465:
URL: https://github.com/apache/spark/pull/30465#issuecomment-733143722







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30481: [SPARK-33526][SQL] Add config to control if cancel invoke interrupt task on thriftserver

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30481:
URL: https://github.com/apache/spark/pull/30481#issuecomment-733143723







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30482: [SPARK-33529][SQL] Handle '__HIVE_DEFAULT_PARTITION__' while resolving V2 partition specs

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30482:
URL: https://github.com/apache/spark/pull/30482#issuecomment-733143719







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30467: [SPARK-32002][SQL]Support ExtractValue from nested ArrayStruct

2020-11-24 Thread GitBox


AmplabJenkins commented on pull request #30467:
URL: https://github.com/apache/spark/pull/30467#issuecomment-733143724







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gaborgsomogyi commented on a change in pull request #30336: [SPARK-33287][SS][UI]Expose state custom metrics information on SS UI

2020-11-24 Thread GitBox


gaborgsomogyi commented on a change in pull request #30336:
URL: https://github.com/apache/spark/pull/30336#discussion_r529772096



##
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##
@@ -199,49 +209,100 @@ private[ui] class StreamingQueryStatisticsPage(parent: 
StreamingQueryTab)
   "records")
   graphUIDataForNumRowsDroppedByWatermark.generateDataJs(jsCollector)
 
-  // scalastyle:off
-  
-
-  
-Aggregated Number Of Total State Rows 
{SparkUIUtils.tooltip("Aggregated number of total state rows.", 
"right")}
-  
-
-{graphUIDataForNumberTotalRows.generateTimelineHtml(jsCollector)}
-{graphUIDataForNumberTotalRows.generateHistogramHtml(jsCollector)}
-  
-  
-
-  
-Aggregated Number Of Updated State Rows 
{SparkUIUtils.tooltip("Aggregated number of updated state rows.", 
"right")}
-  
-
-{graphUIDataForNumberUpdatedRows.generateTimelineHtml(jsCollector)}
-{graphUIDataForNumberUpdatedRows.generateHistogramHtml(jsCollector)}
-  
-  
-
-  
-Aggregated State Memory Used In Bytes 
{SparkUIUtils.tooltip("Aggregated state memory used in bytes.", 
"right")}
-  
-
-{graphUIDataForMemoryUsedBytes.generateTimelineHtml(jsCollector)}
-{graphUIDataForMemoryUsedBytes.generateHistogramHtml(jsCollector)}
-  
-  
-
-  
-Aggregated Number Of Rows Dropped By Watermark 
{SparkUIUtils.tooltip("Accumulates all input rows being dropped in stateful 
operators by watermark. 'Inputs' are relative to operators.", 
"right")}
-  
-
-{graphUIDataForNumRowsDroppedByWatermark.generateTimelineHtml(jsCollector)}
-{graphUIDataForNumRowsDroppedByWatermark.generateHistogramHtml(jsCollector)}
-  
-  // scalastyle:on
+  val result =
+// scalastyle:off
+
+  
+
+  Aggregated Number Of Total State Rows 
{SparkUIUtils.tooltip("Aggregated number of total state rows.", 
"right")}
+
+  
+  {graphUIDataForNumberTotalRows.generateTimelineHtml(jsCollector)}
+  {graphUIDataForNumberTotalRows.generateHistogramHtml(jsCollector)}
+
+
+  
+
+  Aggregated Number Of Updated State Rows 
{SparkUIUtils.tooltip("Aggregated number of updated state rows.", 
"right")}
+
+  
+  {graphUIDataForNumberUpdatedRows.generateTimelineHtml(jsCollector)}
+  {graphUIDataForNumberUpdatedRows.generateHistogramHtml(jsCollector)}
+
+
+  
+
+  Aggregated State Memory Used In Bytes 
{SparkUIUtils.tooltip("Aggregated state memory used in bytes.", 
"right")}
+
+  
+  {graphUIDataForMemoryUsedBytes.generateTimelineHtml(jsCollector)}
+  {graphUIDataForMemoryUsedBytes.generateHistogramHtml(jsCollector)}
+
+
+  
+
+  Aggregated Number Of Rows Dropped By Watermark 
{SparkUIUtils.tooltip("Accumulates all input rows being dropped in stateful 
operators by watermark. 'Inputs' are relative to operators.", 
"right")}
+
+  
+  {graphUIDataForNumRowsDroppedByWatermark.generateTimelineHtml(jsCollector)}
+  {graphUIDataForNumRowsDroppedByWatermark.generateHistogramHtml(jsCollector)}
+
+// scalastyle:on
+
+  result ++= generateAggregatedCustomMetrics(query, minBatchTime, 
maxBatchTime, jsCollector)
+  result
 } else {
   new NodeBuffer()
 }
   }
 
+  def generateAggregatedCustomMetrics(
+  query: StreamingQueryUIData,
+  minBatchTime: Long,
+  maxBatchTime: Long,
+  jsCollector: JsCollector): NodeBuffer = {
+val result: NodeBuffer = new NodeBuffer
+
+// This is made sure on caller side but put it here to be defensive
+require(query.lastProgress.stateOperators.nonEmpty)
+val enabledCustomMetrics = parent.parent.conf.get(ENABLED_STREAMING_UI_CUSTOM_METRIC_LIST)
+logDebug(s"Enabled custom metrics: $enabledCustomMetrics")
+query.lastProgress.stateOperators.head.customMetrics.keySet().asScala
+  .filter(enabledCustomMetrics.contains(_)).map { metricName =>

Review comment:
   OK, if that's so, I'll add it. I've checked similar list-based configs, where it's not done like that.
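
   A minimal, self-contained Scala sketch of the allow-list filtering under discussion, assuming illustrative stand-in names (CustomMetricsFilterSketch, enabledCustomMetrics, customMetrics, and the sample metric names) rather than the actual Spark UI types; the real change reads the Seq[String] config ENABLED_STREAMING_UI_CUSTOM_METRIC_LIST from the SparkConf, as the diff above shows:

       object CustomMetricsFilterSketch {
         // Stand-in for the Seq[String] value behind ENABLED_STREAMING_UI_CUSTOM_METRIC_LIST.
         val enabledCustomMetrics: Seq[String] = Seq("stateOnCurrentVersionSizeBytes")

         // Stand-in for one state operator's custom metrics (name -> value).
         val customMetrics: Map[String, Long] = Map(
           "stateOnCurrentVersionSizeBytes" -> 1024L,
           "loadedMapCacheHitCount" -> 42L)

         def main(args: Array[String]): Unit = {
           // Keep only the metrics the user opted into, mirroring
           // .filter(enabledCustomMetrics.contains(_)) in the diff above.
           val rendered = customMetrics.keySet.filter(m => enabledCustomMetrics.contains(m))
           rendered.foreach(name => println(s"would render a graph for: $name"))
         }
       }

   With the sample values above, only stateOnCurrentVersionSizeBytes would get a timeline/histogram pair; metrics absent from the allow-list are skipped.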





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] viirya commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

2020-11-24 Thread GitBox


viirya commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529771614



##
File path: python/pyspark/mllib/stat/_statistics.py
##
@@ -65,11 +65,19 @@ def colStats(rdd):
 """
 Computes column-wise summary statistics for the input RDD[Vector].
 
-:param rdd: an RDD[Vector] for which column-wise summary statistics
-are to be computed.
-:return: :class:`MultivariateStatisticalSummary` object containing
- column-wise summary statistics.
-
+Parameters
+----------
+rdd : :py:class:`pyspark.RDD`
+an RDD[Vector] for which column-wise summary statistics

Review comment:
   `RDD[Vector]` looks like Scala syntax? How about "RDD of Vector"?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30478: [SPARK-33525][SQL] Update hive-service-rpc to 3.1.2

2020-11-24 Thread GitBox


dongjoon-hyun commented on pull request #30478:
URL: https://github.com/apache/spark/pull/30478#issuecomment-733141118







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30429: [SPARK-33492][SQL] DSv2: Append/Overwrite/ReplaceTable should invalidate cache

2020-11-24 Thread GitBox


dongjoon-hyun commented on pull request #30429:
URL: https://github.com/apache/spark/pull/30429#issuecomment-733140355


   Thank you, @rdblue, @cloud-fan, @sunchao.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

2020-11-24 Thread GitBox


viirya commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529769785



##
File path: python/pyspark/mllib/regression.py
##
@@ -224,11 +234,13 @@ def _regression_train_wrapper(train_func, modelClass, 
data, initial_weights):
 
 class LinearRegressionWithSGD(object):
 """
+Train a linear regression model with no regularization using Stochastic 
Gradient Descent.
+
 .. versionadded:: 0.9.0
-.. note:: Deprecated in 2.0.0. Use ml.regression.LinearRegression.
+.. deprecated:: 2.0.0.

Review comment:
   2.0.0. -> 2.0.0





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30487: [SPARK-33535][INFRA][TESTS] Export LANG to en_US.UTF-8 in run-tests-jenkins script

2020-11-24 Thread GitBox


dongjoon-hyun commented on pull request #30487:
URL: https://github.com/apache/spark/pull/30487#issuecomment-733139251


   Merged to master/branch-3.0/branch-2.4.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


