[GitHub] [spark] SparkQA commented on pull request #30045: [SPARK-32991][SQL] Use conf in shared state as the original configuraion for RESET

2020-10-16 Thread GitBox


SparkQA commented on pull request #30045:
URL: https://github.com/apache/spark/pull/30045#issuecomment-709909565


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34488/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on a change in pull request #30009: [SPARK-32907][ML] adaptively blockify instances - LinearSVC

2020-10-16 Thread GitBox


zhengruifeng commented on a change in pull request #30009:
URL: https://github.com/apache/spark/pull/30009#discussion_r506167739



##
File path: mllib/src/main/scala/org/apache/spark/ml/feature/Instance.scala
##
@@ -114,6 +133,62 @@ private[spark] object InstanceBlock {
   def blokify(instances: RDD[Instance], blockSize: Int): RDD[InstanceBlock] = {
     instances.mapPartitions(_.grouped(blockSize).map(InstanceBlock.fromInstances))
   }
+
+  def blokifyWithMaxMemUsage(
+      iterator: Iterator[Instance],
+      maxMemUsage: Long): Iterator[InstanceBlock] = {
+    require(maxMemUsage > 0)
+
+    new Iterator[InstanceBlock] {
+      private var numCols = -1L
+      private val buff = mutable.ArrayBuilder.make[Instance]
+
+      override def hasNext: Boolean = iterator.hasNext
+
+      override def next(): InstanceBlock = {
+        buff.clear()
+        var buffCnt = 0L
+        var buffNnz = 0L
+        var buffUnitWeight = true
+        var blockMemUsage = 0L
+
+        while (iterator.hasNext && blockMemUsage < maxMemUsage) {
+          val instance = iterator.next()
+          if (numCols < 0L) numCols = instance.features.size
+          require(numCols == instance.features.size)
+          val nnz = instance.features.numNonzeros
+
+          buff += instance
+          buffCnt += 1L
+          buffNnz += nnz
+          buffUnitWeight &&= (instance.weight == 1)
+          blockMemUsage = getBlockMemUsage(numCols, buffCnt, buffNnz, buffUnitWeight)
+        }
+
+        // the block mem usage may slightly exceed threshold, not a big issue.
+        // and this ensure even if one row exceed block limit, each block has one row
+        InstanceBlock.fromInstances(buff.result())
+      }
+    }
+  }
+
+  def blokifyWithMaxMemUsage(
+      instances: RDD[Instance],
+      maxMemUsage: Long): RDD[InstanceBlock] = {
+    require(maxMemUsage > 0)
+    instances.mapPartitions(iter => blokifyWithMaxMemUsage(iter, maxMemUsage))
+  }
+
+  def inferBlockSizeInMB(
+      dim: Int,
+      avgNNZ: Double,
+      blasLevel: Int = 2): Double = {
+    if (dim <= avgNNZ * 3) {
+      0.25
+    } else {
+      64.0
+    }

Review comment:
   The current strategy is quite simple; I think we may use a more complex 
cost model if necessary in the future.
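
   For readers following along, the blocking loop in the diff above can be sketched in isolation. This is a minimal illustration, not the PR's code: rows are plain arrays of non-zero values, and `estimateBytes` is a hypothetical cost function with made-up constants standing in for `getBlockMemUsage`.

```scala
object MemBoundedBlocks {
  // Hypothetical cost estimate (assumption for illustration only):
  // 8 bytes of overhead per row plus 12 bytes per non-zero entry.
  def estimateBytes(rows: Long, nnz: Long): Long = rows * 8L + nnz * 12L

  // Groups rows into blocks whose estimated size stays near maxBytes.
  // Each row is represented by its array of non-zero values.
  def blockify(
      rows: Iterator[Array[Double]],
      maxBytes: Long): Iterator[Vector[Array[Double]]] = {
    require(maxBytes > 0, "maxBytes must be positive")

    new Iterator[Vector[Array[Double]]] {
      override def hasNext: Boolean = rows.hasNext

      override def next(): Vector[Array[Double]] = {
        if (!rows.hasNext) throw new NoSuchElementException("next on empty iterator")
        val buf = Vector.newBuilder[Array[Double]]
        var count = 0L
        var nnz = 0L
        var used = 0L
        // The first row is always accepted because used starts at 0, so a
        // single oversized row still forms a block of its own. The limit is
        // checked before adding, so a block may slightly exceed maxBytes.
        while (rows.hasNext && used < maxBytes) {
          val row = rows.next()
          buf += row
          count += 1L
          nnz += row.length
          used = estimateBytes(count, nnz)
        }
        buf.result()
      }
    }
  }
}
```

   With this estimate a one-entry row costs 20 bytes, so a 50-byte limit closes each block after the third row.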








[GitHub] [spark] gemelen commented on pull request #29995: [SPARK-33080][BUILD] Replace fatal warnings snippet

2020-10-16 Thread GitBox


gemelen commented on pull request #29995:
URL: https://github.com/apache/spark/pull/29995#issuecomment-709904229


   @srowen thanks a lot for your efforts to pass this changeset






[GitHub] [spark] SparkQA commented on pull request #30009: [SPARK-32907][ML] adaptively blockify instances - LinearSVC

2020-10-16 Thread GitBox


SparkQA commented on pull request #30009:
URL: https://github.com/apache/spark/pull/30009#issuecomment-709903604


   **[Test build #129886 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129886/testReport)**
 for PR 30009 at commit 
[`c0a734d`](https://github.com/apache/spark/commit/c0a734de5e4d4df819caa4f86634242966d5786b).






[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

2020-10-16 Thread GitBox


SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709903663


   **[Test build #129887 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129887/testReport)**
 for PR 28938 at commit 
[`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).






[GitHub] [spark] SparkQA commented on pull request #30045: [SPARK-32991][SQL] Use conf in shared state as the original configuraion for RESET

2020-10-16 Thread GitBox


SparkQA commented on pull request #30045:
URL: https://github.com/apache/spark/pull/30045#issuecomment-709900599


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34488/
   






[GitHub] [spark] SparkQA commented on pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct

2020-10-16 Thread GitBox


SparkQA commented on pull request #30026:
URL: https://github.com/apache/spark/pull/30026#issuecomment-709899037


   **[Test build #129885 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129885/testReport)**
 for PR 30026 at commit 
[`5769222`](https://github.com/apache/spark/commit/5769be0ec45243c9fc574dd6ff06c87f9024).






[GitHub] [spark] LantaoJin commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

2020-10-16 Thread GitBox


LantaoJin commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709898787


   retest this please






[GitHub] [spark] SparkQA commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns

2020-10-16 Thread GitBox


SparkQA commented on pull request #30053:
URL: https://github.com/apache/spark/pull/30053#issuecomment-709897406


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34487/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30053:
URL: https://github.com/apache/spark/pull/30053#issuecomment-709895941


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34486/
   Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns

2020-10-16 Thread GitBox


AmplabJenkins commented on pull request #30053:
URL: https://github.com/apache/spark/pull/30053#issuecomment-709895923










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30053:
URL: https://github.com/apache/spark/pull/30053#issuecomment-709895923


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns

2020-10-16 Thread GitBox


SparkQA commented on pull request #30053:
URL: https://github.com/apache/spark/pull/30053#issuecomment-709895901


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34486/
   






[GitHub] [spark] LuciferYang commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct

2020-10-16 Thread GitBox


LuciferYang commented on a change in pull request #30026:
URL: https://github.com/apache/spark/pull/30026#discussion_r506151340



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##
@@ -139,20 +142,22 @@ class BasicWriteJobStatsTracker(
 
   override def processStats(stats: Seq[WriteTaskStats]): Unit = {
     val sparkContext = SparkContext.getActive.get
-    var numPartitions: Long = 0L
+    var partitionsSet: mutable.Set[InternalRow] = mutable.HashSet.empty
     var numFiles: Long = 0L
     var totalNumBytes: Long = 0L
     var totalNumOutput: Long = 0L
 
     val basicStats = stats.map(_.asInstanceOf[BasicWriteTaskStats])
 
     basicStats.foreach { summary =>
-      numPartitions += summary.numPartitions
+      partitionsSet ++= summary.partitions
       numFiles += summary.numFiles
       totalNumBytes += summary.numBytes
       totalNumOutput += summary.numRows
     }
 
+    val numPartitions: Long = partitionsSet.size

Review comment:
   Addressed: commit 5769222 fixes this.








[GitHub] [spark] LuciferYang commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct

2020-10-16 Thread GitBox


LuciferYang commented on a change in pull request #30026:
URL: https://github.com/apache/spark/pull/30026#discussion_r506148242



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##
@@ -139,20 +142,22 @@ class BasicWriteJobStatsTracker(
 
   override def processStats(stats: Seq[WriteTaskStats]): Unit = {
     val sparkContext = SparkContext.getActive.get
-    var numPartitions: Long = 0L
+    var partitionsSet: mutable.Set[InternalRow] = mutable.HashSet.empty
     var numFiles: Long = 0L
     var totalNumBytes: Long = 0L
     var totalNumOutput: Long = 0L
 
     val basicStats = stats.map(_.asInstanceOf[BasicWriteTaskStats])
 
     basicStats.foreach { summary =>
-      numPartitions += summary.numPartitions
+      partitionsSet ++= summary.partitions

Review comment:
   Ditto: `partitionsSet.addAll(summary.partitions)` is also only available 
in Scala 2.13.

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##
@@ -76,7 +79,7 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
 
 
   override def newPartition(partitionValues: InternalRow): Unit = {
-    numPartitions += 1
+    partitions = partitions :+ partitionValues

Review comment:
   `partitions.appended(partitionValues)` can only be used in Scala 2.13
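
   As a small illustration of the point in the two comments above: the symbolic operators `:+` and `++=` exist on Scala collections in both 2.12 and 2.13, while the named methods `appended` and `addAll` were only added in 2.13. The sketch below uses `String` as a stand-in element type and hypothetical helper names; it is not the PR's code.

```scala
import scala.collection.mutable

object CrossVersionCollections {
  // `:+` is defined on Seq in both Scala 2.12 and 2.13;
  // `appended` is the 2.13-only named equivalent.
  def appendPartition(partitions: Seq[String], value: String): Seq[String] =
    partitions :+ value

  // `++=` is defined on mutable.Set in both Scala 2.12 and 2.13;
  // `addAll` is the 2.13-only named equivalent.
  def mergePartitions(
      acc: mutable.Set[String],
      more: Seq[String]): mutable.Set[String] = {
    acc ++= more
    acc
  }
}
```

   Code written with the operators compiles unchanged against either Scala version, which matters for a project that cross-builds.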








[GitHub] [spark] LuciferYang commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct

2020-10-16 Thread GitBox


LuciferYang commented on a change in pull request #30026:
URL: https://github.com/apache/spark/pull/30026#discussion_r506147287



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##
@@ -76,7 +79,7 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
 
 
   override def newPartition(partitionValues: InternalRow): Unit = {
-    numPartitions += 1
+    partitions = partitions :+ partitionValues

Review comment:
   `partitions.appended(partitionValues)` needs Scala 2.13








[GitHub] [spark] SparkQA commented on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)

2020-10-16 Thread GitBox


SparkQA commented on pull request #30025:
URL: https://github.com/apache/spark/pull/30025#issuecomment-709892119


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34485/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)

2020-10-16 Thread GitBox


AmplabJenkins commented on pull request #30025:
URL: https://github.com/apache/spark/pull/30025#issuecomment-709892138










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30025:
URL: https://github.com/apache/spark/pull/30025#issuecomment-709892138










[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.

2020-10-16 Thread GitBox


SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709890593


   **[Test build #129884 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129884/testReport)**
 for PR 29999 at commit 
[`f657ff0`](https://github.com/apache/spark/commit/f657ff0372f1cac48ea008a08c1cc7011f934d98).






[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.

2020-10-16 Thread GitBox


beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r506142456



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##
@@ -176,6 +177,125 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
   }
 }
 
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes with NullIntolerant {
+  def value: Expression = children.head
+  def list: Seq[Expression] = children.tail
+  def isNot: Boolean
+
+  override def inputTypes: Seq[AbstractDataType] = {
+    StringType +: Seq.fill(children.size - 1)(StringType)
+  }
+
+  override def dataType: DataType = BooleanType
+
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def nullable: Boolean = true
+
+  def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches()
+
+  override def eval(input: InternalRow): Any = {
+    val evaluatedValue = value.eval(input)
+    if (evaluatedValue == null) {
+      null
+    } else {
+      var hasNull = false
+      var match = true
+      list.foreach { e =>
+        val str = e.eval(input)
+        if (str == null) {
+          hasNull = true
+        } else {
+          val regex =
+            Pattern.compile(StringUtils.escapeLikeRegex(str.asInstanceOf[UTF8String].toString, '\\'))
+          if ((isNot && matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) ||
+            !(isNot || matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) {
+            match = false
+          }
+        }
+      }
+      if (hasNull) {
+        null
+      } else {
+        match
+      }
+    }
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val patternClass = classOf[Pattern].getName
+    val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") + ".escapeLikeRegex"
+    val javaDataType = CodeGenerator.javaType(value.dataType)
+    val valueGen = value.genCode(ctx)
+    val listGen = list.map(_.genCode(ctx))
+    val pattern = ctx.freshName("pattern")
+    val rightStr = ctx.freshName("rightStr")
+    val escapedEscapeChar = StringEscapeUtils.escapeJava("\\")
+    val hasNull = ctx.freshName("hasNull")
+    val matched = ctx.freshName("matched")
+    val valueArg = ctx.freshName("valueArg")
+    val listCode = listGen.map(x =>
+      s"""
+         |${x.code}
+         |if (${x.isNull}) {
+         |  $hasNull = true; // ${ev.isNull} = true;
+         |} else if (!$hasNull && $matched) {
+         |  String $rightStr = ${x.value}.toString();
+         |  $patternClass $pattern =
+         |    $patternClass.compile($escapeFunc($rightStr, '$escapedEscapeChar'));

Review comment:
   OK. I will cache the pattern of foldable regex string.
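
   A rough sketch of what caching could look like when every pattern is a constant (foldable) string: compile each pattern once up front and reuse the compiled `Pattern` for every row. The names `LikeAllSketch`, `LikeAll` and `likeToRegex` are hypothetical, the LIKE-to-regex translation is a simplified stand-in for `StringUtils.escapeLikeRegex` (no escape character), and the null handling from the diff is left out.

```scala
import java.util.regex.Pattern

object LikeAllSketch {
  // Simplified LIKE-to-regex translation (assumption: no escape character).
  // `%` matches any sequence, `_` matches one character, the rest is literal.
  def likeToRegex(like: String): String =
    like.iterator.map {
      case '%' => ".*"
      case '_' => "."
      case c => Pattern.quote(c.toString)
    }.mkString

  // Compiles each constant pattern once at construction time instead of
  // once per evaluated row.
  final class LikeAll(patterns: Seq[String]) {
    private val compiled: Seq[Pattern] =
      patterns.map(p => Pattern.compile(likeToRegex(p)))

    // True only if the value matches every pattern.
    def matches(value: String): Boolean =
      compiled.forall(_.matcher(value).matches())
  }
}
```

   For example, `new LikeAllSketch.LikeAll(Seq("foo%", "%bar"))` compiles two patterns once, and each later call to `matches` only runs the matchers.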








[GitHub] [spark] SparkQA commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns

2020-10-16 Thread GitBox


SparkQA commented on pull request #30053:
URL: https://github.com/apache/spark/pull/30053#issuecomment-709884964


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34486/
   






[GitHub] [spark] SparkQA commented on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)

2020-10-16 Thread GitBox


SparkQA commented on pull request #30025:
URL: https://github.com/apache/spark/pull/30025#issuecomment-709883799


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34485/
   






[GitHub] [spark] SparkQA commented on pull request #30045: [SPARK-32991][SQL] Use conf in shared state as the original configuraion for RESET

2020-10-16 Thread GitBox


SparkQA commented on pull request #30045:
URL: https://github.com/apache/spark/pull/30045#issuecomment-709878764


   **[Test build #129883 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129883/testReport)**
 for PR 30045 at commit 
[`6848b2f`](https://github.com/apache/spark/commit/6848b2fed2be7137f4133bb7ec1790b9aad1ba29).






[GitHub] [spark] yaooqinn commented on pull request #30045: [SPARK-32991][SQL] Use conf in shared state as the original configuraion for RESET

2020-10-16 Thread GitBox


yaooqinn commented on pull request #30045:
URL: https://github.com/apache/spark/pull/30045#issuecomment-709878145


   cc @hvanhovell too






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30066:
URL: https://github.com/apache/spark/pull/30066#issuecomment-709877835










[GitHub] [spark] AmplabJenkins commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job

2020-10-16 Thread GitBox


AmplabJenkins commented on pull request #30066:
URL: https://github.com/apache/spark/pull/30066#issuecomment-709877835










[GitHub] [spark] SparkQA commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job

2020-10-16 Thread GitBox


SparkQA commented on pull request #30066:
URL: https://github.com/apache/spark/pull/30066#issuecomment-709877814


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34484/
   






[GitHub] [spark] SparkQA commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns

2020-10-16 Thread GitBox


SparkQA commented on pull request #30053:
URL: https://github.com/apache/spark/pull/30053#issuecomment-709874931


   **[Test build #129882 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129882/testReport)**
 for PR 30053 at commit 
[`96a0706`](https://github.com/apache/spark/commit/96a070601d813baf8749c274069777ca4fe89fd6).






[GitHub] [spark] cloud-fan commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct

2020-10-16 Thread GitBox


cloud-fan commented on a change in pull request #30026:
URL: https://github.com/apache/spark/pull/30026#discussion_r506112571



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##
@@ -139,20 +142,22 @@ class BasicWriteJobStatsTracker(
 
   override def processStats(stats: Seq[WriteTaskStats]): Unit = {
     val sparkContext = SparkContext.getActive.get
-    var numPartitions: Long = 0L
+    var partitionsSet: mutable.Set[InternalRow] = mutable.HashSet.empty
     var numFiles: Long = 0L
     var totalNumBytes: Long = 0L
     var totalNumOutput: Long = 0L
 
     val basicStats = stats.map(_.asInstanceOf[BasicWriteTaskStats])
 
     basicStats.foreach { summary =>
-      numPartitions += summary.numPartitions
+      partitionsSet ++= summary.partitions
       numFiles += summary.numFiles
       totalNumBytes += summary.numBytes
       totalNumOutput += summary.numRows
     }
 
+    val numPartitions: Long = partitionsSet.size

Review comment:
   nit: it's only used once, we can inline it.
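
   To illustrate why the diff above switches from a summed counter to a set: when several tasks write to the same dynamic partition, adding up per-task counts counts that partition more than once. The sketch below uses a hypothetical, simplified `TaskStats` with `String` partition values instead of `InternalRow`; it is not the PR's code.

```scala
object PartitionMetric {
  // Hypothetical, simplified stand-in for per-task write statistics.
  final case class TaskStats(partitions: Seq[String], numFiles: Int)

  // Sum of per-task counts: a partition written by two tasks is counted twice.
  def summedCount(stats: Seq[TaskStats]): Long =
    stats.map(_.partitions.size.toLong).sum

  // Number of distinct partition values across all tasks. The set size is
  // used inline rather than stored in an intermediate variable.
  def distinctCount(stats: Seq[TaskStats]): Long =
    stats.flatMap(_.partitions).toSet.size.toLong
}
```

   With two tasks writing `Seq("p=1", "p=2")` and `Seq("p=2", "p=3")`, the summed count is 4 while the distinct count is 3.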








[GitHub] [spark] SparkQA commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns

2020-10-16 Thread GitBox


SparkQA commented on pull request #30053:
URL: https://github.com/apache/spark/pull/30053#issuecomment-709870468


   **[Test build #129881 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129881/testReport)**
 for PR 30053 at commit 
[`2634588`](https://github.com/apache/spark/commit/2634588874042dd20c3293e4c67a7ae0199fe5b9).






[GitHub] [spark] cloud-fan commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct

2020-10-16 Thread GitBox


cloud-fan commented on a change in pull request #30026:
URL: https://github.com/apache/spark/pull/30026#discussion_r506112213



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##
@@ -139,20 +142,22 @@ class BasicWriteJobStatsTracker(
 
   override def processStats(stats: Seq[WriteTaskStats]): Unit = {
 val sparkContext = SparkContext.getActive.get
-var numPartitions: Long = 0L
+var partitionsSet: mutable.Set[InternalRow] = mutable.HashSet.empty
 var numFiles: Long = 0L
 var totalNumBytes: Long = 0L
 var totalNumOutput: Long = 0L
 
 val basicStats = stats.map(_.asInstanceOf[BasicWriteTaskStats])
 
 basicStats.foreach { summary =>
-  numPartitions += summary.numPartitions
+  partitionsSet ++= summary.partitions

Review comment:
   ditto, `partitionsSet.addAll(summary.partitions)`








[GitHub] [spark] cloud-fan commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct

2020-10-16 Thread GitBox


cloud-fan commented on a change in pull request #30026:
URL: https://github.com/apache/spark/pull/30026#discussion_r506111562



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##
@@ -76,7 +79,7 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
 
 
   override def newPartition(partitionValues: InternalRow): Unit = {
-numPartitions += 1
+partitions = partitions :+ partitionValues

Review comment:
   This looks like appending to an immutable collection. Can we be more 
explicit? e.g. `partitions.append(partitionValues)`
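
A rough sketch of the suggestion, assuming the field becomes a `mutable.ArrayBuffer`. `PartitionTracker` and its `seen` accessor are hypothetical names, and partition values are plain strings rather than `InternalRow`.

```scala
import scala.collection.mutable

// Hypothetical tracker; not the actual BasicWriteTaskStatsTracker.
class PartitionTracker {
  // A mutable buffer makes the in-place update explicit at the call site.
  private val partitions = mutable.ArrayBuffer.empty[String]

  // Appends in place instead of rebuilding a collection with `:+`.
  def newPartition(partitionValues: String): Unit = {
    partitions.append(partitionValues)
  }

  // Snapshot of the partitions recorded so far, in insertion order.
  def seen: Seq[String] = partitions.toSeq
}
```

Compared with `partitions = partitions :+ partitionValues`, the field can be a `val` and each call avoids allocating a new collection.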








[GitHub] [spark] SparkQA commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job

2020-10-16 Thread GitBox


SparkQA commented on pull request #30066:
URL: https://github.com/apache/spark/pull/30066#issuecomment-709868917


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34484/
   






[GitHub] [spark] cloud-fan commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct

2020-10-16 Thread GitBox


cloud-fan commented on a change in pull request #30026:
URL: https://github.com/apache/spark/pull/30026#discussion_r506110093



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##
@@ -30,12 +32,13 @@ import org.apache.spark.sql.execution.metric.{SQLMetric, 
SQLMetrics}
 import org.apache.spark.util.SerializableConfiguration
 
 
+

Review comment:
   unnecessary change.








[GitHub] [spark] AmplabJenkins commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job

2020-10-16 Thread GitBox


AmplabJenkins commented on pull request #30066:
URL: https://github.com/apache/spark/pull/30066#issuecomment-709867933










[GitHub] [spark] SparkQA commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job

2020-10-16 Thread GitBox


SparkQA commented on pull request #30066:
URL: https://github.com/apache/spark/pull/30066#issuecomment-709867892


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34483/
   






[GitHub] [spark] cloud-fan commented on pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct

2020-10-16 Thread GitBox


cloud-fan commented on pull request #30026:
URL: https://github.com/apache/spark/pull/30026#issuecomment-709868109


   > return size is partition num * shuffle num always can be millions level
   
   I thought about it. If a table has 10k partitions, it's unlikely that each 
write task touches all the 10k partitions. So the total size is not that large.
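
The size argument can be sketched as arithmetic. `StatsSize` is a hypothetical helper and the task counts used with it are illustrative assumptions, not measurements from this PR.

```scala
object StatsSize {
  // Upper bound: every write task touches every partition.
  def worstCase(numPartitions: Long, numTasks: Long): Long =
    numPartitions * numTasks

  // What is actually sent back: one entry per partition a task touched.
  def reported(partitionsPerTask: Seq[Int]): Long =
    partitionsPerTask.map(_.toLong).sum
}
```

For example, under the assumption of 200 tasks that each touch 50 of 10k partitions, `reported` gives 10,000 entries, while the worst case for the same job would be 2,000,000.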






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30066:
URL: https://github.com/apache/spark/pull/30066#issuecomment-709867933










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30059: [SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30059:
URL: https://github.com/apache/spark/pull/30059#issuecomment-709865465


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129864/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30066:
URL: https://github.com/apache/spark/pull/30066#issuecomment-709864234


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #30059: [SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs

2020-10-16 Thread GitBox


SparkQA commented on pull request #30059:
URL: https://github.com/apache/spark/pull/30059#issuecomment-709864009


   **[Test build #129864 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129864/testReport)**
 for PR 30059 at commit 
[`f39ac87`](https://github.com/apache/spark/commit/f39ac871fc38e8ec8c02b7f6661748e2c7d431e9).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709864803


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129872/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30066:
URL: https://github.com/apache/spark/pull/30066#issuecomment-709864246


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129879/
   Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #30059: [SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs

2020-10-16 Thread GitBox


SparkQA removed a comment on pull request #30059:
URL: https://github.com/apache/spark/pull/30059#issuecomment-709664186


   **[Test build #129864 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129864/testReport)**
 for PR 30059 at commit 
[`f39ac87`](https://github.com/apache/spark/commit/f39ac871fc38e8ec8c02b7f6661748e2c7d431e9).






[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.

2020-10-16 Thread GitBox


SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709863988


   **[Test build #129872 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129872/testReport)**
 for PR 29999 at commit 
[`be5eb8a`](https://github.com/apache/spark/commit/be5eb8a1f092e15c941d39d517284aed67de72c9).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.

2020-10-16 Thread GitBox


SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709702627


   **[Test build #129872 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129872/testReport)**
 for PR 29999 at commit 
[`be5eb8a`](https://github.com/apache/spark/commit/be5eb8a1f092e15c941d39d517284aed67de72c9).






[GitHub] [spark] linhongliu-db commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns

2020-10-16 Thread GitBox


linhongliu-db commented on pull request #30053:
URL: https://github.com/apache/spark/pull/30053#issuecomment-709865036


   retest this please






[GitHub] [spark] SparkQA commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job

2020-10-16 Thread GitBox


SparkQA commented on pull request #30066:
URL: https://github.com/apache/spark/pull/30066#issuecomment-709864008


   **[Test build #129879 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129879/testReport)**
 for PR 30066 at commit 
[`32ec11a`](https://github.com/apache/spark/commit/32ec11ac3866a88ee6628b22c4379e27ec9b212b).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30059: [SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30059:
URL: https://github.com/apache/spark/pull/30059#issuecomment-709865453


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709864778


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30025:
URL: https://github.com/apache/spark/pull/30025#issuecomment-709864026


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #30059: [SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs

2020-10-16 Thread GitBox


AmplabJenkins commented on pull request #30059:
URL: https://github.com/apache/spark/pull/30059#issuecomment-709865453










[GitHub] [spark] SparkQA removed a comment on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)

2020-10-16 Thread GitBox


SparkQA removed a comment on pull request #30025:
URL: https://github.com/apache/spark/pull/30025#issuecomment-709859600


   **[Test build #129880 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129880/testReport)**
 for PR 30025 at commit 
[`dfd6d4b`](https://github.com/apache/spark/commit/dfd6d4b5ee2bfad370ec57e264b1c18de038e8ae).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30025:
URL: https://github.com/apache/spark/pull/30025#issuecomment-709864035


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129880/
   Test FAILed.






[GitHub] [spark] cloud-fan commented on a change in pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)

2020-10-16 Thread GitBox


cloud-fan commented on a change in pull request #30025:
URL: https://github.com/apache/spark/pull/30025#discussion_r506106800



##
File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala
##
@@ -48,4 +48,26 @@ private case object MySQLDialect extends JdbcDialect {
   }
 
   override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  // See https://dev.mysql.com/doc/refman/8.0/en/alter-table.html
+  override def getUpdateColumnTypeQuery(
+  tableName: String,
+  columnName: String,
+  newDataType: String): String = {
+s"ALTER TABLE $tableName MODIFY COLUMN ${quoteIdentifier(columnName)} 
$newDataType"
+  }
+
+  // See https://dev.mysql.com/doc/refman/8.0/en/alter-table.html
+  // require to have column data type to change the column nullability
+  // ALTER TABLE tbl_name MODIFY [COLUMN] col_name column_definition
+  // column_definition:
+  //data_type [NOT NULL | NULL]
+  // e.g. ALTER TABLE t1 MODIFY b INT NOT NULL;

Review comment:
   Spark knows the table schema, so the data type info is available. We need 
to pass the column type info here, though.
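
One possible shape for this, as a sketch only: the `dataType` parameter and the `MySQLAlterSketch` object are hypothetical and not part of the existing `JdbcDialect` API, and the backtick-doubling in `quoteIdentifier` is an assumption made for this example.

```scala
object MySQLAlterSketch {
  // Quote an identifier with backticks, doubling any embedded backtick.
  def quoteIdentifier(name: String): String =
    "`" + name.replace("`", "``") + "`"

  // Hypothetical signature: dataType is the extra argument the caller would
  // supply from the known table schema, since MySQL's MODIFY COLUMN needs
  // the full column definition to change nullability.
  def updateColumnNullabilityQuery(
      tableName: String,
      columnName: String,
      dataType: String,
      isNullable: Boolean): String = {
    val nullable = if (isNullable) "NULL" else "NOT NULL"
    s"ALTER TABLE $tableName MODIFY COLUMN ${quoteIdentifier(columnName)} $dataType $nullable"
  }
}
```

Called with `("t1", "b", "INT", false)`, this builds the same kind of statement as the `ALTER TABLE t1 MODIFY b INT NOT NULL` example in the code comment.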








[GitHub] [spark] AmplabJenkins commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job

2020-10-16 Thread GitBox


AmplabJenkins commented on pull request #30066:
URL: https://github.com/apache/spark/pull/30066#issuecomment-709864234










[GitHub] [spark] SparkQA removed a comment on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job

2020-10-16 Thread GitBox


SparkQA removed a comment on pull request #30066:
URL: https://github.com/apache/spark/pull/30066#issuecomment-709826033


   **[Test build #129879 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129879/testReport)**
 for PR 30066 at commit 
[`32ec11a`](https://github.com/apache/spark/commit/32ec11ac3866a88ee6628b22c4379e27ec9b212b).






[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.

2020-10-16 Thread GitBox


AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709864778










[GitHub] [spark] SparkQA commented on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)

2020-10-16 Thread GitBox


SparkQA commented on pull request #30025:
URL: https://github.com/apache/spark/pull/30025#issuecomment-709863989


   **[Test build #129880 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129880/testReport)**
 for PR 30025 at commit 
[`dfd6d4b`](https://github.com/apache/spark/commit/dfd6d4b5ee2bfad370ec57e264b1c18de038e8ae).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)

2020-10-16 Thread GitBox


AmplabJenkins commented on pull request #30025:
URL: https://github.com/apache/spark/pull/30025#issuecomment-709864026










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30053:
URL: https://github.com/apache/spark/pull/30053#issuecomment-709861968


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129876/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30053:
URL: https://github.com/apache/spark/pull/30053#issuecomment-709861955


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns

2020-10-16 Thread GitBox


AmplabJenkins commented on pull request #30053:
URL: https://github.com/apache/spark/pull/30053#issuecomment-709861955










[GitHub] [spark] SparkQA removed a comment on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns

2020-10-16 Thread GitBox


SparkQA removed a comment on pull request #30053:
URL: https://github.com/apache/spark/pull/30053#issuecomment-709781462


   **[Test build #129876 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129876/testReport)**
 for PR 30053 at commit 
[`2634588`](https://github.com/apache/spark/commit/2634588874042dd20c3293e4c67a7ae0199fe5b9).






[GitHub] [spark] SparkQA commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns

2020-10-16 Thread GitBox


SparkQA commented on pull request #30053:
URL: https://github.com/apache/spark/pull/30053#issuecomment-709861628


   **[Test build #129876 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129876/testReport)**
 for PR 30053 at commit 
[`2634588`](https://github.com/apache/spark/commit/2634588874042dd20c3293e4c67a7ae0199fe5b9).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)

2020-10-16 Thread GitBox


SparkQA commented on pull request #30025:
URL: https://github.com/apache/spark/pull/30025#issuecomment-709859600


   **[Test build #129880 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129880/testReport)**
 for PR 30025 at commit 
[`dfd6d4b`](https://github.com/apache/spark/commit/dfd6d4b5ee2bfad370ec57e264b1c18de038e8ae).






[GitHub] [spark] SparkQA removed a comment on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-16 Thread GitBox


SparkQA removed a comment on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-709775887


   **[Test build #129875 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129875/testReport)** for PR 30057 at commit [`199aa8f`](https://github.com/apache/spark/commit/199aa8f01673ba0b990567516771106dd15ff143).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-709848373


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129875/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job

2020-10-16 Thread GitBox


SparkQA commented on pull request #30066:
URL: https://github.com/apache/spark/pull/30066#issuecomment-709848582


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34483/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-709848361


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-16 Thread GitBox


AmplabJenkins commented on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-709848361










[GitHub] [spark] SparkQA commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-16 Thread GitBox


SparkQA commented on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-709848218


   **[Test build #129875 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129875/testReport)** for PR 30057 at commit [`199aa8f`](https://github.com/apache/spark/commit/199aa8f01673ba0b990567516771106dd15ff143).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30065: [SPARK-33165][SQL][TESTS][FOLLOW-UP] Use scala.Predef.assert instead

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30065:
URL: https://github.com/apache/spark/pull/30065#issuecomment-709847660










[GitHub] [spark] AmplabJenkins commented on pull request #30065: [SPARK-33165][SQL][TESTS][FOLLOW-UP] Use scala.Predef.assert instead

2020-10-16 Thread GitBox


AmplabJenkins commented on pull request #30065:
URL: https://github.com/apache/spark/pull/30065#issuecomment-709847660










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709847313










[GitHub] [spark] SparkQA removed a comment on pull request #30065: [SPARK-33165][SQL][TESTS][FOLLOW-UP] Use scala.Predef.assert instead

2020-10-16 Thread GitBox


SparkQA removed a comment on pull request #30065:
URL: https://github.com/apache/spark/pull/30065#issuecomment-709699989


   **[Test build #129870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129870/testReport)** for PR 30065 at commit [`6971fdf`](https://github.com/apache/spark/commit/6971fdfd77553e01b69cd8cf866508a8ec923941).






[GitHub] [spark] AmplabJenkins commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

2020-10-16 Thread GitBox


AmplabJenkins commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709847313










[GitHub] [spark] SparkQA commented on pull request #30065: [SPARK-33165][SQL][TESTS][FOLLOW-UP] Use scala.Predef.assert instead

2020-10-16 Thread GitBox


SparkQA commented on pull request #30065:
URL: https://github.com/apache/spark/pull/30065#issuecomment-709846092


   **[Test build #129870 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129870/testReport)** for PR 30065 at commit [`6971fdf`](https://github.com/apache/spark/commit/6971fdfd77553e01b69cd8cf866508a8ec923941).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

2020-10-16 Thread GitBox


SparkQA removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709700011


   **[Test build #129871 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129871/testReport)** for PR 30046 at commit [`b50eea8`](https://github.com/apache/spark/commit/b50eea895a084c04784399faaf74f2b822405e84).






[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

2020-10-16 Thread GitBox


SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709845677


   **[Test build #129871 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129871/testReport)** for PR 30046 at commit [`b50eea8`](https://github.com/apache/spark/commit/b50eea895a084c04784399faaf74f2b822405e84).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709844264










[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

2020-10-16 Thread GitBox


AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709844264










[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

2020-10-16 Thread GitBox


SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709844239


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34482/
   






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30042: [SPARK-33139][SQL] protect setActionSession and clearActiveSession

2020-10-16 Thread GitBox


HyukjinKwon commented on a change in pull request #30042:
URL: https://github.com/apache/spark/pull/30042#discussion_r506087792



##
File path: python/pyspark/sql/session.py
##
@@ -230,7 +230,10 @@ def __init__(self, sparkContext, jsparkSession=None):
 SparkSession._instantiatedSession = self
 SparkSession._activeSession = self
 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
-self._jvm.SparkSession.setActiveSession(self._jsparkSession)
+self._jvm.java.lang.Class.forName("org.apache.spark.sql.SparkSession$")\
+.getDeclaredField("MODULE$")\
+.get(None)\
+.setActiveSessionInternal(self._jsparkSession)

Review comment:
   Thanks, please go ahead with a follow-up.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709842833


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129877/
   Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

2020-10-16 Thread GitBox


SparkQA removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709798588


   **[Test build #129877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129877/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709842821


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

2020-10-16 Thread GitBox


AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709842821










[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

2020-10-16 Thread GitBox


SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709842475


   **[Test build #129877 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129877/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] viirya commented on a change in pull request #26312: [SPARK-29649][SQL] Stop task set if FileAlreadyExistsException was thrown when writing to output file

2020-10-16 Thread GitBox


viirya commented on a change in pull request #26312:
URL: https://github.com/apache/spark/pull/26312#discussion_r506086784



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##
@@ -281,6 +281,10 @@ object FileFormatWriter extends Logging {
 } catch {
   case e: FetchFailedException =>
 throw e
+  case f: FileAlreadyExistsException =>

Review comment:
   I see. Thanks for the details. We have different standpoints. For your 
cases, the first option looks like the better choice. The customers we had are 
using HDFS, where `FileAlreadyExistsException` isn't recoverable, so the pain 
point comes from more time spent on a failed job.
   
   I believe that even if SPARK-27194 is resolved, fast-failing a job that failed 
because of `FileAlreadyExistsException`, or maybe other errors that we know in 
advance are unrecoverable, is still useful.
   
   It seems to me there are two options: one is to revert this completely; the 
second is to add a config for the fast-fail behavior and set it to false by 
default. I prefer the second one because, for the reason above, it lets us 
relieve the pain of wasting time on a failed job if users want that.
   
   WDYT?
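
   The second option could be sketched roughly as below. This is a minimal, language-neutral illustration in Python and not the Spark implementation: `run_with_retries`, `TaskSetAborted`, and the `fast_fail_on_file_exists` flag are hypothetical names, and Python's built-in `FileExistsError` stands in for `FileAlreadyExistsException`.

```python
class TaskSetAborted(Exception):
    """Raised when the caller gives up on a task (hypothetical name)."""


def run_with_retries(task, max_attempts=4, fast_fail_on_file_exists=False):
    """Run `task`, retrying on failure up to `max_attempts` times.

    When `fast_fail_on_file_exists` is True, a FileExistsError is treated as
    unrecoverable and aborts at once instead of consuming further attempts.
    The flag defaults to False, so existing retry behavior is unchanged.
    Returns a (result, attempts_used) tuple.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(), attempt
        except FileExistsError as e:
            if fast_fail_on_file_exists:
                # Retrying cannot help, so stop right away.
                raise TaskSetAborted(
                    f"unrecoverable after {attempt} attempt(s)") from e
            last_error = e
        except Exception as e:
            # Any other failure is assumed to be retryable in this sketch.
            last_error = e
    raise TaskSetAborted(f"failed after {max_attempts} attempts") from last_error
```

   With the flag left at its default, a task that always hits an existing file is attempted `max_attempts` times before the abort; with the flag enabled, it is attempted once.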
   
   








[GitHub] [spark] HyukjinKwon commented on a change in pull request #30042: [SPARK-33139][SQL] protect setActionSession and clearActiveSession

2020-10-16 Thread GitBox


HyukjinKwon commented on a change in pull request #30042:
URL: https://github.com/apache/spark/pull/30042#discussion_r506086362



##
File path: python/pyspark/sql/session.py
##
@@ -230,7 +230,10 @@ def __init__(self, sparkContext, jsparkSession=None):
 SparkSession._instantiatedSession = self
 SparkSession._activeSession = self
 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
-self._jvm.SparkSession.setActiveSession(self._jsparkSession)
+self._jvm.java.lang.Class.forName("org.apache.spark.sql.SparkSession$")\

Review comment:
   `Class.forName` should preferably not be used directly. It is banned by the 
Scala style rules:
   
   
https://github.com/apache/spark/blob/e93b8f02cd706bedc47c9b55a73f632fe9e61ec3/scalastyle-config.xml#L197-L206








[GitHub] [spark] leanken commented on a change in pull request #30042: [SPARK-33139][SQL] protect setActionSession and clearActiveSession

2020-10-16 Thread GitBox


leanken commented on a change in pull request #30042:
URL: https://github.com/apache/spark/pull/30042#discussion_r506086345



##
File path: python/pyspark/sql/session.py
##
@@ -230,7 +230,10 @@ def __init__(self, sparkContext, jsparkSession=None):
 SparkSession._instantiatedSession = self
 SparkSession._activeSession = self
 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
-self._jvm.SparkSession.setActiveSession(self._jsparkSession)
+self._jvm.java.lang.Class.forName("org.apache.spark.sql.SparkSession$")\
+.getDeclaredField("MODULE$")\
+.get(None)\
+.setActiveSessionInternal(self._jsparkSession)

Review comment:
   OK, I will test and update this in the next PR. Thanks @HyukjinKwon








[GitHub] [spark] HyukjinKwon commented on a change in pull request #30042: [SPARK-33139][SQL] protect setActionSession and clearActiveSession

2020-10-16 Thread GitBox


HyukjinKwon commented on a change in pull request #30042:
URL: https://github.com/apache/spark/pull/30042#discussion_r506085277



##
File path: python/pyspark/sql/session.py
##
@@ -230,7 +230,10 @@ def __init__(self, sparkContext, jsparkSession=None):
 SparkSession._instantiatedSession = self
 SparkSession._activeSession = self
 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
-self._jvm.SparkSession.setActiveSession(self._jsparkSession)
+self._jvm.java.lang.Class.forName("org.apache.spark.sql.SparkSession$")\
+.getDeclaredField("MODULE$")\
+.get(None)\
+.setActiveSessionInternal(self._jsparkSession)

Review comment:
   Hey, you don't need to reflect manually here. The package-private accessor 
is already accessible from Java, as you did, so you can just mimic it here 
via `getattr(getattr(spark._jvm, "SparkSession$"), 
"MODULE$").setActiveSessionInternal(...)`.
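
   The reason `getattr` is needed is that names containing `$` are not valid Python identifiers, so plain dotted access cannot be written. A minimal sketch of the lookup pattern, using `SimpleNamespace` stand-ins instead of a real Py4J gateway (the `jvm`, `companion`, and `recorded` names are assumptions made for illustration only):

```python
from types import SimpleNamespace

# Stand-ins only: in PySpark these would be Py4J proxies reached via spark._jvm.
recorded = []
companion = SimpleNamespace(setActiveSessionInternal=recorded.append)

session_object_class = SimpleNamespace()
setattr(session_object_class, "MODULE$", companion)

jvm = SimpleNamespace()
setattr(jvm, "SparkSession$", session_object_class)

# `jvm.SparkSession$.MODULE$` would be a syntax error because of `$`,
# so both names are resolved with getattr instead.
module = getattr(getattr(jvm, "SparkSession$"), "MODULE$")
module.setActiveSessionInternal("jsparkSession-placeholder")
```

   Here the call lands on the stand-in `companion` object; against a real gateway the same two `getattr` calls would resolve the Scala companion object without any explicit `Class.forName` reflection.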








[GitHub] [spark] moomindani commented on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-10-16 Thread GitBox


moomindani commented on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-709838989


   @gatorsmile Just a reminder.. Can you take a look?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

2020-10-16 Thread GitBox


AmplabJenkins removed a comment on pull request #30001:
URL: https://github.com/apache/spark/pull/30001#issuecomment-70983










[GitHub] [spark] AmplabJenkins commented on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

2020-10-16 Thread GitBox


AmplabJenkins commented on pull request #30001:
URL: https://github.com/apache/spark/pull/30001#issuecomment-70983










[GitHub] [spark] SparkQA removed a comment on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

2020-10-16 Thread GitBox


SparkQA removed a comment on pull request #30001:
URL: https://github.com/apache/spark/pull/30001#issuecomment-709656628


   **[Test build #129862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129862/testReport)** for PR 30001 at commit [`0ebceb0`](https://github.com/apache/spark/commit/0ebceb01d1bbd30345f4d0a3662f34a51bc965d7).






[GitHub] [spark] SparkQA commented on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

2020-10-16 Thread GitBox


SparkQA commented on pull request #30001:
URL: https://github.com/apache/spark/pull/30001#issuecomment-709834570


   **[Test build #129862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129862/testReport)** for PR 30001 at commit [`0ebceb0`](https://github.com/apache/spark/commit/0ebceb01d1bbd30345f4d0a3662f34a51bc965d7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

2020-10-16 Thread GitBox


SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709831123


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34482/
   





