[GitHub] [spark] SparkQA commented on pull request #30045: [SPARK-32991][SQL] Use conf in shared state as the original configuraion for RESET
SparkQA commented on pull request #30045: URL: https://github.com/apache/spark/pull/30045#issuecomment-709909565 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34488/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhengruifeng commented on a change in pull request #30009: [SPARK-32907][ML] adaptively blockify instances - LinearSVC
zhengruifeng commented on a change in pull request #30009: URL: https://github.com/apache/spark/pull/30009#discussion_r506167739 ## File path: mllib/src/main/scala/org/apache/spark/ml/feature/Instance.scala ## @@ -114,6 +133,62 @@ private[spark] object InstanceBlock { def blokify(instances: RDD[Instance], blockSize: Int): RDD[InstanceBlock] = { instances.mapPartitions(_.grouped(blockSize).map(InstanceBlock.fromInstances)) } + + def blokifyWithMaxMemUsage( + iterator: Iterator[Instance], + maxMemUsage: Long): Iterator[InstanceBlock] = { +require(maxMemUsage > 0) + +new Iterator[InstanceBlock] { + private var numCols = -1L + private val buff = mutable.ArrayBuilder.make[Instance] + + override def hasNext: Boolean = iterator.hasNext + + override def next(): InstanceBlock = { +buff.clear() +var buffCnt = 0L +var buffNnz = 0L +var buffUnitWeight = true +var blockMemUsage = 0L + +while (iterator.hasNext && blockMemUsage < maxMemUsage) { + val instance = iterator.next() + if (numCols < 0L) numCols = instance.features.size + require(numCols == instance.features.size) + val nnz = instance.features.numNonzeros + + buff += instance + buffCnt += 1L + buffNnz += nnz + buffUnitWeight &&= (instance.weight == 1) + blockMemUsage = getBlockMemUsage(numCols, buffCnt, buffNnz, buffUnitWeight) +} + +// the block mem usage may slightly exceed threshold, not a big issue. +// and this ensure even if one row exceed block limit, each block has one row +InstanceBlock.fromInstances(buff.result()) + } +} + } + + def blokifyWithMaxMemUsage( + instances: RDD[Instance], + maxMemUsage: Long): RDD[InstanceBlock] = { +require(maxMemUsage > 0) +instances.mapPartitions(iter => blokifyWithMaxMemUsage(iter, maxMemUsage)) + } + + def inferBlockSizeInMB( + dim: Int, + avgNNZ: Double, + blasLevel: Int = 2): Double = { +if (dim <= avgNNZ * 3) { + 0.25 +} else { + 64.0 +} Review comment: The current strategy is quite simple; I think we may use a more complex cost model if necessary in the future.
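The greedy blocking loop and the block-size heuristic in the hunk above can be sketched as follows. This is a minimal illustration, not Spark's implementation. `Row` is a hypothetical stand-in for `Instance`. `rowBytes` is an assumed additive per-row cost: Spark's `getBlockMemUsage` instead re-estimates the whole block from the column count, row count, nnz and unit-weight flag.

```scala
import scala.collection.mutable.ArrayBuffer

object BlockifySketch {
  // Hypothetical stand-in for Spark's Instance (features + weight only).
  final case class Row(features: Array[Double], weight: Double)

  // Greedily packs rows into blocks. A block is closed once the running size
  // estimate reaches `maxBytes`. The row that crosses the limit stays in the
  // block, so every block holds at least one row, even a row that alone
  // exceeds `maxBytes`.
  def blockify(rows: Iterator[Row], maxBytes: Long, rowBytes: Row => Long): Iterator[Seq[Row]] = {
    require(maxBytes > 0)
    new Iterator[Seq[Row]] {
      override def hasNext: Boolean = rows.hasNext
      override def next(): Seq[Row] = {
        if (!rows.hasNext) throw new NoSuchElementException("no more rows")
        val buff = ArrayBuffer.empty[Row]
        var used = 0L
        while (rows.hasNext && used < maxBytes) {
          val row = rows.next()
          buff += row
          used += rowBytes(row) // assumed additive cost model
        }
        buff.toList
      }
    }
  }

  // Same rule as inferBlockSizeInMB in the diff: dense-ish data
  // (dim <= 3 * avgNnz) gets 0.25 MB blocks, sparse data gets 64 MB blocks.
  def inferBlockSizeInMB(dim: Int, avgNnz: Double): Double =
    if (dim <= avgNnz * 3) 0.25 else 64.0
}
```

With a 16-byte limit and 8 bytes per row, five rows are grouped as 2, 2 and 1. With a 4-byte limit, every row forms its own block, which matches the "each block has one row" comment in the diff.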
[GitHub] [spark] gemelen commented on pull request #29995: [SPARK-33080][BUILD] Replace fatal warnings snippet
gemelen commented on pull request #29995: URL: https://github.com/apache/spark/pull/29995#issuecomment-709904229 @srowen thanks a lot for your efforts to pass this changeset
[GitHub] [spark] SparkQA commented on pull request #30009: [SPARK-32907][ML] adaptively blockify instances - LinearSVC
SparkQA commented on pull request #30009: URL: https://github.com/apache/spark/pull/30009#issuecomment-709903604 **[Test build #129886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129886/testReport)** for PR 30009 at commit [`c0a734d`](https://github.com/apache/spark/commit/c0a734de5e4d4df819caa4f86634242966d5786b).
[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog
SparkQA commented on pull request #28938: URL: https://github.com/apache/spark/pull/28938#issuecomment-709903663 **[Test build #129887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129887/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).
[GitHub] [spark] SparkQA commented on pull request #30045: [SPARK-32991][SQL] Use conf in shared state as the original configuraion for RESET
SparkQA commented on pull request #30045: URL: https://github.com/apache/spark/pull/30045#issuecomment-709900599 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34488/
[GitHub] [spark] SparkQA commented on pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct
SparkQA commented on pull request #30026: URL: https://github.com/apache/spark/pull/30026#issuecomment-709899037 **[Test build #129885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129885/testReport)** for PR 30026 at commit [`5769222`](https://github.com/apache/spark/commit/5769be0ec45243c9fc574dd6ff06c87f9024).
[GitHub] [spark] LantaoJin commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog
LantaoJin commented on pull request #28938: URL: https://github.com/apache/spark/pull/28938#issuecomment-709898787 retest this please
[GitHub] [spark] SparkQA commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns
SparkQA commented on pull request #30053: URL: https://github.com/apache/spark/pull/30053#issuecomment-709897406 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34487/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns
AmplabJenkins removed a comment on pull request #30053: URL: https://github.com/apache/spark/pull/30053#issuecomment-709895941 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34486/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns
AmplabJenkins commented on pull request #30053: URL: https://github.com/apache/spark/pull/30053#issuecomment-709895923
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns
AmplabJenkins removed a comment on pull request #30053: URL: https://github.com/apache/spark/pull/30053#issuecomment-709895923 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns
SparkQA commented on pull request #30053: URL: https://github.com/apache/spark/pull/30053#issuecomment-709895901 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34486/
[GitHub] [spark] LuciferYang commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct
LuciferYang commented on a change in pull request #30026: URL: https://github.com/apache/spark/pull/30026#discussion_r506151340 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala ## @@ -139,20 +142,22 @@ class BasicWriteJobStatsTracker( override def processStats(stats: Seq[WriteTaskStats]): Unit = { val sparkContext = SparkContext.getActive.get -var numPartitions: Long = 0L +var partitionsSet: mutable.Set[InternalRow] = mutable.HashSet.empty var numFiles: Long = 0L var totalNumBytes: Long = 0L var totalNumOutput: Long = 0L val basicStats = stats.map(_.asInstanceOf[BasicWriteTaskStats]) basicStats.foreach { summary => - numPartitions += summary.numPartitions + partitionsSet ++= summary.partitions numFiles += summary.numFiles totalNumBytes += summary.numBytes totalNumOutput += summary.numRows } +val numPartitions: Long = partitionsSet.size Review comment: Fixed in 5769222.
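The motivation for replacing the summed `numPartitions` with a set union in `processStats` can be shown with a minimal sketch. It assumes partition values are modelled as `String`s rather than `InternalRow`s, and `TaskStats` is a hypothetical stand-in for `BasicWriteTaskStats`. When two tasks write to the same dynamic partition, summing per-task counts counts that partition twice. Unioning the values counts it once.

```scala
import scala.collection.mutable

object PartitionCountSketch {
  // Hypothetical stand-in for BasicWriteTaskStats; String replaces InternalRow.
  final case class TaskStats(partitions: Seq[String])

  // Old behaviour: add up per-task counts, which over-counts shared partitions.
  def summedCount(stats: Seq[TaskStats]): Long =
    stats.map(_.partitions.size.toLong).sum

  // New behaviour: count distinct partition values across all tasks.
  // `++=` on a mutable Set compiles on both Scala 2.12 and 2.13.
  def distinctCount(stats: Seq[TaskStats]): Long = {
    val partitionsSet = mutable.HashSet.empty[String]
    stats.foreach(s => partitionsSet ++= s.partitions)
    partitionsSet.size.toLong
  }
}
```

In this sketch, tasks writing `{p=1, p=2}` and `{p=2, p=3}` give a summed count of 4 but a distinct count of 3.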
[GitHub] [spark] LuciferYang commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct
LuciferYang commented on a change in pull request #30026: URL: https://github.com/apache/spark/pull/30026#discussion_r506148242 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala ## @@ -139,20 +142,22 @@ class BasicWriteJobStatsTracker( override def processStats(stats: Seq[WriteTaskStats]): Unit = { val sparkContext = SparkContext.getActive.get -var numPartitions: Long = 0L +var partitionsSet: mutable.Set[InternalRow] = mutable.HashSet.empty var numFiles: Long = 0L var totalNumBytes: Long = 0L var totalNumOutput: Long = 0L val basicStats = stats.map(_.asInstanceOf[BasicWriteTaskStats]) basicStats.foreach { summary => - numPartitions += summary.numPartitions + partitionsSet ++= summary.partitions Review comment: ditto, `partitionsSet.addAll(summary.partitions)` can only be used in Scala 2.13 too. ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala ## @@ -76,7 +79,7 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration) override def newPartition(partitionValues: InternalRow): Unit = { -numPartitions += 1 +partitions = partitions :+ partitionValues Review comment: `partitions.appended(partitionValues)` can only be used in Scala 2.13
[GitHub] [spark] LuciferYang commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct
LuciferYang commented on a change in pull request #30026: URL: https://github.com/apache/spark/pull/30026#discussion_r506147287 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala ## @@ -76,7 +79,7 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration) override def newPartition(partitionValues: InternalRow): Unit = { -numPartitions += 1 +partitions = partitions :+ partitionValues Review comment: `partitions.appended(partitionValues)` needs Scala 2.13.
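One way to make the append explicit while still cross-building for Scala 2.12 and 2.13 is sketched below. The sketch assumes `partitions` may become a mutable buffer instead of an immutable `Seq` rebuilt with `:+`. `TaskPartitionsSketch` is a hypothetical stand-in for `BasicWriteTaskStatsTracker`, with `String` in place of `InternalRow`. `+=` on a `Buffer` exists in both versions. `Seq.appended` and `Growable.addAll` only appear in 2.13, which is the limitation raised in the comments above.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for BasicWriteTaskStatsTracker; String replaces InternalRow.
class TaskPartitionsSketch {
  // Mutable buffer: appending mutates in place instead of copying an
  // immutable Seq on every new partition.
  private val partitions = ArrayBuffer.empty[String]

  // `+=` compiles on both Scala 2.12 and 2.13.
  def newPartition(partitionValues: String): Unit = partitions += partitionValues

  // Immutable snapshot handed out to callers.
  def snapshot: Seq[String] = partitions.toList
}
```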
[GitHub] [spark] SparkQA commented on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)
SparkQA commented on pull request #30025: URL: https://github.com/apache/spark/pull/30025#issuecomment-709892119 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34485/
[GitHub] [spark] AmplabJenkins commented on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)
AmplabJenkins commented on pull request #30025: URL: https://github.com/apache/spark/pull/30025#issuecomment-709892138
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)
AmplabJenkins removed a comment on pull request #30025: URL: https://github.com/apache/spark/pull/30025#issuecomment-709892138
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
SparkQA commented on pull request #2: URL: https://github.com/apache/spark/pull/2#issuecomment-709890593 **[Test build #129884 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129884/testReport)** for PR 2 at commit [`f657ff0`](https://github.com/apache/spark/commit/f657ff0372f1cac48ea008a08c1cc7011f934d98).
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
beliefer commented on a change in pull request #2: URL: https://github.com/apache/spark/pull/2#discussion_r506142456 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -176,6 +177,125 @@ case class Like(left: Expression, right: Expression, escapeChar: Char) } } +abstract class LikeAllBase extends Expression with ImplicitCastInputTypes with NullIntolerant { + def value: Expression = children.head + def list: Seq[Expression] = children.tail + def isNot: Boolean + + override def inputTypes: Seq[AbstractDataType] = { +StringType +: Seq.fill(children.size - 1)(StringType) + } + + override def dataType: DataType = BooleanType + + override def foldable: Boolean = children.forall(_.foldable) + + override def nullable: Boolean = true + + def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches() + + override def eval(input: InternalRow): Any = { +val evaluatedValue = value.eval(input) +if (evaluatedValue == null) { + null +} else { + var hasNull = false + var match = true + list.foreach { e => +val str = e.eval(input) +if (str == null) { + hasNull = true +} else { + val regex = + Pattern.compile(StringUtils.escapeLikeRegex(str.asInstanceOf[UTF8String].toString, '\\')) + if ((isNot && matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) || +!(isNot || matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) { +match = false + } +} + } + if (hasNull) { +null + } else { +match + } +} + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val patternClass = classOf[Pattern].getName +val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") + ".escapeLikeRegex" +val javaDataType = CodeGenerator.javaType(value.dataType) +val valueGen = value.genCode(ctx) +val listGen = list.map(_.genCode(ctx)) +val pattern = ctx.freshName("pattern") +val rightStr = ctx.freshName("rightStr") +val escapedEscapeChar = StringEscapeUtils.escapeJava("\\") +val hasNull = ctx.freshName("hasNull") +val matched = ctx.freshName("matched") +val valueArg = ctx.freshName("valueArg") +val listCode = listGen.map(x => + s""" + |${x.code} + |if (${x.isNull}) { + | $hasNull = true; // ${ev.isNull} = true; + |} else if (!$hasNull && $matched) { + | String $rightStr = ${x.value}.toString(); + | $patternClass $pattern = + |$patternClass.compile($escapeFunc($rightStr, '$escapedEscapeChar')); Review comment: OK. I will cache the compiled pattern for foldable regex strings.
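The caching agreed on above can be sketched as follows: compile every pattern once when the pattern strings are known up front, then reuse the compiled patterns for each row. The sketch mirrors the `eval` shown in the diff for the non-negated (`isNot = false`) case. The result is null (here `None`) if the value or any pattern is null, and otherwise true only if the value matches every pattern. `likeToRegex` is a simplified, hypothetical LIKE-to-regex converter that handles `%`, `_` and `\` escapes only. It is not Spark's `StringUtils.escapeLikeRegex`.

```scala
import java.util.regex.Pattern

object LikeAllSketch {
  // Simplified, hypothetical LIKE -> Java regex conversion: `%` -> `.*`,
  // `_` -> `.`, `\x` -> literal x, and every other character is quoted.
  def likeToRegex(pattern: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < pattern.length) {
      pattern.charAt(i) match {
        case '\\' if i + 1 < pattern.length =>
          i += 1
          sb.append(Pattern.quote(pattern.charAt(i).toString))
        case '%' => sb.append(".*")
        case '_' => sb.append(".")
        case c   => sb.append(Pattern.quote(c.toString))
      }
      i += 1
    }
    sb.toString
  }

  // Patterns are compiled once at construction; a null pattern becomes None.
  final class LikeAll(patterns: Seq[String]) {
    private val compiled: Seq[Option[Pattern]] =
      patterns.map(p => Option(p).map(s => Pattern.compile(likeToRegex(s), Pattern.DOTALL)))

    // None stands for SQL NULL.
    def eval(value: String): Option[Boolean] =
      if (value == null || compiled.exists(_.isEmpty)) None
      else Some(compiled.forall(_.get.matcher(value).matches()))
  }
}
```

In Spark the cache would only apply when the pattern children are foldable. Non-foldable patterns still have to be compiled per row, as the generated code in the hunk does.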
[GitHub] [spark] SparkQA commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns
SparkQA commented on pull request #30053: URL: https://github.com/apache/spark/pull/30053#issuecomment-709884964 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34486/
[GitHub] [spark] SparkQA commented on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)
SparkQA commented on pull request #30025: URL: https://github.com/apache/spark/pull/30025#issuecomment-709883799 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34485/
[GitHub] [spark] SparkQA commented on pull request #30045: [SPARK-32991][SQL] Use conf in shared state as the original configuraion for RESET
SparkQA commented on pull request #30045: URL: https://github.com/apache/spark/pull/30045#issuecomment-709878764 **[Test build #129883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129883/testReport)** for PR 30045 at commit [`6848b2f`](https://github.com/apache/spark/commit/6848b2fed2be7137f4133bb7ec1790b9aad1ba29).
[GitHub] [spark] yaooqinn commented on pull request #30045: [SPARK-32991][SQL] Use conf in shared state as the original configuraion for RESET
yaooqinn commented on pull request #30045: URL: https://github.com/apache/spark/pull/30045#issuecomment-709878145 cc @hvanhovell too
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job
AmplabJenkins removed a comment on pull request #30066: URL: https://github.com/apache/spark/pull/30066#issuecomment-709877835
[GitHub] [spark] AmplabJenkins commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job
AmplabJenkins commented on pull request #30066: URL: https://github.com/apache/spark/pull/30066#issuecomment-709877835
[GitHub] [spark] SparkQA commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job
SparkQA commented on pull request #30066: URL: https://github.com/apache/spark/pull/30066#issuecomment-709877814 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34484/
[GitHub] [spark] SparkQA commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns
SparkQA commented on pull request #30053: URL: https://github.com/apache/spark/pull/30053#issuecomment-709874931 **[Test build #129882 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129882/testReport)** for PR 30053 at commit [`96a0706`](https://github.com/apache/spark/commit/96a070601d813baf8749c274069777ca4fe89fd6).
[GitHub] [spark] cloud-fan commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct
cloud-fan commented on a change in pull request #30026: URL: https://github.com/apache/spark/pull/30026#discussion_r506112571 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala ## @@ -139,20 +142,22 @@ class BasicWriteJobStatsTracker( override def processStats(stats: Seq[WriteTaskStats]): Unit = { val sparkContext = SparkContext.getActive.get -var numPartitions: Long = 0L +var partitionsSet: mutable.Set[InternalRow] = mutable.HashSet.empty var numFiles: Long = 0L var totalNumBytes: Long = 0L var totalNumOutput: Long = 0L val basicStats = stats.map(_.asInstanceOf[BasicWriteTaskStats]) basicStats.foreach { summary => - numPartitions += summary.numPartitions + partitionsSet ++= summary.partitions numFiles += summary.numFiles totalNumBytes += summary.numBytes totalNumOutput += summary.numRows } +val numPartitions: Long = partitionsSet.size Review comment: nit: it's only used once, we can inline it.
[GitHub] [spark] SparkQA commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns
SparkQA commented on pull request #30053: URL: https://github.com/apache/spark/pull/30053#issuecomment-709870468 **[Test build #129881 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129881/testReport)** for PR 30053 at commit [`2634588`](https://github.com/apache/spark/commit/2634588874042dd20c3293e4c67a7ae0199fe5b9).
[GitHub] [spark] cloud-fan commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct
cloud-fan commented on a change in pull request #30026: URL: https://github.com/apache/spark/pull/30026#discussion_r506112213 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala ## @@ -139,20 +142,22 @@ class BasicWriteJobStatsTracker( override def processStats(stats: Seq[WriteTaskStats]): Unit = { val sparkContext = SparkContext.getActive.get -var numPartitions: Long = 0L +var partitionsSet: mutable.Set[InternalRow] = mutable.HashSet.empty var numFiles: Long = 0L var totalNumBytes: Long = 0L var totalNumOutput: Long = 0L val basicStats = stats.map(_.asInstanceOf[BasicWriteTaskStats]) basicStats.foreach { summary => - numPartitions += summary.numPartitions + partitionsSet ++= summary.partitions Review comment: ditto, `partitionsSet.addAll(summary.partitions)`
[GitHub] [spark] cloud-fan commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct
cloud-fan commented on a change in pull request #30026: URL: https://github.com/apache/spark/pull/30026#discussion_r506111562 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala ## @@ -76,7 +79,7 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration) override def newPartition(partitionValues: InternalRow): Unit = { -numPartitions += 1 +partitions = partitions :+ partitionValues Review comment: this looks like appending to an immutable collection. Can we be more explicit? e.g. `partitions.append(partitionValues)`
[GitHub] [spark] SparkQA commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job
SparkQA commented on pull request #30066: URL: https://github.com/apache/spark/pull/30066#issuecomment-709868917 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34484/
[GitHub] [spark] cloud-fan commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct
cloud-fan commented on a change in pull request #30026: URL: https://github.com/apache/spark/pull/30026#discussion_r506110093 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala ## @@ -30,12 +32,13 @@ import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics} import org.apache.spark.util.SerializableConfiguration + Review comment: unnecessary change.
[GitHub] [spark] AmplabJenkins commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job
AmplabJenkins commented on pull request #30066: URL: https://github.com/apache/spark/pull/30066#issuecomment-709867933
[GitHub] [spark] SparkQA commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job
SparkQA commented on pull request #30066: URL: https://github.com/apache/spark/pull/30066#issuecomment-709867892 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34483/
[GitHub] [spark] cloud-fan commented on pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct
cloud-fan commented on pull request #30026: URL: https://github.com/apache/spark/pull/30026#issuecomment-709868109 > return size is partition num * shuffle num always can be millions level I thought about it. If a table has 10k partitions, it's unlikely that each write task touches all the 10k partitions. So the total size is not that large.
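The size argument in this reply can be made concrete with a back-of-envelope calculation. This is a sketch only: the task count, partition count, and per-task figures below are hypothetical, not measurements from the PR. The worst case raised in the quoted concern is every task reporting every partition; if each task instead writes only a handful of partitions, the driver receives the sum of the per-task counts, which is far smaller.

```scala
// Back-of-envelope sketch; every number used with it is hypothetical.
object StatsSizeEstimate {
  // Worst case from the quoted concern: every task reports every partition.
  def worstCaseEntries(numTasks: Long, numPartitions: Long): Long =
    numTasks * numPartitions

  // What the driver actually collects: each task reports only the
  // partitions it wrote to, so the total is the sum of per-task counts.
  def actualEntries(partitionsTouchedPerTask: Seq[Long]): Long =
    partitionsTouchedPerTask.sum

  def main(args: Array[String]): Unit = {
    val numTasks = 1000L
    val numPartitions = 10000L
    // Hypothetical: each task writes to 10 of the 10k partitions.
    val perTask = Seq.fill(numTasks.toInt)(10L)
    println(s"worst case: ${worstCaseEntries(numTasks, numPartitions)}")
    println(s"estimate:   ${actualEntries(perTask)}")
  }
}
```

With these illustrative numbers the worst case is 10,000,000 entries, while 10 partitions per task gives 10,000.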
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job
AmplabJenkins removed a comment on pull request #30066: URL: https://github.com/apache/spark/pull/30066#issuecomment-709867933
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30059: [SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs
AmplabJenkins removed a comment on pull request #30059: URL: https://github.com/apache/spark/pull/30059#issuecomment-709865465 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129864/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job
AmplabJenkins removed a comment on pull request #30066: URL: https://github.com/apache/spark/pull/30066#issuecomment-709864234 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #30059: [SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs
SparkQA commented on pull request #30059: URL: https://github.com/apache/spark/pull/30059#issuecomment-709864009 **[Test build #129864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129864/testReport)** for PR 30059 at commit [`f39ac87`](https://github.com/apache/spark/commit/f39ac871fc38e8ec8c02b7f6661748e2c7d431e9). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
AmplabJenkins removed a comment on pull request #29999: URL: https://github.com/apache/spark/pull/29999#issuecomment-709864803 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129872/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job
AmplabJenkins removed a comment on pull request #30066: URL: https://github.com/apache/spark/pull/30066#issuecomment-709864246 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129879/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #30059: [SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs
SparkQA removed a comment on pull request #30059: URL: https://github.com/apache/spark/pull/30059#issuecomment-709664186 **[Test build #129864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129864/testReport)** for PR 30059 at commit [`f39ac87`](https://github.com/apache/spark/commit/f39ac871fc38e8ec8c02b7f6661748e2c7d431e9).
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
SparkQA commented on pull request #29999: URL: https://github.com/apache/spark/pull/29999#issuecomment-709863988 **[Test build #129872 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129872/testReport)** for PR 29999 at commit [`be5eb8a`](https://github.com/apache/spark/commit/be5eb8a1f092e15c941d39d517284aed67de72c9). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
SparkQA removed a comment on pull request #29999: URL: https://github.com/apache/spark/pull/29999#issuecomment-709702627 **[Test build #129872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129872/testReport)** for PR 29999 at commit [`be5eb8a`](https://github.com/apache/spark/commit/be5eb8a1f092e15c941d39d517284aed67de72c9).
[GitHub] [spark] linhongliu-db commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns
linhongliu-db commented on pull request #30053: URL: https://github.com/apache/spark/pull/30053#issuecomment-709865036 retest this please
[GitHub] [spark] SparkQA commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job
SparkQA commented on pull request #30066: URL: https://github.com/apache/spark/pull/30066#issuecomment-709864008 **[Test build #129879 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129879/testReport)** for PR 30066 at commit [`32ec11a`](https://github.com/apache/spark/commit/32ec11ac3866a88ee6628b22c4379e27ec9b212b). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30059: [SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs
AmplabJenkins removed a comment on pull request #30059: URL: https://github.com/apache/spark/pull/30059#issuecomment-709865453 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
AmplabJenkins removed a comment on pull request #29999: URL: https://github.com/apache/spark/pull/29999#issuecomment-709864778 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)
AmplabJenkins removed a comment on pull request #30025: URL: https://github.com/apache/spark/pull/30025#issuecomment-709864026 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #30059: [SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs
AmplabJenkins commented on pull request #30059: URL: https://github.com/apache/spark/pull/30059#issuecomment-709865453
[GitHub] [spark] SparkQA removed a comment on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)
SparkQA removed a comment on pull request #30025: URL: https://github.com/apache/spark/pull/30025#issuecomment-709859600 **[Test build #129880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129880/testReport)** for PR 30025 at commit [`dfd6d4b`](https://github.com/apache/spark/commit/dfd6d4b5ee2bfad370ec57e264b1c18de038e8ae).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)
AmplabJenkins removed a comment on pull request #30025: URL: https://github.com/apache/spark/pull/30025#issuecomment-709864035 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129880/ Test FAILed.
[GitHub] [spark] cloud-fan commented on a change in pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)
cloud-fan commented on a change in pull request #30025: URL: https://github.com/apache/spark/pull/30025#discussion_r506106800 ## File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala ## @@ -48,4 +48,26 @@ private case object MySQLDialect extends JdbcDialect { } override def isCascadingTruncateTable(): Option[Boolean] = Some(false) + + // See https://dev.mysql.com/doc/refman/8.0/en/alter-table.html + override def getUpdateColumnTypeQuery( + tableName: String, + columnName: String, + newDataType: String): String = { +s"ALTER TABLE $tableName MODIFY COLUMN ${quoteIdentifier(columnName)} $newDataType" + } + + // See https://dev.mysql.com/doc/refman/8.0/en/alter-table.html + // require to have column data type to change the column nullability + // ALTER TABLE tbl_name MODIFY [COLUMN] col_name column_definition + // column_definition: + //data_type [NOT NULL | NULL] + // e.g. ALTER TABLE t1 MODIFY b INT NOT NULL; Review comment: Spark knows the table schema, so the data type info is available. We need to pass the column type info here, though.
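The point about passing the column type through to the dialect can be illustrated with a small query builder. This is a hypothetical helper, not Spark's `JdbcDialect` API: the object name, the method name, and the `dataType`/`isNullable` parameters are assumptions for illustration. It follows the `MODIFY [COLUMN] col_name data_type [NOT NULL | NULL]` form quoted in the diff and uses MySQL's backtick identifier quoting.

```scala
// Hypothetical helper, not part of Spark's JdbcDialect API.
object MySQLNullabilitySketch {
  // MySQL quotes identifiers with backticks; an embedded backtick is doubled.
  def quoteIdentifier(colName: String): String =
    "`" + colName.replace("`", "``") + "`"

  // MODIFY replaces the whole column definition, so the current data type
  // has to be supplied even when only the nullability changes.
  def updateColumnNullabilityQuery(
      tableName: String,
      columnName: String,
      dataType: String,
      isNullable: Boolean): String = {
    val nullability = if (isNullable) "NULL" else "NOT NULL"
    s"ALTER TABLE $tableName MODIFY COLUMN ${quoteIdentifier(columnName)} $dataType $nullability"
  }

  def main(args: Array[String]): Unit = {
    // Mirrors the example in the diff comment: ALTER TABLE t1 MODIFY b INT NOT NULL;
    println(updateColumnNullabilityQuery("t1", "b", "INT", isNullable = false))
  }
}
```

In this sketch the caller, which knows the table schema, is responsible for supplying `dataType`; that is the information the review says needs to be passed down.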
[GitHub] [spark] AmplabJenkins commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job
AmplabJenkins commented on pull request #30066: URL: https://github.com/apache/spark/pull/30066#issuecomment-709864234
[GitHub] [spark] SparkQA removed a comment on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job
SparkQA removed a comment on pull request #30066: URL: https://github.com/apache/spark/pull/30066#issuecomment-709826033 **[Test build #129879 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129879/testReport)** for PR 30066 at commit [`32ec11a`](https://github.com/apache/spark/commit/32ec11ac3866a88ee6628b22c4379e27ec9b212b).
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
AmplabJenkins commented on pull request #29999: URL: https://github.com/apache/spark/pull/29999#issuecomment-709864778
[GitHub] [spark] SparkQA commented on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)
SparkQA commented on pull request #30025: URL: https://github.com/apache/spark/pull/30025#issuecomment-709863989 **[Test build #129880 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129880/testReport)** for PR 30025 at commit [`dfd6d4b`](https://github.com/apache/spark/commit/dfd6d4b5ee2bfad370ec57e264b1c18de038e8ae). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)
AmplabJenkins commented on pull request #30025: URL: https://github.com/apache/spark/pull/30025#issuecomment-709864026
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns
AmplabJenkins removed a comment on pull request #30053: URL: https://github.com/apache/spark/pull/30053#issuecomment-709861968 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129876/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns
AmplabJenkins removed a comment on pull request #30053: URL: https://github.com/apache/spark/pull/30053#issuecomment-709861955 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns
AmplabJenkins commented on pull request #30053: URL: https://github.com/apache/spark/pull/30053#issuecomment-709861955
[GitHub] [spark] SparkQA removed a comment on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns
SparkQA removed a comment on pull request #30053: URL: https://github.com/apache/spark/pull/30053#issuecomment-709781462 **[Test build #129876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129876/testReport)** for PR 30053 at commit [`2634588`](https://github.com/apache/spark/commit/2634588874042dd20c3293e4c67a7ae0199fe5b9).
[GitHub] [spark] SparkQA commented on pull request #30053: [SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns
SparkQA commented on pull request #30053: URL: https://github.com/apache/spark/pull/30053#issuecomment-709861628 **[Test build #129876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129876/testReport)** for PR 30053 at commit [`2634588`](https://github.com/apache/spark/commit/2634588874042dd20c3293e4c67a7ae0199fe5b9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30025: [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)
SparkQA commented on pull request #30025: URL: https://github.com/apache/spark/pull/30025#issuecomment-709859600 **[Test build #129880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129880/testReport)** for PR 30025 at commit [`dfd6d4b`](https://github.com/apache/spark/commit/dfd6d4b5ee2bfad370ec57e264b1c18de038e8ae).
[GitHub] [spark] SparkQA removed a comment on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
SparkQA removed a comment on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-709775887 **[Test build #129875 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129875/testReport)** for PR 30057 at commit [`199aa8f`](https://github.com/apache/spark/commit/199aa8f01673ba0b990567516771106dd15ff143).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
AmplabJenkins removed a comment on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-709848373 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129875/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #30066: [SPARK-XXX][INFRA] Use pre-built image at GitHub Action SparkR job
SparkQA commented on pull request #30066: URL: https://github.com/apache/spark/pull/30066#issuecomment-709848582 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34483/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
AmplabJenkins removed a comment on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-709848361 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
AmplabJenkins commented on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-709848361
[GitHub] [spark] SparkQA commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
SparkQA commented on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-709848218 **[Test build #129875 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129875/testReport)** for PR 30057 at commit [`199aa8f`](https://github.com/apache/spark/commit/199aa8f01673ba0b990567516771106dd15ff143). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30065: [SPARK-33165][SQL][TESTS][FOLLOW-UP] Use scala.Predef.assert instead
AmplabJenkins removed a comment on pull request #30065: URL: https://github.com/apache/spark/pull/30065#issuecomment-709847660
[GitHub] [spark] AmplabJenkins commented on pull request #30065: [SPARK-33165][SQL][TESTS][FOLLOW-UP] Use scala.Predef.assert instead
AmplabJenkins commented on pull request #30065: URL: https://github.com/apache/spark/pull/30065#issuecomment-709847660
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration
AmplabJenkins removed a comment on pull request #30046: URL: https://github.com/apache/spark/pull/30046#issuecomment-709847313
[GitHub] [spark] SparkQA removed a comment on pull request #30065: [SPARK-33165][SQL][TESTS][FOLLOW-UP] Use scala.Predef.assert instead
SparkQA removed a comment on pull request #30065: URL: https://github.com/apache/spark/pull/30065#issuecomment-709699989 **[Test build #129870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129870/testReport)** for PR 30065 at commit [`6971fdf`](https://github.com/apache/spark/commit/6971fdfd77553e01b69cd8cf866508a8ec923941).
[GitHub] [spark] AmplabJenkins commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration
AmplabJenkins commented on pull request #30046: URL: https://github.com/apache/spark/pull/30046#issuecomment-709847313
[GitHub] [spark] SparkQA commented on pull request #30065: [SPARK-33165][SQL][TESTS][FOLLOW-UP] Use scala.Predef.assert instead
SparkQA commented on pull request #30065: URL: https://github.com/apache/spark/pull/30065#issuecomment-709846092 **[Test build #129870 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129870/testReport)** for PR 30065 at commit [`6971fdf`](https://github.com/apache/spark/commit/6971fdfd77553e01b69cd8cf866508a8ec923941). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration
SparkQA removed a comment on pull request #30046: URL: https://github.com/apache/spark/pull/30046#issuecomment-709700011 **[Test build #129871 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129871/testReport)** for PR 30046 at commit [`b50eea8`](https://github.com/apache/spark/commit/b50eea895a084c04784399faaf74f2b822405e84).
[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration
SparkQA commented on pull request #30046: URL: https://github.com/apache/spark/pull/30046#issuecomment-709845677 **[Test build #129871 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129871/testReport)** for PR 30046 at commit [`b50eea8`](https://github.com/apache/spark/commit/b50eea895a084c04784399faaf74f2b822405e84). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog
AmplabJenkins removed a comment on pull request #28938: URL: https://github.com/apache/spark/pull/28938#issuecomment-709844264
[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog
AmplabJenkins commented on pull request #28938: URL: https://github.com/apache/spark/pull/28938#issuecomment-709844264
[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog
SparkQA commented on pull request #28938: URL: https://github.com/apache/spark/pull/28938#issuecomment-709844239 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34482/
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30042: [SPARK-33139][SQL] protect setActionSession and clearActiveSession
HyukjinKwon commented on a change in pull request #30042: URL: https://github.com/apache/spark/pull/30042#discussion_r506087792 ## File path: python/pyspark/sql/session.py ## @@ -230,7 +230,10 @@ def __init__(self, sparkContext, jsparkSession=None): SparkSession._instantiatedSession = self SparkSession._activeSession = self self._jvm.SparkSession.setDefaultSession(self._jsparkSession) -self._jvm.SparkSession.setActiveSession(self._jsparkSession) + self._jvm.java.lang.Class.forName("org.apache.spark.sql.SparkSession$")\ +.getDeclaredField("MODULE$")\ +.get(None)\ +.setActiveSessionInternal(self._jsparkSession) Review comment: Thanks, please go ahead with a follow-up.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog
AmplabJenkins removed a comment on pull request #28938: URL: https://github.com/apache/spark/pull/28938#issuecomment-709842833 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129877/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog
SparkQA removed a comment on pull request #28938: URL: https://github.com/apache/spark/pull/28938#issuecomment-709798588 **[Test build #129877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129877/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog
AmplabJenkins removed a comment on pull request #28938: URL: https://github.com/apache/spark/pull/28938#issuecomment-709842821 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog
AmplabJenkins commented on pull request #28938: URL: https://github.com/apache/spark/pull/28938#issuecomment-709842821
[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog
SparkQA commented on pull request #28938: URL: https://github.com/apache/spark/pull/28938#issuecomment-709842475 **[Test build #129877 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129877/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] viirya commented on a change in pull request #26312: [SPARK-29649][SQL] Stop task set if FileAlreadyExistsException was thrown when writing to output file
viirya commented on a change in pull request #26312: URL: https://github.com/apache/spark/pull/26312#discussion_r506086784 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ## @@ -281,6 +281,10 @@ object FileFormatWriter extends Logging { } catch { case e: FetchFailedException => throw e + case f: FileAlreadyExistsException => Review comment: I see. Thanks for the details. We have different standpoints. For your cases, the first option looks like the better choice. The customers we had use HDFS, where `FileAlreadyExistsException` isn't recoverable, so the pain point is the extra time spent on a job that is going to fail anyway. I believe that even if SPARK-27194 is resolved, failing fast on a job that fails because of `FileAlreadyExistsException` (or other errors we know in advance are unrecoverable) is still useful. It seems to me there are two options: one is to revert this completely; the other is to add a config for the fast-fail behavior and set it to false by default. I prefer the second one for the reason above: users who want it can avoid wasting time on a failed job. WDYT?
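The second option above (keep fast-fail, but gate it behind a config that is off by default) can be modeled as a retry loop. This is a hypothetical Python sketch, not Spark's scheduler code: `run_with_retries`, `fast_fail_enabled`, `FileAlreadyExistsError` and `TaskSetAborted` are illustrative names, and `max_failures` only stands in for Spark's `spark.task.maxFailures` (default 4).

```python
# Hypothetical model of "fast-fail behind a config flag"; not a Spark API.


class FileAlreadyExistsError(Exception):
    """Stand-in for Hadoop's FileAlreadyExistsException (illustrative)."""


class TaskSetAborted(Exception):
    """Raised when the job gives up; records how many attempts were made."""

    def __init__(self, attempts, cause):
        super().__init__(f"aborted after {attempts} attempt(s): {cause!r}")
        self.attempts = attempts


# Errors that retrying cannot fix, e.g. an output file that already exists on HDFS.
UNRECOVERABLE = (FileAlreadyExistsError,)


def run_with_retries(task, max_failures, fast_fail_enabled=False):
    """Run `task` up to `max_failures` times; return (result, attempts).

    With `fast_fail_enabled` (hypothetical flag, off by default), an
    unrecoverable error aborts immediately instead of using up the retries.
    """
    attempts = 0
    last_error = None
    while attempts < max_failures:
        attempts += 1
        try:
            return task(), attempts
        except UNRECOVERABLE as e:
            if fast_fail_enabled:
                raise TaskSetAborted(attempts, e) from e
            last_error = e  # flag off: behave like any other failure and retry
        except Exception as e:
            last_error = e  # recoverable (or unknown) error: retry
    raise TaskSetAborted(attempts, last_error) from last_error
```

With the flag off, a task that always raises `FileAlreadyExistsError` is attempted `max_failures` times before the job fails; with it on, the job fails after the first attempt, which is the time saving described in the comment. Recoverable errors are retried either way.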
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30042: [SPARK-33139][SQL] protect setActionSession and clearActiveSession
HyukjinKwon commented on a change in pull request #30042: URL: https://github.com/apache/spark/pull/30042#discussion_r506086362 ## File path: python/pyspark/sql/session.py ## @@ -230,7 +230,10 @@ def __init__(self, sparkContext, jsparkSession=None): SparkSession._instantiatedSession = self SparkSession._activeSession = self self._jvm.SparkSession.setDefaultSession(self._jsparkSession) -self._jvm.SparkSession.setActiveSession(self._jsparkSession) + self._jvm.java.lang.Class.forName("org.apache.spark.sql.SparkSession$")\ Review comment: `Class.forName` should not be used directly. It is banned by the Scala style rules: https://github.com/apache/spark/blob/e93b8f02cd706bedc47c9b55a73f632fe9e61ec3/scalastyle-config.xml#L197-L206
[GitHub] [spark] leanken commented on a change in pull request #30042: [SPARK-33139][SQL] protect setActionSession and clearActiveSession
leanken commented on a change in pull request #30042: URL: https://github.com/apache/spark/pull/30042#discussion_r506086345 ## File path: python/pyspark/sql/session.py ## @@ -230,7 +230,10 @@ def __init__(self, sparkContext, jsparkSession=None): SparkSession._instantiatedSession = self SparkSession._activeSession = self self._jvm.SparkSession.setDefaultSession(self._jsparkSession) -self._jvm.SparkSession.setActiveSession(self._jsparkSession) + self._jvm.java.lang.Class.forName("org.apache.spark.sql.SparkSession$")\ +.getDeclaredField("MODULE$")\ +.get(None)\ +.setActiveSessionInternal(self._jsparkSession) Review comment: OK, I will test it and update it in the next PR. Thanks, @HyukjinKwon.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30042: [SPARK-33139][SQL] protect setActionSession and clearActiveSession
HyukjinKwon commented on a change in pull request #30042: URL: https://github.com/apache/spark/pull/30042#discussion_r506085277 ## File path: python/pyspark/sql/session.py ## @@ -230,7 +230,10 @@ def __init__(self, sparkContext, jsparkSession=None): SparkSession._instantiatedSession = self SparkSession._activeSession = self self._jvm.SparkSession.setDefaultSession(self._jsparkSession) -self._jvm.SparkSession.setActiveSession(self._jsparkSession) + self._jvm.java.lang.Class.forName("org.apache.spark.sql.SparkSession$")\ +.getDeclaredField("MODULE$")\ +.get(None)\ +.setActiveSessionInternal(self._jsparkSession) Review comment: Hey, you don't need to use reflection manually here. The package-private accessor is already accessible from Java, as you used it there, so you can just mimic that here via `getattr(getattr(spark._jvm, "SparkSession$"), "MODULE$").setActiveSessionInternal(...)`.
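The `getattr` pattern suggested above works because a Scala `object Foo` compiles to a JVM class `Foo$` whose static `MODULE$` field holds the singleton instance, so two attribute lookups on the Py4J view reach it without `Class.forName`. A minimal sketch wrapping the pattern in a helper; `scala_object` is a hypothetical name, not a PySpark API, and the commented call assumes it runs inside `SparkSession.__init__`, where `self._jvm` and `self._jsparkSession` exist and `SparkSession` already resolves on the view (as the existing `self._jvm.SparkSession.setDefaultSession` call implies).

```python
def scala_object(jvm, class_name):
    """Return the singleton instance of a Scala `object` via a Py4J JVM view.

    `class_name` is the object's name as visible from `jvm`, e.g.
    "SparkSession". The compiled singleton class is `<class_name>$` and its
    static `MODULE$` field is the instance, so plain attribute lookup
    replaces explicit Class.forName reflection.
    """
    module_class = getattr(jvm, class_name + "$")
    return getattr(module_class, "MODULE$")


# Hypothetical usage inside pyspark.sql.SparkSession.__init__ (needs a live JVM):
# scala_object(self._jvm, "SparkSession").setActiveSessionInternal(self._jsparkSession)
```

`getattr` is required rather than dotted access because `SparkSession$` and `MODULE$` are not valid Python identifiers.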
[GitHub] [spark] moomindani commented on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC
moomindani commented on pull request #28953: URL: https://github.com/apache/spark/pull/28953#issuecomment-709838989 @gatorsmile Just a reminder: can you take a look?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates
AmplabJenkins removed a comment on pull request #30001: URL: https://github.com/apache/spark/pull/30001#issuecomment-70983
[GitHub] [spark] AmplabJenkins commented on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates
AmplabJenkins commented on pull request #30001: URL: https://github.com/apache/spark/pull/30001#issuecomment-70983
[GitHub] [spark] SparkQA removed a comment on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates
SparkQA removed a comment on pull request #30001: URL: https://github.com/apache/spark/pull/30001#issuecomment-709656628 **[Test build #129862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129862/testReport)** for PR 30001 at commit [`0ebceb0`](https://github.com/apache/spark/commit/0ebceb01d1bbd30345f4d0a3662f34a51bc965d7).
[GitHub] [spark] SparkQA commented on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates
SparkQA commented on pull request #30001: URL: https://github.com/apache/spark/pull/30001#issuecomment-709834570 **[Test build #129862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129862/testReport)** for PR 30001 at commit [`0ebceb0`](https://github.com/apache/spark/commit/0ebceb01d1bbd30345f4d0a3662f34a51bc965d7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog
SparkQA commented on pull request #28938: URL: https://github.com/apache/spark/pull/28938#issuecomment-709831123 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34482/