[GitHub] [spark] xianyinxin commented on a change in pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
xianyinxin commented on a change in pull request #28875:
URL: https://github.com/apache/spark/pull/28875#discussion_r444002874

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -468,13 +458,25 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
       throw new ParseException("There must be at least one WHEN clause in a MERGE statement", ctx)
     }
     // children being empty means that the condition is not set
-    if (matchedActions.length == 2 && matchedActions.head.children.isEmpty) {
-      throw new ParseException("When there are 2 MATCHED clauses in a MERGE statement, " +
-        "the first MATCHED clause must have a condition", ctx)
-    }
-    if (matchedActions.groupBy(_.getClass).mapValues(_.size).exists(_._2 > 1)) {
+    val matchedActionSize = matchedActions.length
+    if (matchedActionSize >= 2 && !matchedActions.init.forall(_.condition.nonEmpty)) {

Review comment:
I don't think so, because the `children` of `InsertAction` and `UpdateAction` include both `condition` and `assignments`. There may be cases where `condition` is omitted but `assignments` are present, so `children` is nonEmpty even though no condition is set.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
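A minimal Scala sketch of the point above. It uses hypothetical, simplified stand-ins (`SketchDeleteAction`, `SketchUpdateAction`) rather than Catalyst's real `DeleteAction`/`UpdateAction`, and assumes `children` is the optional condition followed by the assignments. Under that assumption an unconditional UPDATE still has non-empty `children`, so only `condition` reliably tells whether a condition was written.

```scala
// Hypothetical, simplified stand-ins for Catalyst's merge actions. The real classes
// extend MergeAction and hold Expression/Assignment nodes; Strings are used here.
sealed trait SketchMergeAction {
  def condition: Option[String]
  // Modeled on a tree node's children: every child expression of the action.
  def children: Seq[String]
}

final case class SketchDeleteAction(condition: Option[String]) extends SketchMergeAction {
  // A DELETE has only its (optional) condition as a child.
  def children: Seq[String] = condition.toSeq
}

final case class SketchUpdateAction(condition: Option[String], assignments: Seq[String])
    extends SketchMergeAction {
  // Assignments are children too, so this can be non-empty with no condition at all.
  def children: Seq[String] = condition.toSeq ++ assignments
}

object ConditionCheckSketch {
  // Unreliable: equates "has any children" with "has a condition".
  def hasConditionViaChildren(action: SketchMergeAction): Boolean = action.children.nonEmpty

  // Reliable: looks at the condition field itself.
  def hasCondition(action: SketchMergeAction): Boolean = action.condition.nonEmpty
}
```

For example, `SketchUpdateAction(None, Seq("target.col2 = source.col2"))` passes the `children` check but fails the `condition` check, which is the case a `head.children.isEmpty` test would miss.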
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
AmplabJenkins removed a comment on pull request #28875:
URL: https://github.com/apache/spark/pull/28875#issuecomment-647948835
[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default
dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647949279

@gatorsmile, why does that block this? Technically, this supersedes it, doesn't it?

> We should avoid making this change until we can resolve https://issues.apache.org/jira/browse/SPARK-32017

Switching the default is the real change. For example, we released Scala 2.12 in the Spark 2.4.x line for a while, but we didn't notice the Scala function issue until the 3.0.0 release. Also, we can switch back to `Hadoop 2.7` before December if we want.
[GitHub] [spark] AmplabJenkins commented on pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
AmplabJenkins commented on pull request #28875:
URL: https://github.com/apache/spark/pull/28875#issuecomment-647948835
[GitHub] [spark] SparkQA commented on pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
SparkQA commented on pull request #28875:
URL: https://github.com/apache/spark/pull/28875#issuecomment-647948382

**[Test build #124396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124396/testReport)** for PR 28875 at commit [`ab97e31`](https://github.com/apache/spark/commit/ab97e31041091b4592f86349eaa81e379022b725).
[GitHub] [spark] cloud-fan commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
cloud-fan commented on a change in pull request #28895:
URL: https://github.com/apache/spark/pull/28895#discussion_r444000693

## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -800,35 +770,20 @@ private[spark] class MapOutputTrackerWorker(conf: SparkConf) extends MapOutputTr
   // Get blocks sizes by executor Id. Note that zero-sized blocks are excluded in the result.
   override def getMapSizesByExecutorId(
-      shuffleId: Int,
-      startPartition: Int,
-      endPartition: Int)
-    : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-    logDebug(s"Fetching outputs for shuffle $shuffleId, partitions $startPartition-$endPartition")
-    val statuses = getStatuses(shuffleId, conf)
-    try {
-      MapOutputTracker.convertMapStatuses(
-        shuffleId, startPartition, endPartition, statuses, 0, statuses.length)
-    } catch {
-      case e: MetadataFetchFailedException =>
-        // We experienced a fetch failure so our mapStatuses cache is outdated; clear it:
-        mapStatuses.clear()
-        throw e
-    }
-  }
-
-  override def getMapSizesByRange(
       shuffleId: Int,
       startMapIndex: Int,
       endMapIndex: Int,
       startPartition: Int,
-      endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-    logDebug(s"Fetching outputs for shuffle $shuffleId, mappers $startMapIndex-$endMapIndex" +
-      s"partitions $startPartition-$endPartition")
+      endPartition: Int)
+    : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {

Review comment:
unnecessary change

## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -800,35 +770,20 @@ private[spark] class MapOutputTrackerWorker(conf: SparkConf) extends MapOutputTr
   // Get blocks sizes by executor Id. Note that zero-sized blocks are excluded in the result.
   override def getMapSizesByExecutorId(
-      shuffleId: Int,
-      startPartition: Int,
-      endPartition: Int)
-    : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-    logDebug(s"Fetching outputs for shuffle $shuffleId, partitions $startPartition-$endPartition")
-    val statuses = getStatuses(shuffleId, conf)
-    try {
-      MapOutputTracker.convertMapStatuses(
-        shuffleId, startPartition, endPartition, statuses, 0, statuses.length)
-    } catch {
-      case e: MetadataFetchFailedException =>
-        // We experienced a fetch failure so our mapStatuses cache is outdated; clear it:
-        mapStatuses.clear()
-        throw e
-    }
-  }
-
-  override def getMapSizesByRange(
       shuffleId: Int,
       startMapIndex: Int,
       endMapIndex: Int,
       startPartition: Int,
-      endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-    logDebug(s"Fetching outputs for shuffle $shuffleId, mappers $startMapIndex-$endMapIndex" +
-      s"partitions $startPartition-$endPartition")
+      endPartition: Int)
+    : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
+    logDebug(s"Fetching outputs for shuffle $shuffleId")
     val statuses = getStatuses(shuffleId, conf)
     try {
+      val endMapIndex0 = if (endMapIndex == Int.MaxValue) statuses.length else endMapIndex

Review comment:
ditto

## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -800,35 +770,20 @@ private[spark] class MapOutputTrackerWorker(conf: SparkConf) extends MapOutputTr
   // Get blocks sizes by executor Id. Note that zero-sized blocks are excluded in the result.
   override def getMapSizesByExecutorId(
-      shuffleId: Int,
-      startPartition: Int,
-      endPartition: Int)
-    : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-    logDebug(s"Fetching outputs for shuffle $shuffleId, partitions $startPartition-$endPartition")
-    val statuses = getStatuses(shuffleId, conf)
-    try {
-      MapOutputTracker.convertMapStatuses(
-        shuffleId, startPartition, endPartition, statuses, 0, statuses.length)
-    } catch {
-      case e: MetadataFetchFailedException =>
-        // We experienced a fetch failure so our mapStatuses cache is outdated; clear it:
-        mapStatuses.clear()
-        throw e
-    }
-  }
-
-  override def getMapSizesByRange(
       shuffleId: Int,
       startMapIndex: Int,
       endMapIndex: Int,
       startPartition: Int,
-      endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-    logDebug(s"Fetching outputs for shuffle $shuffleId, mappers $startMapIndex-$endMapIndex" +
-      s"partitions $startPartition-$endPartition")
+      endPartition: Int)
+    : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
+    logDebug(s"Fetching outputs for shuffle $shuffleId")
     val statuses = getStatuses(shuffleId, conf)
     try {
+      val endMapIndex0 = if (endMapIndex == Int.MaxValue) statuses.length else endMapIndex
+      logDebug(s"Convert map statuses for sh
[GitHub] [spark] xianyinxin commented on a change in pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
xianyinxin commented on a change in pull request #28875:
URL: https://github.com/apache/spark/pull/28875#discussion_r444000449

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -468,13 +458,25 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
       throw new ParseException("There must be at least one WHEN clause in a MERGE statement", ctx)
     }
     // children being empty means that the condition is not set
-    if (matchedActions.length == 2 && matchedActions.head.children.isEmpty) {
-      throw new ParseException("When there are 2 MATCHED clauses in a MERGE statement, " +
-        "the first MATCHED clause must have a condition", ctx)
-    }
-    if (matchedActions.groupBy(_.getClass).mapValues(_.size).exists(_._2 > 1)) {
+    val matchedActionSize = matchedActions.length
+    if (matchedActionSize >= 2 && !matchedActions.init.forall(_.condition.nonEmpty)) {
+      throw new ParseException(
+        s"When there are $matchedActionSize MATCHED clauses in a MERGE statement, " +

Review comment:
done.

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
##
@@ -1134,58 +1134,70 @@ class DDLParserSuite extends AnalysisTest {
     }
   }
 
-  test("merge into table: at most two matched clauses") {
-    val exc = intercept[ParseException] {
-      parsePlan(
-        """
-          |MERGE INTO testcat1.ns1.ns2.tbl AS target
-          |USING testcat2.ns1.ns2.tbl AS source
-          |ON target.col1 = source.col1
-          |WHEN MATCHED AND (target.col2='delete') THEN DELETE
-          |WHEN MATCHED AND (target.col2='update1') THEN UPDATE SET target.col2 = source.col2
-          |WHEN MATCHED AND (target.col2='update2') THEN UPDATE SET target.col2 = source.col2
-          |WHEN NOT MATCHED AND (target.col2='insert')
-          |THEN INSERT (target.col1, target.col2) values (source.col1, source.col2)
-        """.stripMargin)
-    }
-
-    assert(exc.getMessage.contains("There should be at most 2 'WHEN MATCHED' clauses."))
+  test("merge into table: multi matched and not matched clauses") {
+    parseCompare(
+      """
+        |MERGE INTO testcat1.ns1.ns2.tbl AS target
+        |USING testcat2.ns1.ns2.tbl AS source
+        |ON target.col1 = source.col1
+        |WHEN MATCHED AND (target.col2='delete') THEN DELETE
+        |WHEN MATCHED AND (target.col2='update to 1') THEN UPDATE SET target.col2 = 1
+        |WHEN MATCHED AND (target.col2='update to 2') THEN UPDATE SET target.col2 = 2
+        |WHEN NOT MATCHED AND (target.col2='insert 1')
+        |THEN INSERT (target.col1, target.col2) values (source.col1, 1)
+        |WHEN NOT MATCHED AND (target.col2='insert 2')
+        |THEN INSERT (target.col1, target.col2) values (source.col1, 2)
+      """.stripMargin,
+      MergeIntoTable(
+        SubqueryAlias("target", UnresolvedRelation(Seq("testcat1", "ns1", "ns2", "tbl"))),
+        SubqueryAlias("source", UnresolvedRelation(Seq("testcat2", "ns1", "ns2", "tbl"))),
+        EqualTo(UnresolvedAttribute("target.col1"), UnresolvedAttribute("source.col1")),
+        Seq(DeleteAction(Some(EqualTo(UnresolvedAttribute("target.col2"), Literal("delete")))),
+          UpdateAction(Some(EqualTo(UnresolvedAttribute("target.col2"), Literal("update to 1"))),
+            Seq(Assignment(UnresolvedAttribute("target.col2"), Literal(1)))),
+          UpdateAction(Some(EqualTo(UnresolvedAttribute("target.col2"), Literal("update to 2"))),
+            Seq(Assignment(UnresolvedAttribute("target.col2"), Literal(2))))),
+        Seq(InsertAction(Some(EqualTo(UnresolvedAttribute("target.col2"), Literal("insert 1"))),
+          Seq(Assignment(UnresolvedAttribute("target.col1"), UnresolvedAttribute("source.col1")),
+            Assignment(UnresolvedAttribute("target.col2"), Literal(1)))),
+          InsertAction(Some(EqualTo(UnresolvedAttribute("target.col2"), Literal("insert 2"))),
+            Seq(Assignment(UnresolvedAttribute("target.col1"), UnresolvedAttribute("source.col1")),
+              Assignment(UnresolvedAttribute("target.col2"), Literal(2)))))))
   }
 
-  test("merge into table: at most one not matched clause") {
+  test("merge into table: the first matched clause must have a condition if there's a second") {

Review comment:
done

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -468,13 +458,25 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
       throw new ParseException("There must be at least one WHEN clause in a MERGE statement", ctx)
     }
     // children being empty means that the condition is not set
-    if (matchedActions.length == 2 && matchedActions.head.children.isEmpty) {
-      throw new ParseException("When there are 2 MATCHED clauses in a MERGE statement, " +
-        "the first MATCHE
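The generalized rule in this change (from `length == 2 && head.children.isEmpty` to `init.forall(_.condition.nonEmpty)`) can be sketched as below. This is an illustration only: conditions are modeled as `Option[String]`, `checkConditions` is a hypothetical helper, and the second half of the error text is made up, since the quoted hunk ends before the real message does. The `kind` parameter only lets the same check be reused; whether NOT MATCHED clauses get the same rule is not shown in this excerpt.

```scala
object MergeClauseRuleSketch {
  // `conditions` holds the optional condition of each WHEN [NOT] MATCHED clause, in order.
  // Every clause except the last must have a condition; returns an error message otherwise.
  def checkConditions(kind: String, conditions: Seq[Option[String]]): Option[String] = {
    val n = conditions.length
    if (n >= 2 && !conditions.init.forall(_.nonEmpty)) {
      // Hypothetical wording; only the first half mirrors the message in the diff.
      Some(s"When there are $n $kind clauses in a MERGE statement, " +
        s"only the last $kind clause can omit the condition")
    } else {
      None
    }
  }
}
```

With exactly two clauses, `init` is just the head, so this reduces to the old two-clause rule; with N clauses, only the last one may omit its condition.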
[GitHub] [spark] cloud-fan commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
cloud-fan commented on a change in pull request #28895:
URL: https://github.com/apache/spark/pull/28895#discussion_r444000561

## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -737,35 +721,21 @@ private[spark] class MapOutputTrackerMaster(
   // Get blocks sizes by executor Id. Note that zero-sized blocks are excluded in the result.
   // This method is only called in local-mode.
   def getMapSizesByExecutorId(
-      shuffleId: Int,
-      startPartition: Int,
-      endPartition: Int)
-      : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-    logDebug(s"Fetching outputs for shuffle $shuffleId, partitions $startPartition-$endPartition")
-    shuffleStatuses.get(shuffleId) match {
-      case Some (shuffleStatus) =>
-        shuffleStatus.withMapStatuses { statuses =>
-          MapOutputTracker.convertMapStatuses(
-            shuffleId, startPartition, endPartition, statuses, 0, shuffleStatus.mapStatuses.length)
-        }
-      case None =>
-        Iterator.empty
-    }
-  }
-
-  override def getMapSizesByRange(
       shuffleId: Int,
       startMapIndex: Int,
       endMapIndex: Int,
       startPartition: Int,
-      endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-    logDebug(s"Fetching outputs for shuffle $shuffleId, mappers $startMapIndex-$endMapIndex" +
-      s"partitions $startPartition-$endPartition")
+      endPartition: Int)
+    : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
+    logDebug(s"Fetching outputs for shuffle $shuffleId")
     shuffleStatuses.get(shuffleId) match {
-      case Some(shuffleStatus) =>
+      case Some (shuffleStatus) =>
         shuffleStatus.withMapStatuses { statuses =>
+          val endMapIndex0 = if (endMapIndex == Int.MaxValue) statuses.length else endMapIndex
+          logDebug(s"Convert map statuses for shuffle $shuffleId, " +
+            s"partitions $startPartition-$endPartition, mappers $startMapIndex-$endMapIndex0")

Review comment:
let's follow the original log and put `mappers` before `partitions`.
[GitHub] [spark] cloud-fan commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
cloud-fan commented on a change in pull request #28895:
URL: https://github.com/apache/spark/pull/28895#discussion_r444000320

## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -737,35 +721,21 @@ private[spark] class MapOutputTrackerMaster(
   // Get blocks sizes by executor Id. Note that zero-sized blocks are excluded in the result.
   // This method is only called in local-mode.
   def getMapSizesByExecutorId(
-      shuffleId: Int,
-      startPartition: Int,
-      endPartition: Int)
-      : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-    logDebug(s"Fetching outputs for shuffle $shuffleId, partitions $startPartition-$endPartition")
-    shuffleStatuses.get(shuffleId) match {
-      case Some (shuffleStatus) =>
-        shuffleStatus.withMapStatuses { statuses =>
-          MapOutputTracker.convertMapStatuses(
-            shuffleId, startPartition, endPartition, statuses, 0, shuffleStatus.mapStatuses.length)
-        }
-      case None =>
-        Iterator.empty
-    }
-  }
-
-  override def getMapSizesByRange(
       shuffleId: Int,
       startMapIndex: Int,
       endMapIndex: Int,
       startPartition: Int,
-      endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-    logDebug(s"Fetching outputs for shuffle $shuffleId, mappers $startMapIndex-$endMapIndex" +
-      s"partitions $startPartition-$endPartition")
+      endPartition: Int)
+    : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
+    logDebug(s"Fetching outputs for shuffle $shuffleId")
     shuffleStatuses.get(shuffleId) match {
-      case Some(shuffleStatus) =>
+      case Some (shuffleStatus) =>
         shuffleStatus.withMapStatuses { statuses =>
+          val endMapIndex0 = if (endMapIndex == Int.MaxValue) statuses.length else endMapIndex

Review comment:
`actualEndMapIndex`?
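To make the two review points above concrete (the `actualEndMapIndex` name, and `mappers` before `partitions` in the log line), here is a small sketch. `MapRangeSketch`, a standalone `actualEndMapIndex` function and `debugMessage` are hypothetical; the sketch assumes, as in the diff, that `Int.MaxValue` passed as `endMapIndex` means "through the last map status".

```scala
object MapRangeSketch {
  // Assumption (as in the diff): Int.MaxValue as endMapIndex means "up to the last mapper".
  def actualEndMapIndex(endMapIndex: Int, numMapStatuses: Int): Int =
    if (endMapIndex == Int.MaxValue) numMapStatuses else endMapIndex

  // Log line with mappers before partitions, following the original getMapSizesByRange log.
  def debugMessage(
      shuffleId: Int,
      startMapIndex: Int,
      actualEnd: Int,
      startPartition: Int,
      endPartition: Int): String =
    s"Convert map statuses for shuffle $shuffleId, " +
      s"mappers $startMapIndex-$actualEnd, partitions $startPartition-$endPartition"
}
```

A descriptive name such as `actualEndMapIndex` makes it clear at the call site that the sentinel has already been resolved, which `endMapIndex0` does not.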
[GitHub] [spark] gatorsmile edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default
gatorsmile edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647946667

Yes. As you said, the default version is very important for PySpark users. I am afraid there are breaking changes in Hadoop 3.x releases. We should avoid making this change until we can resolve https://issues.apache.org/jira/browse/SPARK-32017
[GitHub] [spark] cloud-fan commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
cloud-fan commented on a change in pull request #28895:
URL: https://github.com/apache/spark/pull/28895#discussion_r443999839

## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -737,35 +721,21 @@ private[spark] class MapOutputTrackerMaster(
   // Get blocks sizes by executor Id. Note that zero-sized blocks are excluded in the result.
   // This method is only called in local-mode.
   def getMapSizesByExecutorId(
-      shuffleId: Int,
-      startPartition: Int,
-      endPartition: Int)
-      : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-    logDebug(s"Fetching outputs for shuffle $shuffleId, partitions $startPartition-$endPartition")
-    shuffleStatuses.get(shuffleId) match {
-      case Some (shuffleStatus) =>
-        shuffleStatus.withMapStatuses { statuses =>
-          MapOutputTracker.convertMapStatuses(
-            shuffleId, startPartition, endPartition, statuses, 0, shuffleStatus.mapStatuses.length)
-        }
-      case None =>
-        Iterator.empty
-    }
-  }
-
-  override def getMapSizesByRange(
       shuffleId: Int,
       startMapIndex: Int,
       endMapIndex: Int,
       startPartition: Int,
-      endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-    logDebug(s"Fetching outputs for shuffle $shuffleId, mappers $startMapIndex-$endMapIndex" +
-      s"partitions $startPartition-$endPartition")
+      endPartition: Int)
+    : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {

Review comment:
unnecessary change
[GitHub] [spark] cloud-fan commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
cloud-fan commented on a change in pull request #28895:
URL: https://github.com/apache/spark/pull/28895#discussion_r44333

## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -737,35 +721,21 @@ private[spark] class MapOutputTrackerMaster(
   // Get blocks sizes by executor Id. Note that zero-sized blocks are excluded in the result.
   // This method is only called in local-mode.
   def getMapSizesByExecutorId(
-      shuffleId: Int,
-      startPartition: Int,
-      endPartition: Int)
-      : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-    logDebug(s"Fetching outputs for shuffle $shuffleId, partitions $startPartition-$endPartition")
-    shuffleStatuses.get(shuffleId) match {
-      case Some (shuffleStatus) =>
-        shuffleStatus.withMapStatuses { statuses =>
-          MapOutputTracker.convertMapStatuses(
-            shuffleId, startPartition, endPartition, statuses, 0, shuffleStatus.mapStatuses.length)
-        }
-      case None =>
-        Iterator.empty
-    }
-  }
-
-  override def getMapSizesByRange(
       shuffleId: Int,
       startMapIndex: Int,
       endMapIndex: Int,
       startPartition: Int,
-      endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-    logDebug(s"Fetching outputs for shuffle $shuffleId, mappers $startMapIndex-$endMapIndex" +
-      s"partitions $startPartition-$endPartition")
+      endPartition: Int)
+    : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
+    logDebug(s"Fetching outputs for shuffle $shuffleId")
     shuffleStatuses.get(shuffleId) match {
-      case Some(shuffleStatus) =>
+      case Some (shuffleStatus) =>

Review comment:
unnecessary change.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table
AmplabJenkins removed a comment on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-647946391

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124394/
Test FAILed.
[GitHub] [spark] gatorsmile commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default
gatorsmile commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647946667

Yes. As you said, the default version is very important for PySpark users. We should avoid making this change until we can resolve https://issues.apache.org/jira/browse/SPARK-32017
[GitHub] [spark] SparkQA commented on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table
SparkQA commented on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-647946372

**[Test build #124394 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124394/testReport)** for PR 28901 at commit [`fa1a84a`](https://github.com/apache/spark/commit/fa1a84a8fd5964d91a8b43bcb75609af43553f61).
 * This patch **fails build dependency tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
cloud-fan commented on a change in pull request #28895:
URL: https://github.com/apache/spark/pull/28895#discussion_r443999460

## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -335,28 +335,12 @@ private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging
    * tuples describing the shuffle blocks that are stored at that block manager.
    */
   def getMapSizesByExecutorId(
-      shuffleId: Int,
-      startPartition: Int,
-      endPartition: Int)
-      : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])]
-
-  /**
-   * Called from executors to get the server URIs and output sizes for each shuffle block that
-   * needs to be read from a given range of map output partitions (startPartition is included but
-   * endPartition is excluded from the range) and is produced by
-   * a range of mappers (startMapIndex, endMapIndex, startMapIndex is included and
-   * the endMapIndex is excluded).
-   *
-   * @return A sequence of 2-item tuples, where the first item in the tuple is a BlockManagerId,
-   *         and the second item is a sequence of (shuffle block id, shuffle block size, map index)
-   *         tuples describing the shuffle blocks that are stored at that block manager.
-   */
-  def getMapSizesByRange(
       shuffleId: Int,
       startMapIndex: Int,
       endMapIndex: Int,
       startPartition: Int,
-      endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])]
+      endPartition: Int)
+      : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])]

Review comment:
unnecessary change
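The doc comment removed in this hunk describes half-open ranges for both partitions and mappers (start included, end excluded). A minimal sketch of that selection, with a hypothetical `SketchBlock` record standing in for Spark's `MapStatus`/`BlockId` types, and also dropping zero-sized blocks as the `getMapSizesByExecutorId` comments quoted in this thread describe. `BlockRangeSketch.select` is illustrative only, not Spark's `convertMapStatuses`.

```scala
// Hypothetical, simplified model of a shuffle block: which mapper wrote it,
// which reduce partition it belongs to, and its size in bytes.
final case class SketchBlock(mapIndex: Int, reduceId: Int, size: Long)

object BlockRangeSketch {
  // Keeps blocks with startMapIndex <= mapIndex < endMapIndex and
  // startPartition <= reduceId < endPartition, skipping zero-sized blocks.
  def select(
      blocks: Seq[SketchBlock],
      startMapIndex: Int,
      endMapIndex: Int,
      startPartition: Int,
      endPartition: Int): Seq[SketchBlock] =
    blocks.filter { b =>
      b.mapIndex >= startMapIndex && b.mapIndex < endMapIndex &&
      b.reduceId >= startPartition && b.reduceId < endPartition &&
      b.size > 0
    }
}
```

Passing `startMapIndex = 0` and `endMapIndex = Int.MaxValue` selects every mapper, which is how a single unified method can also cover the "all mappers" case.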
[GitHub] [spark] SparkQA removed a comment on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table
SparkQA removed a comment on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-647945315

**[Test build #124394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124394/testReport)** for PR 28901 at commit [`fa1a84a`](https://github.com/apache/spark/commit/fa1a84a8fd5964d91a8b43bcb75609af43553f61).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table
AmplabJenkins removed a comment on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-647946382

Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table
AmplabJenkins commented on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-647946382
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource
AmplabJenkins removed a comment on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-647945936
[GitHub] [spark] cloud-fan edited a comment on pull request #28780: [SPARK-31952][SQL]Fix incorrect memory spill metric when doing Aggregate
cloud-fan edited a comment on pull request #28780:
URL: https://github.com/apache/spark/pull/28780#issuecomment-647945904

shall we set `sorter.totalSpillBytes`? then we can update the metrics correctly in `sorter.spill`.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table
AmplabJenkins removed a comment on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-647945906
[GitHub] [spark] cloud-fan commented on pull request #28780: [SPARK-31952][SQL]Fix incorrect memory spill metric when doing Aggregate
cloud-fan commented on pull request #28780: URL: https://github.com/apache/spark/pull/28780#issuecomment-647945904 shall we set `sorter.totalSpillBytes`, then we can update the metrics correctly in `sort.spill`.
[GitHub] [spark] AmplabJenkins commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource
AmplabJenkins commented on pull request #27366: URL: https://github.com/apache/spark/pull/27366#issuecomment-647945936
[GitHub] [spark] cloud-fan edited a comment on pull request #28780: [SPARK-31952][SQL]Fix incorrect memory spill metric when doing Aggregate
cloud-fan edited a comment on pull request #28780: URL: https://github.com/apache/spark/pull/28780#issuecomment-647945904 shall we set `sorter.totalSpillBytes`? then we can update the metrics correctly in `sort.spill`.
[GitHub] [spark] AmplabJenkins commented on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table
AmplabJenkins commented on pull request #28901: URL: https://github.com/apache/spark/pull/28901#issuecomment-647945906
[GitHub] [spark] SparkQA commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource
SparkQA commented on pull request #27366: URL: https://github.com/apache/spark/pull/27366#issuecomment-647945350 **[Test build #124395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124395/testReport)** for PR 27366 at commit [`38eb601`](https://github.com/apache/spark/commit/38eb601c6b6ea82015d80e0e8fd1e7030a8406dd).
[GitHub] [spark] SparkQA commented on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table
SparkQA commented on pull request #28901: URL: https://github.com/apache/spark/pull/28901#issuecomment-647945315 **[Test build #124394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124394/testReport)** for PR 28901 at commit [`fa1a84a`](https://github.com/apache/spark/commit/fa1a84a8fd5964d91a8b43bcb75609af43553f61).
[GitHub] [spark] cloud-fan commented on a change in pull request #28886: [SPARK-32043][SQL] Replace Decimal by Int op in `make_interval` and `make_timestamp`
cloud-fan commented on a change in pull request #28886: URL: https://github.com/apache/spark/pull/28886#discussion_r443997431 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala ## @@ -751,7 +751,8 @@ object IntervalUtils { secs: Decimal): CalendarInterval = { val totalMonths = Math.addExact(months, Math.multiplyExact(years, MONTHS_PER_YEAR)) val totalDays = Math.addExact(days, Math.multiplyExact(weeks, DAYS_PER_WEEK)) -var micros = (secs * Decimal(MICROS_PER_SECOND)).toLong +assert(secs.scale == 6, "Seconds fractional must have 6 digits for microseconds") Review comment: shall we check the precision as well?
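The review question is whether `secs` should be bounded in precision as well as in scale. A minimal sketch of that check follows. It assumes `java.math.BigDecimal` as a stand-in for Spark's `Decimal` and an upper bound of 18 digits of precision (any 18-digit integer fits in a `Long`); the object and method names are hypothetical. With a scale of exactly 6, the unscaled value already counts microseconds, so no decimal multiplication is needed:

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical helper; java.math.BigDecimal stands in for Spark's Decimal.
object SecondsToMicros {
  val MicrosScale = 6   // fractional digits: one per microsecond decimal place
  val MaxPrecision = 18 // assumed bound: any 18-digit integer fits in a Long

  def toMicros(secs: JBigDecimal): Long = {
    require(secs.scale() == MicrosScale,
      s"Seconds fractional must have $MicrosScale digits for microseconds")
    require(secs.precision() <= MaxPrecision,
      s"Seconds precision ${secs.precision()} exceeds $MaxPrecision digits")
    // With scale 6 the unscaled value is already the number of microseconds
    secs.unscaledValue().longValueExact()
  }
}
```

Checking precision up front turns an out-of-range input into an immediate, descriptive failure instead of an overflow deeper in the interval arithmetic.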
[GitHub] [spark] Ngone51 commented on pull request #28866: [SPARK-31845][CORE][TESTS] DAGSchedulerSuite: Reuse completeNextStageWithFetchFailure
Ngone51 commented on pull request #28866: URL: https://github.com/apache/spark/pull/28866#issuecomment-647943905 LGTM, also cc @jiangxb1987
[GitHub] [spark] HyukjinKwon commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default
HyukjinKwon commented on pull request #28897: URL: https://github.com/apache/spark/pull/28897#issuecomment-647942729 ^ FWIW, I aim to add a way to control it in Spark 3.1 at SPARK-32017.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession
AmplabJenkins removed a comment on pull request #28899: URL: https://github.com/apache/spark/pull/28899#issuecomment-647942233 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124389/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession
AmplabJenkins removed a comment on pull request #28899: URL: https://github.com/apache/spark/pull/28899#issuecomment-647942226 Merged build finished. Test FAILed.
[GitHub] [spark] MaxGekk commented on a change in pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource
MaxGekk commented on a change in pull request #27366: URL: https://github.com/apache/spark/pull/27366#discussion_r443994932 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmark.scala ## @@ -508,6 +548,7 @@ object JsonBenchmark extends SqlBasedBenchmark { jsonInDS(50 * 1000 * 1000, numIters) jsonInFile(50 * 1000 * 1000, numIters) datetimeBenchmark(rowsNum = 10 * 1000 * 1000, numIters) + filtersPushdownBenchmark(rowsNum = 100 * 1000, numIters) Review comment: done
[GitHub] [spark] SparkQA removed a comment on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession
SparkQA removed a comment on pull request #28899: URL: https://github.com/apache/spark/pull/28899#issuecomment-647909279 **[Test build #124389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124389/testReport)** for PR 28899 at commit [`a486f1f`](https://github.com/apache/spark/commit/a486f1fae0c733571275cad6dab981803d47cbe7).
[GitHub] [spark] AmplabJenkins commented on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession
AmplabJenkins commented on pull request #28899: URL: https://github.com/apache/spark/pull/28899#issuecomment-647942226
[GitHub] [spark] SparkQA commented on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession
SparkQA commented on pull request #28899: URL: https://github.com/apache/spark/pull/28899#issuecomment-647942120 **[Test build #124389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124389/testReport)** for PR 28899 at commit [`a486f1f`](https://github.com/apache/spark/commit/a486f1fae0c733571275cad6dab981803d47cbe7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on pull request #28894: [SPARK-32052][SQL] Extract common code from date-time field expressions
HyukjinKwon commented on pull request #28894: URL: https://github.com/apache/spark/pull/28894#issuecomment-647941360 late LGTM too
[GitHub] [spark] venkata91 commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark's bl
venkata91 commented on pull request #28287: URL: https://github.com/apache/spark/pull/28287#issuecomment-647940759 @tgravescs After thinking about the problem and discussing it with @mridulm, I have now handled it by simply keeping track of unschedulable task sets, so that more executors are requested when dynamic allocation is enabled. Once some task becomes schedulable, we clear this set, since either an executor became free or we just acquired a new executor and found a way to make progress. Let me know what you think about this change. Thanks for taking a look previously and giving the overall context.
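The bookkeeping described in this comment can be sketched as follows. This is a hypothetical illustration, not the code in the PR or Spark's `ExecutorAllocationManager` API: the class and method names are invented, and the sketch assumes one extra executor is requested per unschedulable task set.

```scala
import scala.collection.mutable

// Hypothetical tracker; names are illustrative, not Spark APIs.
final class UnschedulableTaskSetTracker {
  // IDs of task sets that currently cannot be scheduled on any executor
  private val unschedulable = mutable.Set.empty[String]

  def onTaskSetUnschedulable(taskSetId: String): Unit =
    unschedulable += taskSetId

  // Some task became schedulable: an executor freed up or a new one arrived,
  // so progress is possible again and the whole set is cleared.
  def onTaskSchedulable(): Unit =
    unschedulable.clear()

  // Assumed policy: one extra executor per unschedulable task set,
  // applied only when dynamic allocation is enabled.
  def targetExecutors(baseTarget: Int, dynamicAllocationEnabled: Boolean): Int =
    if (dynamicAllocationEnabled) baseTarget + unschedulable.size else baseTarget
}
```

Using a set means that repeated reports for the same task set do not inflate the request.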
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
AmplabJenkins removed a comment on pull request #28895: URL: https://github.com/apache/spark/pull/28895#issuecomment-647940129
[GitHub] [spark] AmplabJenkins commented on pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
AmplabJenkins commented on pull request #28895: URL: https://github.com/apache/spark/pull/28895#issuecomment-647940129
[GitHub] [spark] SparkQA commented on pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
SparkQA commented on pull request #28895: URL: https://github.com/apache/spark/pull/28895#issuecomment-647939695 **[Test build #124393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124393/testReport)** for PR 28895 at commit [`2a31450`](https://github.com/apache/spark/commit/2a31450f341ac5d4fc46817dd65248b5d973c002).
[GitHub] [spark] MaxGekk commented on a change in pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource
MaxGekk commented on a change in pull request #27366: URL: https://github.com/apache/spark/pull/27366#discussion_r443991161 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmark.scala ## @@ -495,6 +496,45 @@ object JsonBenchmark extends SqlBasedBenchmark { } } + private def filtersPushdownBenchmark(rowsNum: Int, numIters: Int): Unit = { +val benchmark = new Benchmark(s"Filters pushdown", rowsNum, output = output) +val colsNum = 100 +val fields = Seq.tabulate(colsNum)(i => StructField(s"col$i", TimestampType)) +val schema = StructType(StructField("key", IntegerType) +: fields) +def columns(): Seq[Column] = { + val ts = Seq.tabulate(colsNum) { i => +lit(Instant.ofEpochSecond(i * 12345678)).as(s"col$i") + } + ($"id" % 1000).as("key") +: ts +} +withTempPath { path => + spark.range(rowsNum).select(columns(): _*).write.json(path.getAbsolutePath) + def readback = { +spark.read.schema(schema).json(path.getAbsolutePath) + } + + benchmark.addCase(s"w/o filters", numIters) { _ => +readback.noop() + } + + def withFilter(configEnabled: Boolean): Unit = { +withSQLConf(SQLConf.JSON_FILTER_PUSHDOWN_ENABLED.key -> configEnabled.toString()) { + readback.filter($"key" === 0).noop() +} + } + + benchmark.addCase(s"pushdown disabled", numIters) { _ => Review comment: I will remove it here and in other places. Thanks
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28903: [SPARK-19939] [ML] Add support for association rules in ML
AmplabJenkins removed a comment on pull request #28903: URL: https://github.com/apache/spark/pull/28903#issuecomment-647937510
[GitHub] [spark] MaxGekk commented on a change in pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource
MaxGekk commented on a change in pull request #27366: URL: https://github.com/apache/spark/pull/27366#discussion_r443990454 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/StructFilters.scala ## @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst + +import scala.util.Try + +import org.apache.spark.sql.catalyst.StructFilters._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.sources +import org.apache.spark.sql.types.{BooleanType, StructType} + +/** + * The class provides API for applying pushed down filters to partially or + * fully set internal rows that have the struct schema. + * + * @param pushedFilters The pushed down source filters. The filters should refer to + * the fields of the provided schema. + * @param schema The required schema of records from datasource files. + */ +abstract class StructFilters(pushedFilters: Seq[sources.Filter], schema: StructType) { + + protected val filters = pushedFilters.filter(checkFilterRefs(_, schema.fieldNames.toSet)) + + /** + * Applies pushed down source filters to the given row assuming that + * value at `index` has been already set. 
+ * + * @param row The row with fully or partially set values. + * @param index The index of already set value. + * @return true if currently processed row can be skipped otherwise false. + */ + def skipRow(row: InternalRow, index: Int): Boolean + + /** + * Resets states of pushed down filters. The method must be called before + * processing any new row otherwise skipRow() may return wrong result. + */ + def reset(): Unit + + /** + * Compiles source filters to a predicate. + */ + def toPredicate(filters: Seq[sources.Filter]): BasePredicate = { +val reducedExpr = filters + .sortBy(_.references.length) + .flatMap(filterToExpression(_, toRef)) + .reduce(And) +Predicate.create(reducedExpr) + } + + // Finds a filter attribute in the schema and converts it to a `BoundReference` + def toRef(attr: String): Option[BoundReference] = { +schema.getFieldIndex(attr).map { index => + val field = schema(index) + BoundReference(schema.fieldIndex(attr), field.dataType, field.nullable) Review comment: Right, I will replace it by `index`.
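The agreed fix reuses the `index` already returned by `schema.getFieldIndex(attr)` instead of looking the field up a second time with `schema.fieldIndex(attr)`. Below is a simplified sketch of the same pattern that does not depend on Spark. It assumes a hypothetical two-case filter ADT in place of `sources.Filter` and a plain `Array[Any]` row in place of `InternalRow`. Attribute indexes are resolved once, and filters that reference unknown fields are dropped, much as `checkFilterRefs` does in the diff. The remaining filters are combined as a conjunction, sorted by number of references. Unlike `skipRow(row, index)` in the diff, this sketch evaluates only a fully set row.

```scala
// Hypothetical mini model of pushed-down filters (not Spark's sources.Filter).
sealed trait SimpleFilter { def references: Seq[String] }
final case class EqualTo(attr: String, value: Any) extends SimpleFilter {
  def references: Seq[String] = Seq(attr)
}
final case class GreaterThan(attr: String, value: Int) extends SimpleFilter {
  def references: Seq[String] = Seq(attr)
}

final class MiniStructFilters(pushed: Seq[SimpleFilter], fieldNames: Seq[String]) {
  // Field name -> position, computed once
  private val fieldIndex: Map[String, Int] = fieldNames.zipWithIndex.toMap

  // Drop filters that reference fields missing from the schema
  val filters: Seq[SimpleFilter] =
    pushed.filter(_.references.forall(fieldIndex.contains))

  // A single lookup; callers use the returned index directly
  private def toRef(attr: String): Option[Int] = fieldIndex.get(attr)

  private def compile(f: SimpleFilter): Option[Array[Any] => Boolean] = f match {
    case EqualTo(attr, v) =>
      toRef(attr).map(i => (row: Array[Any]) => row(i) == v)
    case GreaterThan(attr, v) =>
      toRef(attr).map(i => (row: Array[Any]) => row(i) match {
        case n: Int => n > v
        case _      => false
      })
  }

  // Conjunction of all filters, those with fewer references first
  private val predicate: Array[Any] => Boolean = {
    val compiled = filters.sortBy(_.references.length).flatMap(compile)
    row => compiled.forall(p => p(row))
  }

  // True if the fully populated row can be skipped
  def skipRow(row: Array[Any]): Boolean = !predicate(row)
}
```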
[GitHub] [spark] AmplabJenkins commented on pull request #28903: [SPARK-19939] [ML] Add support for association rules in ML
AmplabJenkins commented on pull request #28903: URL: https://github.com/apache/spark/pull/28903#issuecomment-647937510
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
AmplabJenkins removed a comment on pull request #27246: URL: https://github.com/apache/spark/pull/27246#issuecomment-647937018 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124383/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
SparkQA removed a comment on pull request #27246: URL: https://github.com/apache/spark/pull/27246#issuecomment-647896228 **[Test build #124383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124383/testReport)** for PR 27246 at commit [`4c21ba6`](https://github.com/apache/spark/commit/4c21ba660fe4e992cbfaf33932abc3ce3587ebc4).
[GitHub] [spark] SparkQA commented on pull request #28903: [SPARK-19939] [ML] Add support for association rules in ML
SparkQA commented on pull request #28903: URL: https://github.com/apache/spark/pull/28903#issuecomment-647937053 **[Test build #124392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124392/testReport)** for PR 28903 at commit [`1cc6560`](https://github.com/apache/spark/commit/1cc6560e7f72594e2d1bf6400c5391741d64dae0).
[GitHub] [spark] AmplabJenkins commented on pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
AmplabJenkins commented on pull request #27246: URL: https://github.com/apache/spark/pull/27246#issuecomment-647937012
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
AmplabJenkins removed a comment on pull request #27246: URL: https://github.com/apache/spark/pull/27246#issuecomment-647937012 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
SparkQA commented on pull request #27246: URL: https://github.com/apache/spark/pull/27246#issuecomment-647936567 **[Test build #124383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124383/testReport)** for PR 27246 at commit [`4c21ba6`](https://github.com/apache/spark/commit/4c21ba660fe4e992cbfaf33932abc3ce3587ebc4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] huaxingao opened a new pull request #28903: [SPARK-19939] [ML] Add support for association rules in ML
huaxingao opened a new pull request #28903: URL: https://github.com/apache/spark/pull/28903 ### What changes were proposed in this pull request? Add the support measure to association rules in Spark ml.fpm. ### Why are the changes needed? Support is an indication of how frequently the itemset of an association rule appears in the database and suggests whether the rule is generally applicable to the dataset. Refer to [wiki](https://en.wikipedia.org/wiki/Association_rule_learning#Support) for more details. ### Does this PR introduce _any_ user-facing change? Yes. Association rules now have a support measure. ### How was this patch tested? Existing and new unit tests.
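For reference, the support of a rule X ⇒ Y is the fraction of transactions that contain every item of X ∪ Y, and its confidence is support(X ∪ Y) / support(X). A minimal plain-Scala sketch of both measures over an in-memory list of transactions; the object and method names are hypothetical and this is not the `ml.fpm` implementation:

```scala
// Hypothetical helpers, not the ml.fpm implementation.
object RuleMetrics {
  type Itemset = Set[String]

  // Fraction of transactions that contain every item of `itemset`
  def support(itemset: Itemset, transactions: Seq[Itemset]): Double =
    if (transactions.isEmpty) 0.0
    else transactions.count(t => itemset.subsetOf(t)).toDouble / transactions.size

  // The support of antecedent => consequent is the support of their union
  def ruleSupport(antecedent: Itemset, consequent: Itemset, transactions: Seq[Itemset]): Double =
    support(antecedent ++ consequent, transactions)

  // Confidence = support(X ∪ Y) / support(X)
  def confidence(antecedent: Itemset, consequent: Itemset, transactions: Seq[Itemset]): Double =
    ruleSupport(antecedent, consequent, transactions) / support(antecedent, transactions)
}
```

With the transactions {a, b}, {a, c}, {a, b, c} and {b}, the rule a ⇒ b has support 2/4 = 0.5 and confidence 0.5 / 0.75 = 2/3.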
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled
AmplabJenkins removed a comment on pull request #28900: URL: https://github.com/apache/spark/pull/28900#issuecomment-647935005
[GitHub] [spark] AmplabJenkins commented on pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled
AmplabJenkins commented on pull request #28900: URL: https://github.com/apache/spark/pull/28900#issuecomment-647935005
[GitHub] [spark] SparkQA commented on pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled
SparkQA commented on pull request #28900: URL: https://github.com/apache/spark/pull/28900#issuecomment-647934563 **[Test build #124391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124391/testReport)** for PR 28900 at commit [`8e39ed7`](https://github.com/apache/spark/commit/8e39ed7787c9f80591963de1e7ab4f0f2c24fda3).
[GitHub] [spark] AmplabJenkins commented on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession
AmplabJenkins commented on pull request #28899: URL: https://github.com/apache/spark/pull/28899#issuecomment-647932324
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession
AmplabJenkins removed a comment on pull request #28899: URL: https://github.com/apache/spark/pull/28899#issuecomment-647932324
[GitHub] [spark] viirya commented on a change in pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled
viirya commented on a change in pull request #28900: URL: https://github.com/apache/spark/pull/28900#discussion_r443984508 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala ## @@ -1026,15 +1026,48 @@ class AdaptiveQueryExecSuite Seq(true, false).foreach { enableAQE => withSQLConf( SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> enableAQE.toString, +SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true", SQLConf.SHUFFLE_PARTITIONS.key -> "6", SQLConf.COALESCE_PARTITIONS_INITIAL_PARTITION_NUM.key -> "7") { -val partitionsNum = spark.range(10).repartition($"id").rdd.collectPartitions().length +val df = spark.range(10).repartition($"id") Review comment: ok.
[GitHub] [spark] cloud-fan closed pull request #28894: [SPARK-32052][SQL] Extract common code from date-time field expressions
cloud-fan closed pull request #28894: URL: https://github.com/apache/spark/pull/28894
[GitHub] [spark] beliefer commented on pull request #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static
beliefer commented on pull request #26875: URL: https://github.com/apache/spark/pull/26875#issuecomment-647931974 test1 looks the same as test3.
[GitHub] [spark] SparkQA commented on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession
SparkQA commented on pull request #28899: URL: https://github.com/apache/spark/pull/28899#issuecomment-647931949 **[Test build #124390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124390/testReport)** for PR 28899 at commit [`708c9ff`](https://github.com/apache/spark/commit/708c9ff27ac0f234fbac0d4d6524adb90c9cf0b3).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28884: [SPARK-20249][ML][PYSPARK] Add training summary for LinearSVCModel
AmplabJenkins removed a comment on pull request #28884: URL: https://github.com/apache/spark/pull/28884#issuecomment-647927719
[GitHub] [spark] SparkQA removed a comment on pull request #28884: [SPARK-20249][ML][PYSPARK] Add training summary for LinearSVCModel
SparkQA removed a comment on pull request #28884: URL: https://github.com/apache/spark/pull/28884#issuecomment-647905327 **[Test build #124386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124386/testReport)** for PR 28884 at commit [`423eeb5`](https://github.com/apache/spark/commit/423eeb502a1ccdba1fc774c7ba3d2057bf38207b).
[GitHub] [spark] cloud-fan commented on pull request #28894: [SPARK-32052][SQL] Extract common code from date-time field expressions
cloud-fan commented on pull request #28894: URL: https://github.com/apache/spark/pull/28894#issuecomment-647931165 thanks, merging to master!
[GitHub] [spark] ulysses-you commented on a change in pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession
ulysses-you commented on a change in pull request #28899: URL: https://github.com/apache/spark/pull/28899#discussion_r443982907 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala ## @@ -240,4 +240,20 @@ class SparkSessionBuilderSuite extends SparkFunSuite with BeforeAndAfterEach { assert(session.conf.get(GLOBAL_TEMP_DATABASE) === "globaltempdb-spark-31532-2") assert(session.conf.get(WAREHOUSE_PATH) === "SPARK-31532-db-2") } + + test("SPARK-32062: reset listenerRegistered in SparkSession") { +(1 to 2).foreach { i => + val conf = new SparkConf() +.setMaster("local") +.setAppName(s"test-SPARK-32062-$i") + val context = new SparkContext(conf) Review comment: missed it.
[GitHub] [spark] cloud-fan commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST
cloud-fan commented on a change in pull request #28856: URL: https://github.com/apache/spark/pull/28856#discussion_r443981984 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ## @@ -2589,6 +2589,8 @@ object Sequence { } } + // To generate time sequences, we use scale 1 in TemporalSequenceImpl + // for `TimestampType`, while MICROS_PER_DAY for `DateType` Review comment: if start/end is a date, can the step be in seconds/minutes/hours?
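A minimal sketch, in plain Scala with `java.time` rather than Spark, of why the step unit matters for the DST issue named in the PR title: a calendar-day step and a fixed 24-hour step agree until the range crosses a DST transition, after which they drift apart by an hour. The zone and dates are chosen only for illustration (US daylight saving time began on 2020-03-08); nothing here reflects how `TemporalSequenceImpl` is actually written.

```scala
import java.time.{LocalDate, ZoneId, ZonedDateTime}

object DstStepSketch {
  // America/Los_Angeles switched to daylight saving time on 2020-03-08 at 02:00,
  // so that local day is only 23 hours long.
  val zone: ZoneId = ZoneId.of("America/Los_Angeles")
  val start: ZonedDateTime = LocalDate.of(2020, 3, 7).atStartOfDay(zone)

  // Calendar-aware step: moves along the local time-line, so it stays at midnight.
  def stepByDays(n: Int): ZonedDateTime = start.plusDays(n.toLong)

  // Fixed-length step: moves along the instant time-line, one "day" = 24 elapsed hours.
  def stepBy24Hours(n: Int): ZonedDateTime = start.plusHours(24L * n)

  def main(args: Array[String]): Unit =
    (0 to 2).foreach { i =>
      println(s"${stepByDays(i).toLocalDateTime}  vs  ${stepBy24Hours(i).toLocalDateTime}")
    }
}
```

On the third element the day-based step lands on 2020-03-09T00:00 while the 24-hour step lands on 2020-03-09T01:00, which is the kind of divergence a date sequence has to avoid.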
[GitHub] [spark] cloud-fan commented on a change in pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession
cloud-fan commented on a change in pull request #28899: URL: https://github.com/apache/spark/pull/28899#discussion_r443980530 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala ## @@ -240,4 +240,20 @@ class SparkSessionBuilderSuite extends SparkFunSuite with BeforeAndAfterEach { assert(session.conf.get(GLOBAL_TEMP_DATABASE) === "globaltempdb-spark-31532-2") assert(session.conf.get(WAREHOUSE_PATH) === "SPARK-31532-db-2") } + + test("SPARK-32062: reset listenerRegistered in SparkSession") { +(1 to 2).foreach { i => + val conf = new SparkConf() +.setMaster("local") +.setAppName(s"test-SPARK-32062-$i") + val context = new SparkContext(conf) Review comment: does this work? The test doesn't stop the spark context, and AFAIK we don't support having multiple spark context instances at the same time.
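A toy model of the concern raised in the review, assuming only that at most one context may be live at a time. `ToyContext` and both loop helpers are hypothetical stand-ins, not Spark's `SparkContext`; the point is the `try`/`finally` that stops each instance before the loop creates the next one.

```scala
import scala.collection.mutable

object SingleContextSketch {
  // Hypothetical resource that allows at most one live instance at a time,
  // loosely modelled on the restriction described in the review comment.
  final class ToyContext(val name: String) {
    ToyContext.register(this)
    def stop(): Unit = ToyContext.unregister(this)
  }

  object ToyContext {
    private var active: Option[ToyContext] = None

    private def register(c: ToyContext): Unit = synchronized {
      active.foreach { a =>
        throw new IllegalStateException(s"Only one context may be active; ${a.name} is still running")
      }
      active = Some(c)
    }

    private def unregister(c: ToyContext): Unit = synchronized {
      if (active.contains(c)) active = None
    }
  }

  // Each iteration stops its context before the next one is created.
  def runAll(iterations: Int): Seq[String] =
    (1 to iterations).map { i =>
      val ctx = new ToyContext(s"test-SPARK-32062-$i")
      try ctx.name
      finally ctx.stop()
    }

  // Same loop without stopping between iterations: returns the error it hits,
  // then stops whatever it created so later calls start from a clean state.
  def runWithoutStop(iterations: Int): Option[String] = {
    val created = mutable.Buffer.empty[ToyContext]
    try {
      (1 to iterations).foreach(i => created += new ToyContext(s"leaky-$i"))
      None
    } catch {
      case e: IllegalStateException => Some(e.getMessage)
    } finally created.foreach(_.stop())
  }
}
```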
[GitHub] [spark] AmplabJenkins commented on pull request #28884: [SPARK-20249][ML][PYSPARK] Add training summary for LinearSVCModel
AmplabJenkins commented on pull request #28884: URL: https://github.com/apache/spark/pull/28884#issuecomment-647927719
[GitHub] [spark] SparkQA commented on pull request #28884: [SPARK-20249][ML][PYSPARK] Add training summary for LinearSVCModel
SparkQA commented on pull request #28884: URL: https://github.com/apache/spark/pull/28884#issuecomment-647927281 **[Test build #124386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124386/testReport)** for PR 28884 at commit [`423eeb5`](https://github.com/apache/spark/commit/423eeb502a1ccdba1fc774c7ba3d2057bf38207b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled
cloud-fan commented on a change in pull request #28900: URL: https://github.com/apache/spark/pull/28900#discussion_r443979185 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala ## @@ -1026,15 +1026,48 @@ class AdaptiveQueryExecSuite Seq(true, false).foreach { enableAQE => withSQLConf( SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> enableAQE.toString, +SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true", SQLConf.SHUFFLE_PARTITIONS.key -> "6", SQLConf.COALESCE_PARTITIONS_INITIAL_PARTITION_NUM.key -> "7") { -val partitionsNum = spark.range(10).repartition($"id").rdd.collectPartitions().length +val df = spark.range(10).repartition($"id") Review comment: can we test `repartition(numPartitions)` in this test case and make sure the partition number doesn't change? Your new test case already tests repartition by key/range.
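A simplified model of the property the review asks to test, not Spark's actual coalescing rule: adjacent small shuffle partitions are merged greedily up to a target size, and a hypothetical `userSpecified` flag stands in for an explicit `repartition(numPartitions)`, whose count is left untouched. The byte sizes and target are made-up numbers.

```scala
import scala.collection.mutable.ArrayBuffer

object CoalesceSketch {
  // Greedily merges adjacent shuffle partitions so that each merged group stays at
  // or under `targetSize` bytes where possible; returns the first original index of each group.
  def coalesceStartIndices(sizes: Seq[Long], targetSize: Long): Seq[Int] =
    if (sizes.isEmpty) Seq.empty
    else {
      val starts = ArrayBuffer(0)
      var current = 0L
      sizes.zipWithIndex.foreach { case (size, i) =>
        // Start a new group when adding this partition would overshoot the target.
        if (i > 0 && current + size > targetSize) {
          starts += i
          current = 0L
        }
        current += size
      }
      starts.toList
    }

  // Hypothetical planner rule: an explicit partition count from the user is kept as is,
  // while an expression-based repartition may be coalesced.
  def finalPartitionCount(sizes: Seq[Long], targetSize: Long, userSpecified: Boolean): Int =
    if (userSpecified) sizes.length else coalesceStartIndices(sizes, targetSize).length
}
```

With seven equally sized partitions (matching the initial partition number of 7 in the test) and a target of 25 bytes against 10-byte partitions, the expression-based case collapses to four partitions while the user-specified case keeps seven.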
[GitHub] [spark] cloud-fan closed pull request #28892: [MINOR][SQL] Simplify DateTimeUtils.cleanLegacyTimestampStr
cloud-fan closed pull request #28892: URL: https://github.com/apache/spark/pull/28892
[GitHub] [spark] cloud-fan commented on pull request #28892: [MINOR][SQL] Simplify DateTimeUtils.cleanLegacyTimestampStr
cloud-fan commented on pull request #28892: URL: https://github.com/apache/spark/pull/28892#issuecomment-647924824 thanks, merging to master!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
AmplabJenkins removed a comment on pull request #28895: URL: https://github.com/apache/spark/pull/28895#issuecomment-647924065
[GitHub] [spark] AmplabJenkins commented on pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
AmplabJenkins commented on pull request #28895: URL: https://github.com/apache/spark/pull/28895#issuecomment-647924065
[GitHub] [spark] SparkQA removed a comment on pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
SparkQA removed a comment on pull request #28895: URL: https://github.com/apache/spark/pull/28895#issuecomment-647880850 **[Test build #124380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124380/testReport)** for PR 28895 at commit [`05fe4c7`](https://github.com/apache/spark/commit/05fe4c7be3a7b55d7a04e461e9c79c92031e627c).
[GitHub] [spark] SparkQA commented on pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
SparkQA commented on pull request #28895: URL: https://github.com/apache/spark/pull/28895#issuecomment-647923313 **[Test build #124380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124380/testReport)** for PR 28895 at commit [`05fe4c7`](https://github.com/apache/spark/commit/05fe4c7be3a7b55d7a04e461e9c79c92031e627c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] guoqiaoli1992 commented on pull request #26873: [SPARK-30240][core] Support HTTP redirects directly to a proxy server.
guoqiaoli1992 commented on pull request #26873: URL: https://github.com/apache/spark/pull/26873#issuecomment-647921920 I'm facing a similar issue (I think). Can I solve it with spark.ui.proxyRedirectUri? https://stackoverflow.com/questions/62527979/running-history-server-behind-reverse-proxy
[GitHub] [spark] MaxGekk commented on a change in pull request #28892: [MINOR][SQL] Simplify DateTimeUtils.cleanLegacyTimestampStr
MaxGekk commented on a change in pull request #28892: URL: https://github.com/apache/spark/pull/28892#discussion_r443973612 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -203,20 +203,10 @@ object DateTimeUtils { Math.multiplyExact(millis, MICROS_PER_MILLIS) } + private final val gmtUtf8 = UTF8String.fromString("GMT") // The method is called by JSON/CSV parser to clean up the legacy timestamp string by removing - // the "GMT" string. - def cleanLegacyTimestampStr(s: String): String = { -val indexOfGMT = s.indexOf("GMT") -if (indexOfGMT != -1) { - // ISO8601 with a weird time zone specifier (2000-01-01T00:00GMT+01:00) - val s0 = s.substring(0, indexOfGMT) - val s1 = s.substring(indexOfGMT + 3) - // Mapped to 2000-01-01T00:00+01:00 - s0 + s1 -} else { - s -} - } + // the "GMT" string. For example, it returns 2000-01-01T00:00+01:00 for 2000-01-01T00:00GMT+01:00. + def cleanLegacyTimestampStr(s: UTF8String): UTF8String = s.replace(gmtUtf8, UTF8String.EMPTY_UTF8) Review comment: It has, but look at how it is implemented via regexp. @JoshRosen implemented a more efficient replace in UTF8String https://github.com/apache/spark/pull/24707. That's why I used it. I hope that seems reasonable.
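A minimal comparison in plain Scala, using `java.lang.String` instead of Spark's `UTF8String`: `cleanFirstGmt` transcribes the removed `indexOf`-based body from the diff, and `cleanAllGmt` uses `replace`. The object and method names are hypothetical. The two agree on the documented input, but `replace` removes every `"GMT"`, not just the first, which only matters for unusual inputs.

```scala
object CleanGmtSketch {
  // Original logic from the diff: remove only the first "GMT" occurrence.
  def cleanFirstGmt(s: String): String = {
    val indexOfGMT = s.indexOf("GMT")
    if (indexOfGMT != -1) s.substring(0, indexOfGMT) + s.substring(indexOfGMT + 3) else s
  }

  // Replace-based logic: remove every literal "GMT" occurrence.
  def cleanAllGmt(s: String): String = s.replace("GMT", "")
}
```

For `2000-01-01T00:00GMT+01:00` both return `2000-01-01T00:00+01:00`; for a string such as `GMTGMT` the first returns `GMT` and the second returns an empty string.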
[GitHub] [spark] kdzhao commented on a change in pull request #28731: [SPARK-31909][SQL] Add SPARK_SUBMIT_OPTS to Beeline Script
kdzhao commented on a change in pull request #28731: URL: https://github.com/apache/spark/pull/28731#discussion_r443874572 ## File path: bin/beeline ## @@ -28,5 +28,7 @@ if [ -z "${SPARK_HOME}" ]; then source "$(dirname "$0")"/find-spark-home fi +. "${SPARK_HOME}"/bin/load-spark-env.sh + CLASS="org.apache.hive.beeline.BeeLine" -exec "${SPARK_HOME}/bin/spark-class" $CLASS "$@" +exec "${SPARK_HOME}/bin/spark-class" $SPARK_SUBMIT_OPTS $CLASS "$@" Review comment: I think I misspoke. What I wanted to say is that in Hive, it looks like the beeline command just calls hive with different parameters: https://github.com/apache/hive/blob/branch-1.2/bin/beeline https://github.com/apache/hive/blob/branch-1.2/bin/hive Agreed that Hive's beeline doesn't read Spark parameters, and I would assume it reads its own (I saw "HADOOP_CLIENT_OPTS" etc. in the script above). Now back to Spark: I agree with you that fixing it in spark-class might cover more cases. On the other hand, so far only beeline has this issue, so an easy fix in the beeline script also makes sense as a stopgap.
[GitHub] [spark] attilapiros commented on a change in pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is
attilapiros commented on a change in pull request #28848: URL: https://github.com/apache/spark/pull/28848#discussion_r443970197 ## File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ## @@ -1939,24 +1941,24 @@ private[spark] class DAGScheduler( hostToUnregisterOutputs: Option[String], maybeEpoch: Option[Long] = None): Unit = { val currentEpoch = maybeEpoch.getOrElse(mapOutputTracker.getEpoch) +logDebug(s"Considering removal of executor $execId; " + + s"fileLost: $fileLost, currentEpoch: $currentEpoch") if (!failedEpoch.contains(execId) || failedEpoch(execId) < currentEpoch) { failedEpoch(execId) = currentEpoch - logInfo("Executor lost: %s (epoch %d)".format(execId, currentEpoch)) + logInfo(s"Executor lost: $execId (epoch $currentEpoch)") Review comment: Guilty as charged. My thinking was the following: string interpolation is preferred in the project, the change itself is tiny and carries zero risk, and since we are already touching these lines it is good to do it now (aka the Boy Scout Rule).
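The two logging forms in the diff build the same message, so the change is purely stylistic. A quick check in plain Scala; the object and method names are hypothetical and only wrap the two expressions from the diff.

```scala
object InterpolationSketch {
  // Old style from the diff: printf-style formatting via StringOps.format.
  def withFormat(execId: String, currentEpoch: Long): String =
    "Executor lost: %s (epoch %d)".format(execId, currentEpoch)

  // New style from the diff: the s-interpolator, checked against the variables at compile time.
  def withInterpolation(execId: String, currentEpoch: Long): String =
    s"Executor lost: $execId (epoch $currentEpoch)"
}
```

Both produce `Executor lost: exec-1 (epoch 42)` for `("exec-1", 42L)`.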
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default
dongjoon-hyun edited a comment on pull request #28897: URL: https://github.com/apache/spark/pull/28897#issuecomment-647912232 BTW, please note that the default version is very important. For example, PySpark was downloaded 1,333,883 times last week, but we provide them only Spark built with `Hadoop 2.7.4`. - https://pypistats.org/packages/pyspark
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default
dongjoon-hyun edited a comment on pull request #28897: URL: https://github.com/apache/spark/pull/28897#issuecomment-647912232 BTW, please note that the default version is very important. For example, PySpark was downloaded 1,333,883 times last week, but it is only available as a Spark distribution with `Hadoop 2.7.4`. - https://pypistats.org/packages/pyspark
[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default
dongjoon-hyun commented on pull request #28897: URL: https://github.com/apache/spark/pull/28897#issuecomment-647912232 BTW, please note that the default version is very important. For example, PySpark was downloaded 1,333,883 times, but we provide them only Spark built with `Hadoop 2.7.4`. - https://pypistats.org/packages/pyspark
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession
AmplabJenkins removed a comment on pull request #28899: URL: https://github.com/apache/spark/pull/28899#issuecomment-647909575
[GitHub] [spark] AmplabJenkins commented on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession
AmplabJenkins commented on pull request #28899: URL: https://github.com/apache/spark/pull/28899#issuecomment-647909575
[GitHub] [spark] SparkQA commented on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession
SparkQA commented on pull request #28899: URL: https://github.com/apache/spark/pull/28899#issuecomment-647909279 **[Test build #124389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124389/testReport)** for PR 28899 at commit [`a486f1f`](https://github.com/apache/spark/commit/a486f1fae0c733571275cad6dab981803d47cbe7).
[GitHub] [spark] Ngone51 commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
Ngone51 commented on a change in pull request #28895: URL: https://github.com/apache/spark/pull/28895#discussion_r443962431 ## File path: core/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala ## @@ -43,26 +44,16 @@ private[spark] trait ShuffleManager { context: TaskContext, metrics: ShuffleWriteMetricsReporter): ShuffleWriter[K, V] - /** - * Get a reader for a range of reduce partitions (startPartition to endPartition-1, inclusive). - * Called on executors by reduce tasks. - */ - def getReader[K, C]( - handle: ShuffleHandle, - startPartition: Int, - endPartition: Int, - context: TaskContext, - metrics: ShuffleReadMetricsReporter): ShuffleReader[K, C] - /** * Get a reader for a range of reduce partitions (startPartition to endPartition-1, inclusive) to - * read from map output (startMapIndex to endMapIndex - 1, inclusive). + * read from a range of map outputs which specified by mapIndexRange(startMapIndex to + * endMapIndex-1, inclusive). + * * Called on executors by reduce tasks. */ - def getReaderForRange[K, C]( + def getReader[K, C]( handle: ShuffleHandle, - startMapIndex: Int, - endMapIndex: Int, + mapIndexRange: Array[MapStatus] => (Int, Int), Review comment: OK, I see how to get the `endMapIndex` now.
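A sketch of why the diff passes `mapIndexRange` as a function of the map statuses rather than two integers, under the assumption (suggested by the exchange above, not confirmed by the diff) that callers who want every map output cannot know `endMapIndex` until the statuses are available. `FakeMapStatus` and the helpers are hypothetical stand-ins for Spark's `MapStatus` and reader code.

```scala
object MapIndexRangeSketch {
  // Hypothetical stand-in for Spark's MapStatus; only the map index matters here.
  final case class FakeMapStatus(mapIndex: Int)

  // A reader that wants every map output derives endMapIndex from the statuses it receives.
  val allMaps: Array[FakeMapStatus] => (Int, Int) = statuses => (0, statuses.length)

  // A reader that wants a fixed sub-range ignores the statuses it is given.
  def fixedRange(start: Int, end: Int): Array[FakeMapStatus] => (Int, Int) = _ => (start, end)

  // Applies a range function and returns the selected map indices (end exclusive).
  def selectedMaps(
      statuses: Array[FakeMapStatus],
      range: Array[FakeMapStatus] => (Int, Int)): Seq[Int] = {
    val (startMapIndex, endMapIndex) = range(statuses)
    statuses.slice(startMapIndex, endMapIndex).map(_.mapIndex).toSeq
  }
}
```

With five statuses, `allMaps` selects indices 0 to 4, while `fixedRange(1, 3)` selects indices 1 and 2.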
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is l
AmplabJenkins removed a comment on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-647907550
[GitHub] [spark] AmplabJenkins commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
AmplabJenkins commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-647907550
[GitHub] [spark] SparkQA commented on pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled
SparkQA commented on pull request #28900: URL: https://github.com/apache/spark/pull/28900#issuecomment-647907215 **[Test build #124387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124387/testReport)** for PR 28900 at commit [`43c4726`](https://github.com/apache/spark/commit/43c4726fa8b0de623d5563720c96632193262ec2).
[GitHub] [spark] SparkQA commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
SparkQA commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-647907228 **[Test build #124388 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124388/testReport)** for PR 28848 at commit [`633a0e7`](https://github.com/apache/spark/commit/633a0e7841c9a9f9175bed510120ef2fe66ebe00).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28866: [SPARK-31845][CORE][TESTS] DAGSchedulerSuite: Reuse completeNextStageWithFetchFailure
AmplabJenkins removed a comment on pull request #28866: URL: https://github.com/apache/spark/pull/28866#issuecomment-647905965
[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
wypoon commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-647906200 retest this please
[GitHub] [spark] cloud-fan commented on a change in pull request #28892: [MINOR][SQL] Simplify DateTimeUtils.cleanLegacyTimestampStr
cloud-fan commented on a change in pull request #28892: URL: https://github.com/apache/spark/pull/28892#discussion_r443959870
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
## @@ -203,20 +203,10 @@ object DateTimeUtils {
     Math.multiplyExact(millis, MICROS_PER_MILLIS)
   }
+  private final val gmtUtf8 = UTF8String.fromString("GMT")
   // The method is called by JSON/CSV parser to clean up the legacy timestamp string by removing
-  // the "GMT" string.
-  def cleanLegacyTimestampStr(s: String): String = {
-    val indexOfGMT = s.indexOf("GMT")
-    if (indexOfGMT != -1) {
-      // ISO8601 with a weird time zone specifier (2000-01-01T00:00GMT+01:00)
-      val s0 = s.substring(0, indexOfGMT)
-      val s1 = s.substring(indexOfGMT + 3)
-      // Mapped to 2000-01-01T00:00+01:00
-      s0 + s1
-    } else {
-      s
-    }
-  }
+  // the "GMT" string. For example, it returns 2000-01-01T00:00+01:00 for 2000-01-01T00:00GMT+01:00.
+  def cleanLegacyTimestampStr(s: UTF8String): UTF8String = s.replace(gmtUtf8, UTF8String.EMPTY_UTF8)
Review comment: doesn't the `java.lang.String` have the `replace` method?
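To make the reviewer's question concrete, here is a minimal sketch that keeps the `String`-based signature and relies on `java.lang.String.replace(CharSequence, CharSequence)`, set next to the removed `indexOf`/`substring` logic. The object name `LegacyTimestampCleanup` and both method names are hypothetical and not part of Spark. One behavioural difference is worth keeping in mind: the removed version drops only the first "GMT", whereas `String.replace` drops every literal occurrence (no regex is involved). For a legacy timestamp containing a single "GMT", the two agree.

```scala
// Hypothetical standalone comparison; these names are illustrative, not Spark code.
object LegacyTimestampCleanup {
  // Mirrors the removed lines: strip only the first "GMT" occurrence, if any.
  def cleanWithIndexOf(s: String): String = {
    val indexOfGMT = s.indexOf("GMT")
    if (indexOfGMT != -1) {
      // e.g. 2000-01-01T00:00GMT+01:00 -> 2000-01-01T00:00+01:00
      s.substring(0, indexOfGMT) + s.substring(indexOfGMT + 3)
    } else {
      s
    }
  }

  // Alternative raised in the review: java.lang.String.replace(CharSequence, CharSequence)
  // removes every literal "GMT" occurrence.
  def cleanWithReplace(s: String): String = s.replace("GMT", "")
}
```

For example, `cleanWithReplace("2000-01-01T00:00GMT+01:00")` returns `2000-01-01T00:00+01:00`, matching the example in the new doc comment. On an input such as `"GMTxGMT"`, the two versions return `"xGMT"` and `"x"` respectively.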
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28884: [SPARK-20249][ML][PYSPARK] Add training summary for LinearSVCModel
AmplabJenkins removed a comment on pull request #28884: URL: https://github.com/apache/spark/pull/28884#issuecomment-647905639
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled
AmplabJenkins removed a comment on pull request #28900: URL: https://github.com/apache/spark/pull/28900#issuecomment-647905591 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/29005/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on pull request #28866: [SPARK-31845][CORE][TESTS] DAGSchedulerSuite: Reuse completeNextStageWithFetchFailure
AmplabJenkins commented on pull request #28866: URL: https://github.com/apache/spark/pull/28866#issuecomment-647905965
[GitHub] [spark] AmplabJenkins commented on pull request #28884: [SPARK-20249][ML][PYSPARK] Add training summary for LinearSVCModel
AmplabJenkins commented on pull request #28884: URL: https://github.com/apache/spark/pull/28884#issuecomment-647905639
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled
AmplabJenkins removed a comment on pull request #28900: URL: https://github.com/apache/spark/pull/28900#issuecomment-647905583 Merged build finished. Test PASSed.