[GitHub] [spark] xianyinxin commented on a change in pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-22 Thread GitBox


xianyinxin commented on a change in pull request #28875:
URL: https://github.com/apache/spark/pull/28875#discussion_r444002874



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -468,13 +458,25 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   throw new ParseException("There must be at least one WHEN clause in a 
MERGE statement", ctx)
 }
 // children being empty means that the condition is not set
-if (matchedActions.length == 2 && matchedActions.head.children.isEmpty) {
-  throw new ParseException("When there are 2 MATCHED clauses in a MERGE 
statement, " +
-"the first MATCHED clause must have a condition", ctx)
-}
-if (matchedActions.groupBy(_.getClass).mapValues(_.size).exists(_._2 > 1)) 
{
+val matchedActionSize = matchedActions.length
+if (matchedActionSize >= 2 && 
!matchedActions.init.forall(_.condition.nonEmpty)) {

Review comment:
   I don't think so, because the `children` of `InsertAction` and 
`UpdateAction` include both `condition` and `assignments`. An action can have 
`assignments` but no `condition`, in which case `children` is still non-empty.
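
   To illustrate the point, here is a minimal sketch. The classes below are simplified, hypothetical stand-ins (plain strings instead of Catalyst expressions), not the real `InsertAction`/`UpdateAction` definitions; they only assume that `children` is built from the optional condition plus the assignments.

```scala
// Hypothetical, simplified stand-ins for the merge actions (not the Catalyst classes).
sealed trait MergeAction {
  def condition: Option[String]
  def assignments: Seq[String]
  // Assumption: children cover the condition AND the assignments.
  def children: Seq[String] = condition.toSeq ++ assignments
}

case class DeleteAction(condition: Option[String]) extends MergeAction {
  val assignments: Seq[String] = Nil
}

case class UpdateAction(condition: Option[String], assignments: Seq[String])
  extends MergeAction

object MergeCheck {
  // Every MATCHED clause except the last one must carry a condition.
  def allButLastHaveCondition(actions: Seq[MergeAction]): Boolean =
    actions.length < 2 || actions.init.forall(_.condition.nonEmpty)
}
```

   With these definitions, `UpdateAction(None, Seq("target.col2 = 1"))` has a non-empty `children` even though its condition is unset, so a `children.isEmpty` test would not catch it, while a check on `condition` would.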





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28875:
URL: https://github.com/apache/spark/pull/28875#issuecomment-647948835










[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

2020-06-22 Thread GitBox


dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647949279


   @gatorsmile, why does that block this? Technically, this supersedes it, 
doesn't it?
   > We should avoid making this change until we can resolve 
https://issues.apache.org/jira/browse/SPARK-32017
   
   Switching the default is the real test. For example, we shipped Scala 2.12 
builds in the Spark 2.4.x line for a while, but we didn't notice the Scala 
function issue until the 3.0.0 release. 
   
   Also, we can switch back to `Hadoop 2.7` before December if we want.






[GitHub] [spark] AmplabJenkins commented on pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #28875:
URL: https://github.com/apache/spark/pull/28875#issuecomment-647948835










[GitHub] [spark] SparkQA commented on pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-22 Thread GitBox


SparkQA commented on pull request #28875:
URL: https://github.com/apache/spark/pull/28875#issuecomment-647948382


   **[Test build #124396 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124396/testReport)**
 for PR 28875 at commit 
[`ab97e31`](https://github.com/apache/spark/commit/ab97e31041091b4592f86349eaa81e379022b725).






[GitHub] [spark] cloud-fan commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager

2020-06-22 Thread GitBox


cloud-fan commented on a change in pull request #28895:
URL: https://github.com/apache/spark/pull/28895#discussion_r444000693



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -800,35 +770,20 @@ private[spark] class MapOutputTrackerWorker(conf: 
SparkConf) extends MapOutputTr
 
   // Get blocks sizes by executor Id. Note that zero-sized blocks are excluded 
in the result.
   override def getMapSizesByExecutorId(
-  shuffleId: Int,
-  startPartition: Int,
-  endPartition: Int)
-: Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-logDebug(s"Fetching outputs for shuffle $shuffleId, partitions 
$startPartition-$endPartition")
-val statuses = getStatuses(shuffleId, conf)
-try {
-  MapOutputTracker.convertMapStatuses(
-shuffleId, startPartition, endPartition, statuses, 0, statuses.length)
-} catch {
-  case e: MetadataFetchFailedException =>
-// We experienced a fetch failure so our mapStatuses cache is 
outdated; clear it:
-mapStatuses.clear()
-throw e
-}
-  }
-
-  override def getMapSizesByRange(
   shuffleId: Int,
   startMapIndex: Int,
   endMapIndex: Int,
   startPartition: Int,
-  endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, 
Int)])] = {
-logDebug(s"Fetching outputs for shuffle $shuffleId, mappers 
$startMapIndex-$endMapIndex" +
-  s"partitions $startPartition-$endPartition")
+  endPartition: Int)
+: Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {

Review comment:
   unnecessary change

##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -800,35 +770,20 @@ private[spark] class MapOutputTrackerWorker(conf: 
SparkConf) extends MapOutputTr
 
   // Get blocks sizes by executor Id. Note that zero-sized blocks are excluded 
in the result.
   override def getMapSizesByExecutorId(
-  shuffleId: Int,
-  startPartition: Int,
-  endPartition: Int)
-: Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-logDebug(s"Fetching outputs for shuffle $shuffleId, partitions 
$startPartition-$endPartition")
-val statuses = getStatuses(shuffleId, conf)
-try {
-  MapOutputTracker.convertMapStatuses(
-shuffleId, startPartition, endPartition, statuses, 0, statuses.length)
-} catch {
-  case e: MetadataFetchFailedException =>
-// We experienced a fetch failure so our mapStatuses cache is 
outdated; clear it:
-mapStatuses.clear()
-throw e
-}
-  }
-
-  override def getMapSizesByRange(
   shuffleId: Int,
   startMapIndex: Int,
   endMapIndex: Int,
   startPartition: Int,
-  endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, 
Int)])] = {
-logDebug(s"Fetching outputs for shuffle $shuffleId, mappers 
$startMapIndex-$endMapIndex" +
-  s"partitions $startPartition-$endPartition")
+  endPartition: Int)
+: Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
+logDebug(s"Fetching outputs for shuffle $shuffleId")
 val statuses = getStatuses(shuffleId, conf)
 try {
+  val endMapIndex0 = if (endMapIndex == Int.MaxValue) statuses.length else 
endMapIndex

Review comment:
   ditto

##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -800,35 +770,20 @@ private[spark] class MapOutputTrackerWorker(conf: 
SparkConf) extends MapOutputTr
 
   // Get blocks sizes by executor Id. Note that zero-sized blocks are excluded 
in the result.
   override def getMapSizesByExecutorId(
-  shuffleId: Int,
-  startPartition: Int,
-  endPartition: Int)
-: Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-logDebug(s"Fetching outputs for shuffle $shuffleId, partitions 
$startPartition-$endPartition")
-val statuses = getStatuses(shuffleId, conf)
-try {
-  MapOutputTracker.convertMapStatuses(
-shuffleId, startPartition, endPartition, statuses, 0, statuses.length)
-} catch {
-  case e: MetadataFetchFailedException =>
-// We experienced a fetch failure so our mapStatuses cache is 
outdated; clear it:
-mapStatuses.clear()
-throw e
-}
-  }
-
-  override def getMapSizesByRange(
   shuffleId: Int,
   startMapIndex: Int,
   endMapIndex: Int,
   startPartition: Int,
-  endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, 
Int)])] = {
-logDebug(s"Fetching outputs for shuffle $shuffleId, mappers 
$startMapIndex-$endMapIndex" +
-  s"partitions $startPartition-$endPartition")
+  endPartition: Int)
+: Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
+logDebug(s"Fetching outputs for shuffle $shuffleId")
 val statuses = getStatuses(shuffleId, conf)
 try {
+  val endMapIndex0 = if (endMapIndex == Int.MaxValue) statuses.length else 
endMapIndex
+  logDebug(s"Convert map statuses for sh

[GitHub] [spark] xianyinxin commented on a change in pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-22 Thread GitBox


xianyinxin commented on a change in pull request #28875:
URL: https://github.com/apache/spark/pull/28875#discussion_r444000449



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -468,13 +458,25 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   throw new ParseException("There must be at least one WHEN clause in a 
MERGE statement", ctx)
 }
 // children being empty means that the condition is not set
-if (matchedActions.length == 2 && matchedActions.head.children.isEmpty) {
-  throw new ParseException("When there are 2 MATCHED clauses in a MERGE 
statement, " +
-"the first MATCHED clause must have a condition", ctx)
-}
-if (matchedActions.groupBy(_.getClass).mapValues(_.size).exists(_._2 > 1)) 
{
+val matchedActionSize = matchedActions.length
+if (matchedActionSize >= 2 && 
!matchedActions.init.forall(_.condition.nonEmpty)) {
+  throw new ParseException(
+s"When there are $matchedActionSize MATCHED clauses in a MERGE 
statement, " +

Review comment:
   done.

##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
##
@@ -1134,58 +1134,70 @@ class DDLParserSuite extends AnalysisTest {
 }
   }
 
-  test("merge into table: at most two matched clauses") {
-val exc = intercept[ParseException] {
-  parsePlan(
-"""
-  |MERGE INTO testcat1.ns1.ns2.tbl AS target
-  |USING testcat2.ns1.ns2.tbl AS source
-  |ON target.col1 = source.col1
-  |WHEN MATCHED AND (target.col2='delete') THEN DELETE
-  |WHEN MATCHED AND (target.col2='update1') THEN UPDATE SET 
target.col2 = source.col2
-  |WHEN MATCHED AND (target.col2='update2') THEN UPDATE SET 
target.col2 = source.col2
-  |WHEN NOT MATCHED AND (target.col2='insert')
-  |THEN INSERT (target.col1, target.col2) values (source.col1, 
source.col2)
-""".stripMargin)
-}
-
-assert(exc.getMessage.contains("There should be at most 2 'WHEN MATCHED' 
clauses."))
+  test("merge into table: multi matched and not matched clauses") {
+parseCompare(
+  """
+|MERGE INTO testcat1.ns1.ns2.tbl AS target
+|USING testcat2.ns1.ns2.tbl AS source
+|ON target.col1 = source.col1
+|WHEN MATCHED AND (target.col2='delete') THEN DELETE
+|WHEN MATCHED AND (target.col2='update to 1') THEN UPDATE SET 
target.col2 = 1
+|WHEN MATCHED AND (target.col2='update to 2') THEN UPDATE SET 
target.col2 = 2
+|WHEN NOT MATCHED AND (target.col2='insert 1')
+|THEN INSERT (target.col1, target.col2) values (source.col1, 1)
+|WHEN NOT MATCHED AND (target.col2='insert 2')
+|THEN INSERT (target.col1, target.col2) values (source.col1, 2)
+  """.stripMargin,
+  MergeIntoTable(
+SubqueryAlias("target", UnresolvedRelation(Seq("testcat1", "ns1", 
"ns2", "tbl"))),
+SubqueryAlias("source", UnresolvedRelation(Seq("testcat2", "ns1", 
"ns2", "tbl"))),
+EqualTo(UnresolvedAttribute("target.col1"), 
UnresolvedAttribute("source.col1")),
+Seq(DeleteAction(Some(EqualTo(UnresolvedAttribute("target.col2"), 
Literal("delete")))),
+  UpdateAction(Some(EqualTo(UnresolvedAttribute("target.col2"), 
Literal("update to 1"))),
+Seq(Assignment(UnresolvedAttribute("target.col2"), Literal(1)))),
+  UpdateAction(Some(EqualTo(UnresolvedAttribute("target.col2"), 
Literal("update to 2"))),
+Seq(Assignment(UnresolvedAttribute("target.col2"), Literal(2))))),
+Seq(InsertAction(Some(EqualTo(UnresolvedAttribute("target.col2"), 
Literal("insert 1"))),
+  Seq(Assignment(UnresolvedAttribute("target.col1"), 
UnresolvedAttribute("source.col1")),
+Assignment(UnresolvedAttribute("target.col2"), Literal(1)))),
+  InsertAction(Some(EqualTo(UnresolvedAttribute("target.col2"), 
Literal("insert 2"))),
+Seq(Assignment(UnresolvedAttribute("target.col1"), 
UnresolvedAttribute("source.col1")),
+  Assignment(UnresolvedAttribute("target.col2"), Literal(2)))))))
   }
 
-  test("merge into table: at most one not matched clause") {
+  test("merge into table: the first matched clause must have a condition if 
there's a second") {

Review comment:
   done

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -468,13 +458,25 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   throw new ParseException("There must be at least one WHEN clause in a 
MERGE statement", ctx)
 }
 // children being empty means that the condition is not set
-if (matchedActions.length == 2 && matchedActions.head.children.isEmpty) {
-  throw new ParseException("When there are 2 MATCHED clauses in a MERGE 
statement, " +
-"the first MATCHE

[GitHub] [spark] cloud-fan commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager

2020-06-22 Thread GitBox


cloud-fan commented on a change in pull request #28895:
URL: https://github.com/apache/spark/pull/28895#discussion_r444000561



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -737,35 +721,21 @@ private[spark] class MapOutputTrackerMaster(
   // Get blocks sizes by executor Id. Note that zero-sized blocks are excluded 
in the result.
   // This method is only called in local-mode.
   def getMapSizesByExecutorId(
-  shuffleId: Int,
-  startPartition: Int,
-  endPartition: Int)
-  : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-logDebug(s"Fetching outputs for shuffle $shuffleId, partitions 
$startPartition-$endPartition")
-shuffleStatuses.get(shuffleId) match {
-  case Some (shuffleStatus) =>
-shuffleStatus.withMapStatuses { statuses =>
-  MapOutputTracker.convertMapStatuses(
-shuffleId, startPartition, endPartition, statuses, 0, 
shuffleStatus.mapStatuses.length)
-}
-  case None =>
-Iterator.empty
-}
-  }
-
-  override def getMapSizesByRange(
   shuffleId: Int,
   startMapIndex: Int,
   endMapIndex: Int,
   startPartition: Int,
-  endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, 
Int)])] = {
-logDebug(s"Fetching outputs for shuffle $shuffleId, mappers 
$startMapIndex-$endMapIndex" +
-  s"partitions $startPartition-$endPartition")
+  endPartition: Int)
+: Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
+logDebug(s"Fetching outputs for shuffle $shuffleId")
 shuffleStatuses.get(shuffleId) match {
-  case Some(shuffleStatus) =>
+  case Some (shuffleStatus) =>
 shuffleStatus.withMapStatuses { statuses =>
+  val endMapIndex0 = if (endMapIndex == Int.MaxValue) statuses.length 
else endMapIndex
+  logDebug(s"Convert map statuses for shuffle $shuffleId, " +
+s"partitions $startPartition-$endPartition, mappers 
$startMapIndex-$endMapIndex0")

Review comment:
   let's follow the original log and put `mappers` before `partitions`.
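
   A possible shape for the reordered message, as a sketch only: `ShuffleLog.convertMessage` is a hypothetical helper, not part of `MapOutputTracker`. It also puts a separator between the two ranges, which the removed `getMapSizesByRange` log line appears to lack.

```scala
object ShuffleLog {
  // Hypothetical helper: builds the debug message with mappers before partitions.
  // Both ranges are start-inclusive and end-exclusive, as in the method's doc comment.
  def convertMessage(
      shuffleId: Int,
      startMapIndex: Int,
      endMapIndex: Int,
      startPartition: Int,
      endPartition: Int): String =
    s"Convert map statuses for shuffle $shuffleId, " +
      s"mappers $startMapIndex-$endMapIndex, partitions $startPartition-$endPartition"
}
```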








[GitHub] [spark] cloud-fan commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager

2020-06-22 Thread GitBox


cloud-fan commented on a change in pull request #28895:
URL: https://github.com/apache/spark/pull/28895#discussion_r444000320



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -737,35 +721,21 @@ private[spark] class MapOutputTrackerMaster(
   // Get blocks sizes by executor Id. Note that zero-sized blocks are excluded 
in the result.
   // This method is only called in local-mode.
   def getMapSizesByExecutorId(
-  shuffleId: Int,
-  startPartition: Int,
-  endPartition: Int)
-  : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-logDebug(s"Fetching outputs for shuffle $shuffleId, partitions 
$startPartition-$endPartition")
-shuffleStatuses.get(shuffleId) match {
-  case Some (shuffleStatus) =>
-shuffleStatus.withMapStatuses { statuses =>
-  MapOutputTracker.convertMapStatuses(
-shuffleId, startPartition, endPartition, statuses, 0, 
shuffleStatus.mapStatuses.length)
-}
-  case None =>
-Iterator.empty
-}
-  }
-
-  override def getMapSizesByRange(
   shuffleId: Int,
   startMapIndex: Int,
   endMapIndex: Int,
   startPartition: Int,
-  endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, 
Int)])] = {
-logDebug(s"Fetching outputs for shuffle $shuffleId, mappers 
$startMapIndex-$endMapIndex" +
-  s"partitions $startPartition-$endPartition")
+  endPartition: Int)
+: Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
+logDebug(s"Fetching outputs for shuffle $shuffleId")
 shuffleStatuses.get(shuffleId) match {
-  case Some(shuffleStatus) =>
+  case Some (shuffleStatus) =>
 shuffleStatus.withMapStatuses { statuses =>
+  val endMapIndex0 = if (endMapIndex == Int.MaxValue) statuses.length 
else endMapIndex

Review comment:
   `actualEndMapIndex`?
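
   For context, a small sketch of the value being named. `MapIndexRange` is a hypothetical wrapper; the only behaviour assumed is what the diff shows, namely that `Int.MaxValue` is treated as "up to the last mapper" and replaced by the number of map statuses.

```scala
object MapIndexRange {
  // Int.MaxValue acts as a sentinel meaning "through the last mapper";
  // any other value is used as the (exclusive) end index unchanged.
  def actualEndMapIndex(endMapIndex: Int, numMapStatuses: Int): Int =
    if (endMapIndex == Int.MaxValue) numMapStatuses else endMapIndex
}
```

   The name `actualEndMapIndex` says what the value is after the sentinel has been resolved, which a numeric suffix such as `endMapIndex0` does not.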








[GitHub] [spark] gatorsmile edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

2020-06-22 Thread GitBox


gatorsmile edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647946667


   Yes. As you said, the default version is very important for PySpark users. I 
am afraid there are breaking changes in Hadoop 3.x releases. 
   
   We should avoid making this change until we can resolve 
https://issues.apache.org/jira/browse/SPARK-32017






[GitHub] [spark] cloud-fan commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager

2020-06-22 Thread GitBox


cloud-fan commented on a change in pull request #28895:
URL: https://github.com/apache/spark/pull/28895#discussion_r443999839



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -737,35 +721,21 @@ private[spark] class MapOutputTrackerMaster(
   // Get blocks sizes by executor Id. Note that zero-sized blocks are excluded 
in the result.
   // This method is only called in local-mode.
   def getMapSizesByExecutorId(
-  shuffleId: Int,
-  startPartition: Int,
-  endPartition: Int)
-  : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-logDebug(s"Fetching outputs for shuffle $shuffleId, partitions 
$startPartition-$endPartition")
-shuffleStatuses.get(shuffleId) match {
-  case Some (shuffleStatus) =>
-shuffleStatus.withMapStatuses { statuses =>
-  MapOutputTracker.convertMapStatuses(
-shuffleId, startPartition, endPartition, statuses, 0, 
shuffleStatus.mapStatuses.length)
-}
-  case None =>
-Iterator.empty
-}
-  }
-
-  override def getMapSizesByRange(
   shuffleId: Int,
   startMapIndex: Int,
   endMapIndex: Int,
   startPartition: Int,
-  endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, 
Int)])] = {
-logDebug(s"Fetching outputs for shuffle $shuffleId, mappers 
$startMapIndex-$endMapIndex" +
-  s"partitions $startPartition-$endPartition")
+  endPartition: Int)
+: Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {

Review comment:
   unnecessary change








[GitHub] [spark] cloud-fan commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager

2020-06-22 Thread GitBox


cloud-fan commented on a change in pull request #28895:
URL: https://github.com/apache/spark/pull/28895#discussion_r44333



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -737,35 +721,21 @@ private[spark] class MapOutputTrackerMaster(
   // Get blocks sizes by executor Id. Note that zero-sized blocks are excluded 
in the result.
   // This method is only called in local-mode.
   def getMapSizesByExecutorId(
-  shuffleId: Int,
-  startPartition: Int,
-  endPartition: Int)
-  : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
-logDebug(s"Fetching outputs for shuffle $shuffleId, partitions 
$startPartition-$endPartition")
-shuffleStatuses.get(shuffleId) match {
-  case Some (shuffleStatus) =>
-shuffleStatus.withMapStatuses { statuses =>
-  MapOutputTracker.convertMapStatuses(
-shuffleId, startPartition, endPartition, statuses, 0, 
shuffleStatus.mapStatuses.length)
-}
-  case None =>
-Iterator.empty
-}
-  }
-
-  override def getMapSizesByRange(
   shuffleId: Int,
   startMapIndex: Int,
   endMapIndex: Int,
   startPartition: Int,
-  endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, 
Int)])] = {
-logDebug(s"Fetching outputs for shuffle $shuffleId, mappers 
$startMapIndex-$endMapIndex" +
-  s"partitions $startPartition-$endPartition")
+  endPartition: Int)
+: Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
+logDebug(s"Fetching outputs for shuffle $shuffleId")
 shuffleStatuses.get(shuffleId) match {
-  case Some(shuffleStatus) =>
+  case Some (shuffleStatus) =>

Review comment:
   unnecessary change.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-647946391


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124394/
   Test FAILed.






[GitHub] [spark] gatorsmile commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

2020-06-22 Thread GitBox


gatorsmile commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647946667


   Yes. As you said, the default version is very important for PySpark users. 
   
   We should avoid making this change until we can resolve 
https://issues.apache.org/jira/browse/SPARK-32017






[GitHub] [spark] SparkQA commented on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table

2020-06-22 Thread GitBox


SparkQA commented on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-647946372


   **[Test build #124394 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124394/testReport)**
 for PR 28901 at commit 
[`fa1a84a`](https://github.com/apache/spark/commit/fa1a84a8fd5964d91a8b43bcb75609af43553f61).
* This patch **fails build dependency tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] cloud-fan commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager

2020-06-22 Thread GitBox


cloud-fan commented on a change in pull request #28895:
URL: https://github.com/apache/spark/pull/28895#discussion_r443999460



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -335,28 +335,12 @@ private[spark] abstract class MapOutputTracker(conf: 
SparkConf) extends Logging
* tuples describing the shuffle blocks that are stored at that 
block manager.
*/
   def getMapSizesByExecutorId(
-  shuffleId: Int,
-  startPartition: Int,
-  endPartition: Int)
-  : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])]
-
-  /**
-   * Called from executors to get the server URIs and output sizes for each 
shuffle block that
-   * needs to be read from a given range of map output partitions 
(startPartition is included but
-   * endPartition is excluded from the range) and is produced by
-   * a range of mappers (startMapIndex, endMapIndex, startMapIndex is included 
and
-   * the endMapIndex is excluded).
-   *
-   * @return A sequence of 2-item tuples, where the first item in the tuple is 
a BlockManagerId,
-   * and the second item is a sequence of (shuffle block id, shuffle 
block size, map index)
-   * tuples describing the shuffle blocks that are stored at that 
block manager.
-   */
-  def getMapSizesByRange(
   shuffleId: Int,
   startMapIndex: Int,
   endMapIndex: Int,
   startPartition: Int,
-  endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])]
+  endPartition: Int)
+  : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])]

Review comment:
   unnecessary change








[GitHub] [spark] SparkQA removed a comment on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table

2020-06-22 Thread GitBox


SparkQA removed a comment on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-647945315


   **[Test build #124394 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124394/testReport)**
 for PR 28901 at commit 
[`fa1a84a`](https://github.com/apache/spark/commit/fa1a84a8fd5964d91a8b43bcb75609af43553f61).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-647946382


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-647946382










[GitHub] [spark] AmplabJenkins removed a comment on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-647945936










[GitHub] [spark] cloud-fan edited a comment on pull request #28780: [SPARK-31952][SQL]Fix incorrect memory spill metric when doing Aggregate

2020-06-22 Thread GitBox


cloud-fan edited a comment on pull request #28780:
URL: https://github.com/apache/spark/pull/28780#issuecomment-647945904


   Shall we set `sorter.totalSpillBytes`? Then we can update the metrics
correctly in `sorter.spill`.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-647945906







[GitHub] [spark] cloud-fan commented on pull request #28780: [SPARK-31952][SQL]Fix incorrect memory spill metric when doing Aggregate

2020-06-22 Thread GitBox


cloud-fan commented on pull request #28780:
URL: https://github.com/apache/spark/pull/28780#issuecomment-647945904


   Shall we set `sorter.totalSpillBytes`? Then we can update the metrics
correctly in `sorter.spill`.



[GitHub] [spark] AmplabJenkins commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-647945936







[GitHub] [spark] cloud-fan edited a comment on pull request #28780: [SPARK-31952][SQL]Fix incorrect memory spill metric when doing Aggregate

2020-06-22 Thread GitBox


cloud-fan edited a comment on pull request #28780:
URL: https://github.com/apache/spark/pull/28780#issuecomment-647945904


   Shall we set `sorter.totalSpillBytes`? Then we can update the metrics
correctly in `sorter.spill`.



[GitHub] [spark] AmplabJenkins commented on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-647945906







[GitHub] [spark] SparkQA commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-06-22 Thread GitBox


SparkQA commented on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-647945350


   **[Test build #124395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124395/testReport)** for PR 27366 at commit [`38eb601`](https://github.com/apache/spark/commit/38eb601c6b6ea82015d80e0e8fd1e7030a8406dd).



[GitHub] [spark] SparkQA commented on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table

2020-06-22 Thread GitBox


SparkQA commented on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-647945315


   **[Test build #124394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124394/testReport)** for PR 28901 at commit [`fa1a84a`](https://github.com/apache/spark/commit/fa1a84a8fd5964d91a8b43bcb75609af43553f61).



[GitHub] [spark] cloud-fan commented on a change in pull request #28886: [SPARK-32043][SQL] Replace Decimal by Int op in `make_interval` and `make_timestamp`

2020-06-22 Thread GitBox


cloud-fan commented on a change in pull request #28886:
URL: https://github.com/apache/spark/pull/28886#discussion_r443997431



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
##
@@ -751,7 +751,8 @@ object IntervalUtils {
   secs: Decimal): CalendarInterval = {
 val totalMonths = Math.addExact(months, Math.multiplyExact(years, MONTHS_PER_YEAR))
 val totalDays = Math.addExact(days, Math.multiplyExact(weeks, DAYS_PER_WEEK))
-var micros = (secs * Decimal(MICROS_PER_SECOND)).toLong
+assert(secs.scale == 6, "Seconds fractional must have 6 digits for microseconds")

Review comment:
   shall we check the precision as well?
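
To illustrate the question, here is a minimal sketch of checking both scale and precision before converting seconds to microseconds. It is not the PR's code: it uses `java.math.BigDecimal` instead of Spark's `Decimal`, and the names `SecondsToMicros`, `toMicros`, and the bound `MaxPrecision = 18` are assumptions made for this example. The bound is chosen because any value with at most 18 digits has an unscaled value that fits in a `Long`.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical helper for illustration only; Spark's Decimal is not used here.
object SecondsToMicros {
  val MicrosScale = 6   // six fractional digits, i.e. microseconds
  val MaxPrecision = 18 // assumed bound: at most 18 digits always fits in a Long

  def toMicros(secs: JBigDecimal): Long = {
    assert(secs.scale() == MicrosScale,
      "Seconds fractional must have 6 digits for microseconds")
    assert(secs.precision() <= MaxPrecision,
      s"Seconds precision must not exceed $MaxPrecision digits")
    // With scale == 6 the unscaled value is exactly the number of microseconds.
    secs.unscaledValue().longValueExact()
  }
}
```

With both checks in place the multiplication by `MICROS_PER_SECOND` is unnecessary, because the unscaled value can be read directly.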





[GitHub] [spark] Ngone51 commented on pull request #28866: [SPARK-31845][CORE][TESTS] DAGSchedulerSuite: Reuse completeNextStageWithFetchFailure

2020-06-22 Thread GitBox


Ngone51 commented on pull request #28866:
URL: https://github.com/apache/spark/pull/28866#issuecomment-647943905


   LGTM, also cc @jiangxb1987 



[GitHub] [spark] HyukjinKwon commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

2020-06-22 Thread GitBox


HyukjinKwon commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647942729


   ^ FWIW, I am aiming to provide a way to control it in Spark 3.1; see SPARK-32017.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28899:
URL: https://github.com/apache/spark/pull/28899#issuecomment-647942233


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124389/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28899:
URL: https://github.com/apache/spark/pull/28899#issuecomment-647942226


   Merged build finished. Test FAILed.



[GitHub] [spark] MaxGekk commented on a change in pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-06-22 Thread GitBox


MaxGekk commented on a change in pull request #27366:
URL: https://github.com/apache/spark/pull/27366#discussion_r443994932



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmark.scala
##
@@ -508,6 +548,7 @@ object JsonBenchmark extends SqlBasedBenchmark {
   jsonInDS(50 * 1000 * 1000, numIters)
   jsonInFile(50 * 1000 * 1000, numIters)
   datetimeBenchmark(rowsNum = 10 * 1000 * 1000, numIters)
+  filtersPushdownBenchmark(rowsNum = 100 * 1000, numIters)

Review comment:
   done





[GitHub] [spark] SparkQA removed a comment on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession

2020-06-22 Thread GitBox


SparkQA removed a comment on pull request #28899:
URL: https://github.com/apache/spark/pull/28899#issuecomment-647909279


   **[Test build #124389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124389/testReport)** for PR 28899 at commit [`a486f1f`](https://github.com/apache/spark/commit/a486f1fae0c733571275cad6dab981803d47cbe7).



[GitHub] [spark] AmplabJenkins commented on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #28899:
URL: https://github.com/apache/spark/pull/28899#issuecomment-647942226







[GitHub] [spark] SparkQA commented on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession

2020-06-22 Thread GitBox


SparkQA commented on pull request #28899:
URL: https://github.com/apache/spark/pull/28899#issuecomment-647942120


   **[Test build #124389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124389/testReport)** for PR 28899 at commit [`a486f1f`](https://github.com/apache/spark/commit/a486f1fae0c733571275cad6dab981803d47cbe7).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] HyukjinKwon commented on pull request #28894: [SPARK-32052][SQL] Extract common code from date-time field expressions

2020-06-22 Thread GitBox


HyukjinKwon commented on pull request #28894:
URL: https://github.com/apache/spark/pull/28894#issuecomment-647941360


   late LGTM too



[GitHub] [spark] venkata91 commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark's bl

2020-06-22 Thread GitBox


venkata91 commented on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-647940759


   @tgravescs After thinking about the problem and discussing it with
@mridulm, I now handle this by simply keeping track of unschedulable task
sets, so that more executors can be added when dynamic allocation is enabled.
Once some task becomes schedulable, we clear this set, since either an
executor became free or we just acquired a new executor and found a way to
make progress. Let me know what you think about this change. Thanks for
taking a look previously and giving the overall context.
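
As a rough illustration of the bookkeeping described above, here is a minimal sketch. It is not the PR's code, and every name in it (`UnschedulableTracker`, `markUnschedulable`, `onTaskScheduled`, `pending`) is hypothetical; task sets are identified by plain strings to keep the example self-contained.

```scala
import scala.collection.mutable

// Hypothetical sketch; names and types are assumptions, not Spark's scheduler API.
final class UnschedulableTracker(dynamicAllocationEnabled: Boolean) {
  private val unschedulable = mutable.HashSet.empty[String]

  /** Records a task set that cannot be scheduled.
    * Returns true when more executors should be requested. */
  def markUnschedulable(taskSetId: String): Boolean = {
    unschedulable += taskSetId
    dynamicAllocationEnabled
  }

  /** Some task was scheduled, so progress is possible again: forget everything. */
  def onTaskScheduled(): Unit = unschedulable.clear()

  /** Snapshot of the task sets currently considered unschedulable. */
  def pending: Set[String] = unschedulable.toSet
}
```

Clearing the whole set on any successful scheduling keeps the state small, at the cost of re-adding task sets that are still blocked the next time they fail to schedule.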



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28895:
URL: https://github.com/apache/spark/pull/28895#issuecomment-647940129







[GitHub] [spark] AmplabJenkins commented on pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #28895:
URL: https://github.com/apache/spark/pull/28895#issuecomment-647940129







[GitHub] [spark] SparkQA commented on pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager

2020-06-22 Thread GitBox


SparkQA commented on pull request #28895:
URL: https://github.com/apache/spark/pull/28895#issuecomment-647939695


   **[Test build #124393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124393/testReport)** for PR 28895 at commit [`2a31450`](https://github.com/apache/spark/commit/2a31450f341ac5d4fc46817dd65248b5d973c002).



[GitHub] [spark] MaxGekk commented on a change in pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-06-22 Thread GitBox


MaxGekk commented on a change in pull request #27366:
URL: https://github.com/apache/spark/pull/27366#discussion_r443991161



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmark.scala
##
@@ -495,6 +496,45 @@ object JsonBenchmark extends SqlBasedBenchmark {
 }
   }
 
+  private def filtersPushdownBenchmark(rowsNum: Int, numIters: Int): Unit = {
+val benchmark = new Benchmark(s"Filters pushdown", rowsNum, output = output)
+val colsNum = 100
+val fields = Seq.tabulate(colsNum)(i => StructField(s"col$i", TimestampType))
+val schema = StructType(StructField("key", IntegerType) +: fields)
+def columns(): Seq[Column] = {
+  val ts = Seq.tabulate(colsNum) { i =>
+lit(Instant.ofEpochSecond(i * 12345678)).as(s"col$i")
+  }
+  ($"id" % 1000).as("key") +: ts
+}
+withTempPath { path =>
+  spark.range(rowsNum).select(columns(): _*).write.json(path.getAbsolutePath)
+  def readback = {
+spark.read.schema(schema).json(path.getAbsolutePath)
+  }
+
+  benchmark.addCase(s"w/o filters", numIters) { _ =>
+readback.noop()
+  }
+
+  def withFilter(configEnabled: Boolean): Unit = {
+withSQLConf(SQLConf.JSON_FILTER_PUSHDOWN_ENABLED.key -> configEnabled.toString()) {
+  readback.filter($"key" === 0).noop()
+}
+  }
+
+  benchmark.addCase(s"pushdown disabled", numIters) { _ =>

Review comment:
   I will remove it here and in other places. Thanks





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28903: [SPARK-19939] [ML] Add support for association rules in ML

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28903:
URL: https://github.com/apache/spark/pull/28903#issuecomment-647937510







[GitHub] [spark] MaxGekk commented on a change in pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-06-22 Thread GitBox


MaxGekk commented on a change in pull request #27366:
URL: https://github.com/apache/spark/pull/27366#discussion_r443990454



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/StructFilters.scala
##
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst
+
+import scala.util.Try
+
+import org.apache.spark.sql.catalyst.StructFilters._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.sources
+import org.apache.spark.sql.types.{BooleanType, StructType}
+
+/**
+ * The class provides API for applying pushed down filters to partially or
+ * fully set internal rows that have the struct schema.
+ *
+ * @param pushedFilters The pushed down source filters. The filters should refer to
+ *  the fields of the provided schema.
+ * @param schema The required schema of records from datasource files.
+ */
+abstract class StructFilters(pushedFilters: Seq[sources.Filter], schema: StructType) {
+
+  protected val filters = pushedFilters.filter(checkFilterRefs(_, schema.fieldNames.toSet))
+
+  /**
+   * Applies pushed down source filters to the given row assuming that
+   * value at `index` has been already set.
+   *
+   * @param row The row with fully or partially set values.
+   * @param index The index of already set value.
+   * @return true if currently processed row can be skipped otherwise false.
+   */
+  def skipRow(row: InternalRow, index: Int): Boolean
+
+  /**
+   * Resets states of pushed down filters. The method must be called before
+   * processing any new row otherwise skipRow() may return wrong result.
+   */
+  def reset(): Unit
+
+  /**
+   * Compiles source filters to a predicate.
+   */
+  def toPredicate(filters: Seq[sources.Filter]): BasePredicate = {
+val reducedExpr = filters
+  .sortBy(_.references.length)
+  .flatMap(filterToExpression(_, toRef))
+  .reduce(And)
+Predicate.create(reducedExpr)
+  }
+
+  // Finds a filter attribute in the schema and converts it to a `BoundReference`
+  def toRef(attr: String): Option[BoundReference] = {
+schema.getFieldIndex(attr).map { index =>
+  val field = schema(index)
+  BoundReference(schema.fieldIndex(attr), field.dataType, field.nullable)

Review comment:
   Right, I will replace it by `index`.
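
For reference, a minimal sketch of the lookup after that change, where the `index` already returned by `getFieldIndex` is reused instead of resolving the field name a second time. The `Field`, `Schema`, and `Ref` types below are simplified, hypothetical stand-ins for Spark's `StructField`, `StructType`, and `BoundReference`, introduced only so the example is self-contained.

```scala
// Hypothetical stand-ins for Spark's StructField / StructType / BoundReference.
final case class Field(name: String, dataType: String, nullable: Boolean)
final case class Ref(ordinal: Int, dataType: String, nullable: Boolean)

final case class Schema(fields: Vector[Field]) {
  /** Position of the named field, if present. */
  def getFieldIndex(name: String): Option[Int] = {
    val i = fields.indexWhere(_.name == name)
    if (i >= 0) Some(i) else None
  }
  def apply(i: Int): Field = fields(i)
}

object ToRef {
  /** Resolves an attribute name to a bound reference using a single lookup. */
  def toRef(schema: Schema, attr: String): Option[Ref] =
    schema.getFieldIndex(attr).map { index =>
      val field = schema(index)
      Ref(index, field.dataType, field.nullable) // reuse `index`, no second lookup
    }
}
```

An unknown attribute yields `None` rather than an exception, which matches the `Option` return type of the quoted method.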





[GitHub] [spark] AmplabJenkins commented on pull request #28903: [SPARK-19939] [ML] Add support for association rules in ML

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #28903:
URL: https://github.com/apache/spark/pull/28903#issuecomment-647937510







[GitHub] [spark] AmplabJenkins removed a comment on pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #27246:
URL: https://github.com/apache/spark/pull/27246#issuecomment-647937018


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124383/
   Test FAILed.



[GitHub] [spark] SparkQA removed a comment on pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-22 Thread GitBox


SparkQA removed a comment on pull request #27246:
URL: https://github.com/apache/spark/pull/27246#issuecomment-647896228


   **[Test build #124383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124383/testReport)** for PR 27246 at commit [`4c21ba6`](https://github.com/apache/spark/commit/4c21ba660fe4e992cbfaf33932abc3ce3587ebc4).



[GitHub] [spark] SparkQA commented on pull request #28903: [SPARK-19939] [ML] Add support for association rules in ML

2020-06-22 Thread GitBox


SparkQA commented on pull request #28903:
URL: https://github.com/apache/spark/pull/28903#issuecomment-647937053


   **[Test build #124392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124392/testReport)** for PR 28903 at commit [`1cc6560`](https://github.com/apache/spark/commit/1cc6560e7f72594e2d1bf6400c5391741d64dae0).



[GitHub] [spark] AmplabJenkins commented on pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #27246:
URL: https://github.com/apache/spark/pull/27246#issuecomment-647937012







[GitHub] [spark] AmplabJenkins removed a comment on pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #27246:
URL: https://github.com/apache/spark/pull/27246#issuecomment-647937012


   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-22 Thread GitBox


SparkQA commented on pull request #27246:
URL: https://github.com/apache/spark/pull/27246#issuecomment-647936567


   **[Test build #124383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124383/testReport)** for PR 27246 at commit [`4c21ba6`](https://github.com/apache/spark/commit/4c21ba660fe4e992cbfaf33932abc3ce3587ebc4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] huaxingao opened a new pull request #28903: [SPARK-19939] [ML] Add support for association rules in ML

2020-06-22 Thread GitBox


huaxingao opened a new pull request #28903:
URL: https://github.com/apache/spark/pull/28903


   
   ### What changes were proposed in this pull request?
   Add the support measure to association rules in Spark `ml.fpm`.
   
   ### Why are the changes needed?
   Support indicates how frequently the itemset of an association rule
appears in the database, and suggests whether the rule is generally applicable
to the dataset. Refer to the
[wiki](https://en.wikipedia.org/wiki/Association_rule_learning#Support) for
more details.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Association rules now have a support measure.
   
   
   ### How was this patch tested?
   Existing and new unit tests.
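
To make the support measure concrete, here is a small worked sketch. It is not the PR's implementation and does not use Spark's `ml.fpm` API; the object `SupportExample` and its `support` function are hypothetical names. It computes the fraction of transactions that contain every item of a given itemset, which is the definition given above.

```scala
// Hypothetical illustration of the support measure; not Spark's ml.fpm API.
object SupportExample {
  /** Fraction of transactions that contain every item of `itemset`. */
  def support[T](transactions: Seq[Set[T]], itemset: Set[T]): Double =
    if (transactions.isEmpty) 0.0
    else transactions.count(t => itemset.subsetOf(t)).toDouble / transactions.size

  // Four example transactions; {a, b} appears in two of them.
  val transactions: Seq[Set[String]] = Seq(
    Set("a", "b", "c"),
    Set("a", "b"),
    Set("a", "c"),
    Set("b")
  )
}
```

For the sample data, `SupportExample.support(SupportExample.transactions, Set("a", "b"))` is 2 out of 4 transactions, that is 0.5.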
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28900:
URL: https://github.com/apache/spark/pull/28900#issuecomment-647935005










[GitHub] [spark] AmplabJenkins commented on pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #28900:
URL: https://github.com/apache/spark/pull/28900#issuecomment-647935005










[GitHub] [spark] SparkQA commented on pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled

2020-06-22 Thread GitBox


SparkQA commented on pull request #28900:
URL: https://github.com/apache/spark/pull/28900#issuecomment-647934563


   **[Test build #124391 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124391/testReport)**
 for PR 28900 at commit 
[`8e39ed7`](https://github.com/apache/spark/commit/8e39ed7787c9f80591963de1e7ab4f0f2c24fda3).






[GitHub] [spark] AmplabJenkins commented on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #28899:
URL: https://github.com/apache/spark/pull/28899#issuecomment-647932324










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28899:
URL: https://github.com/apache/spark/pull/28899#issuecomment-647932324










[GitHub] [spark] viirya commented on a change in pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled

2020-06-22 Thread GitBox


viirya commented on a change in pull request #28900:
URL: https://github.com/apache/spark/pull/28900#discussion_r443984508



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##
@@ -1026,15 +1026,48 @@ class AdaptiveQueryExecSuite
 Seq(true, false).foreach { enableAQE =>
   withSQLConf(
 SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> enableAQE.toString,
+SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
 SQLConf.SHUFFLE_PARTITIONS.key -> "6",
 SQLConf.COALESCE_PARTITIONS_INITIAL_PARTITION_NUM.key -> "7") {
-val partitionsNum = 
spark.range(10).repartition($"id").rdd.collectPartitions().length
+val df = spark.range(10).repartition($"id")

Review comment:
   ok.








[GitHub] [spark] cloud-fan closed pull request #28894: [SPARK-32052][SQL] Extract common code from date-time field expressions

2020-06-22 Thread GitBox


cloud-fan closed pull request #28894:
URL: https://github.com/apache/spark/pull/28894


   






[GitHub] [spark] beliefer commented on pull request #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static

2020-06-22 Thread GitBox


beliefer commented on pull request #26875:
URL: https://github.com/apache/spark/pull/26875#issuecomment-647931974


   test1 looks the same as test3.






[GitHub] [spark] SparkQA commented on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession

2020-06-22 Thread GitBox


SparkQA commented on pull request #28899:
URL: https://github.com/apache/spark/pull/28899#issuecomment-647931949


   **[Test build #124390 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124390/testReport)**
 for PR 28899 at commit 
[`708c9ff`](https://github.com/apache/spark/commit/708c9ff27ac0f234fbac0d4d6524adb90c9cf0b3).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28884: [SPARK-20249][ML][PYSPARK] Add training summary for LinearSVCModel

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28884:
URL: https://github.com/apache/spark/pull/28884#issuecomment-647927719










[GitHub] [spark] SparkQA removed a comment on pull request #28884: [SPARK-20249][ML][PYSPARK] Add training summary for LinearSVCModel

2020-06-22 Thread GitBox


SparkQA removed a comment on pull request #28884:
URL: https://github.com/apache/spark/pull/28884#issuecomment-647905327


   **[Test build #124386 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124386/testReport)**
 for PR 28884 at commit 
[`423eeb5`](https://github.com/apache/spark/commit/423eeb502a1ccdba1fc774c7ba3d2057bf38207b).






[GitHub] [spark] cloud-fan commented on pull request #28894: [SPARK-32052][SQL] Extract common code from date-time field expressions

2020-06-22 Thread GitBox


cloud-fan commented on pull request #28894:
URL: https://github.com/apache/spark/pull/28894#issuecomment-647931165


   thanks, merging to master!






[GitHub] [spark] ulysses-you commented on a change in pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession

2020-06-22 Thread GitBox


ulysses-you commented on a change in pull request #28899:
URL: https://github.com/apache/spark/pull/28899#discussion_r443982907



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala
##
@@ -240,4 +240,20 @@ class SparkSessionBuilderSuite extends SparkFunSuite with 
BeforeAndAfterEach {
 assert(session.conf.get(GLOBAL_TEMP_DATABASE) === 
"globaltempdb-spark-31532-2")
 assert(session.conf.get(WAREHOUSE_PATH) === "SPARK-31532-db-2")
   }
+
+  test("SPARK-32062: reset listenerRegistered in SparkSession") {
+(1 to 2).foreach { i =>
+  val conf = new SparkConf()
+.setMaster("local")
+.setAppName(s"test-SPARK-32062-$i")
+  val context = new SparkContext(conf)

Review comment:
   missed it.








[GitHub] [spark] cloud-fan commented on a change in pull request #28856: [SPARK-31982][SQL]Function sequence doesn't handle date increments that cross DST

2020-06-22 Thread GitBox


cloud-fan commented on a change in pull request #28856:
URL: https://github.com/apache/spark/pull/28856#discussion_r443981984



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##
@@ -2589,6 +2589,8 @@ object Sequence {
 }
   }
 
+  // To generate time sequences, we use scale 1 in TemporalSequenceImpl
+  // for `TimestampType`, while MICROS_PER_DAY for `DateType`

Review comment:
   If start/end is a date, can the step be in seconds/minutes/hours?
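
For context on the DST issue this PR targets, here is a small sketch using only `java.time`, not Spark's `Sequence` implementation. It shows how stepping by a fixed 24 hours drifts away from local midnight across a DST change, while calendar-day stepping does not. The time zone and start date are illustrative assumptions, and `DstStepSketch` is a hypothetical name.

```scala
import java.time.{Duration, LocalDate, LocalDateTime, ZoneId}

object DstStepSketch {
  private val zone = ZoneId.of("America/Los_Angeles")
  // US DST began on 2020-03-08, so that local day was only 23 hours long.
  private val start = LocalDate.of(2020, 3, 7).atStartOfDay(zone)

  // Calendar-aware stepping: each element lands on local midnight.
  def byCalendarDays(n: Int): Seq[LocalDateTime] =
    (0 until n).map(i => start.plusDays(i).toLocalDateTime)

  // Fixed-duration stepping: each step adds exactly 24h of elapsed time.
  def byFixed24h(n: Int): Seq[LocalDateTime] =
    (0 until n).map(i => start.plus(Duration.ofHours(24L * i)).toLocalDateTime)

  def main(args: Array[String]): Unit = {
    println(byCalendarDays(3).last) // 2020-03-09T00:00
    println(byFixed24h(3).last)     // 2020-03-09T01:00
  }
}
```

The one-hour drift in the second sequence is the kind of discrepancy that appears when day-based steps are treated as a fixed number of microseconds.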








[GitHub] [spark] cloud-fan commented on a change in pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession

2020-06-22 Thread GitBox


cloud-fan commented on a change in pull request #28899:
URL: https://github.com/apache/spark/pull/28899#discussion_r443980530



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala
##
@@ -240,4 +240,20 @@ class SparkSessionBuilderSuite extends SparkFunSuite with 
BeforeAndAfterEach {
 assert(session.conf.get(GLOBAL_TEMP_DATABASE) === 
"globaltempdb-spark-31532-2")
 assert(session.conf.get(WAREHOUSE_PATH) === "SPARK-31532-db-2")
   }
+
+  test("SPARK-32062: reset listenerRegistered in SparkSession") {
+(1 to 2).foreach { i =>
+  val conf = new SparkConf()
+.setMaster("local")
+.setAppName(s"test-SPARK-32062-$i")
+  val context = new SparkContext(conf)

Review comment:
   Does this work? The test doesn't stop the Spark context, and AFAIK we 
don't support having multiple `SparkContext` instances at the same time.








[GitHub] [spark] AmplabJenkins commented on pull request #28884: [SPARK-20249][ML][PYSPARK] Add training summary for LinearSVCModel

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #28884:
URL: https://github.com/apache/spark/pull/28884#issuecomment-647927719










[GitHub] [spark] SparkQA commented on pull request #28884: [SPARK-20249][ML][PYSPARK] Add training summary for LinearSVCModel

2020-06-22 Thread GitBox


SparkQA commented on pull request #28884:
URL: https://github.com/apache/spark/pull/28884#issuecomment-647927281


   **[Test build #124386 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124386/testReport)**
 for PR 28884 at commit 
[`423eeb5`](https://github.com/apache/spark/commit/423eeb502a1ccdba1fc774c7ba3d2057bf38207b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] cloud-fan commented on a change in pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled

2020-06-22 Thread GitBox


cloud-fan commented on a change in pull request #28900:
URL: https://github.com/apache/spark/pull/28900#discussion_r443979185



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##
@@ -1026,15 +1026,48 @@ class AdaptiveQueryExecSuite
 Seq(true, false).foreach { enableAQE =>
   withSQLConf(
 SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> enableAQE.toString,
+SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
 SQLConf.SHUFFLE_PARTITIONS.key -> "6",
 SQLConf.COALESCE_PARTITIONS_INITIAL_PARTITION_NUM.key -> "7") {
-val partitionsNum = 
spark.range(10).repartition($"id").rdd.collectPartitions().length
+val df = spark.range(10).repartition($"id")

Review comment:
   Can we test `repartition(numPartitions)` in this test case and make sure 
the partition number doesn't change? Your new test case already tests 
repartition by key/range.








[GitHub] [spark] cloud-fan closed pull request #28892: [MINOR][SQL] Simplify DateTimeUtils.cleanLegacyTimestampStr

2020-06-22 Thread GitBox


cloud-fan closed pull request #28892:
URL: https://github.com/apache/spark/pull/28892


   






[GitHub] [spark] cloud-fan commented on pull request #28892: [MINOR][SQL] Simplify DateTimeUtils.cleanLegacyTimestampStr

2020-06-22 Thread GitBox


cloud-fan commented on pull request #28892:
URL: https://github.com/apache/spark/pull/28892#issuecomment-647924824


   thanks, merging to master!






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28895:
URL: https://github.com/apache/spark/pull/28895#issuecomment-647924065










[GitHub] [spark] AmplabJenkins commented on pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #28895:
URL: https://github.com/apache/spark/pull/28895#issuecomment-647924065










[GitHub] [spark] SparkQA removed a comment on pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager

2020-06-22 Thread GitBox


SparkQA removed a comment on pull request #28895:
URL: https://github.com/apache/spark/pull/28895#issuecomment-647880850


   **[Test build #124380 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124380/testReport)**
 for PR 28895 at commit 
[`05fe4c7`](https://github.com/apache/spark/commit/05fe4c7be3a7b55d7a04e461e9c79c92031e627c).






[GitHub] [spark] SparkQA commented on pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager

2020-06-22 Thread GitBox


SparkQA commented on pull request #28895:
URL: https://github.com/apache/spark/pull/28895#issuecomment-647923313


   **[Test build #124380 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124380/testReport)**
 for PR 28895 at commit 
[`05fe4c7`](https://github.com/apache/spark/commit/05fe4c7be3a7b55d7a04e461e9c79c92031e627c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] guoqiaoli1992 commented on pull request #26873: [SPARK-30240][core] Support HTTP redirects directly to a proxy server.

2020-06-22 Thread GitBox


guoqiaoli1992 commented on pull request #26873:
URL: https://github.com/apache/spark/pull/26873#issuecomment-647921920


   I'm facing a similar issue (I think). Can I solve it with 
`spark.ui.proxyRedirectUri`?
   
https://stackoverflow.com/questions/62527979/running-history-server-behind-reverse-proxy






[GitHub] [spark] MaxGekk commented on a change in pull request #28892: [MINOR][SQL] Simplify DateTimeUtils.cleanLegacyTimestampStr

2020-06-22 Thread GitBox


MaxGekk commented on a change in pull request #28892:
URL: https://github.com/apache/spark/pull/28892#discussion_r443973612



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##
@@ -203,20 +203,10 @@ object DateTimeUtils {
 Math.multiplyExact(millis, MICROS_PER_MILLIS)
   }
 
+  private final val gmtUtf8 = UTF8String.fromString("GMT")
   // The method is called by JSON/CSV parser to clean up the legacy timestamp 
string by removing
-  // the "GMT" string.
-  def cleanLegacyTimestampStr(s: String): String = {
-val indexOfGMT = s.indexOf("GMT")
-if (indexOfGMT != -1) {
-  // ISO8601 with a weird time zone specifier (2000-01-01T00:00GMT+01:00)
-  val s0 = s.substring(0, indexOfGMT)
-  val s1 = s.substring(indexOfGMT + 3)
-  // Mapped to 2000-01-01T00:00+01:00
-  s0 + s1
-} else {
-  s
-}
-  }
+  // the "GMT" string. For example, it returns 2000-01-01T00:00+01:00 for 
2000-01-01T00:00GMT+01:00.
+  def cleanLegacyTimestampStr(s: UTF8String): UTF8String = s.replace(gmtUtf8, 
UTF8String.EMPTY_UTF8)

Review comment:
   It has, but look at how it is implemented: via regexp. @JoshRosen 
implemented a more efficient `replace` in `UTF8String` in 
https://github.com/apache/spark/pull/24707, which is why I used it. I hope 
that seems reasonable.
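
To illustrate the two shapes of the cleanup discussed above, here is a sketch on plain `java.lang.String` rather than Spark's `UTF8String`, so it runs without Spark on the classpath. `cleanFirst` mirrors the removed `indexOf`/`substring` code from the diff; `cleanAll` uses a literal replace. Both names and the object `CleanGmtSketch` are hypothetical.

```scala
object CleanGmtSketch {
  // Mirrors the removed implementation: drops only the first "GMT".
  def cleanFirst(s: String): String = {
    val i = s.indexOf("GMT")
    if (i != -1) s.substring(0, i) + s.substring(i + 3) else s
  }

  // Literal replacement of every "GMT" occurrence with the empty string.
  def cleanAll(s: String): String = s.replace("GMT", "")

  def main(args: Array[String]): Unit = {
    val legacy = "2000-01-01T00:00GMT+01:00"
    println(cleanFirst(legacy)) // 2000-01-01T00:00+01:00
    println(cleanAll(legacy))   // 2000-01-01T00:00+01:00
  }
}
```

For inputs with at most one "GMT" the two agree; they differ only when the marker appears more than once, which the legacy timestamp format is not expected to produce.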








[GitHub] [spark] kdzhao commented on a change in pull request #28731: [SPARK-31909][SQL] Add SPARK_SUBMIT_OPTS to Beeline Script

2020-06-22 Thread GitBox


kdzhao commented on a change in pull request #28731:
URL: https://github.com/apache/spark/pull/28731#discussion_r443874572



##
File path: bin/beeline
##
@@ -28,5 +28,7 @@ if [ -z "${SPARK_HOME}" ]; then
   source "$(dirname "$0")"/find-spark-home
 fi
 
+. "${SPARK_HOME}"/bin/load-spark-env.sh
+
 CLASS="org.apache.hive.beeline.BeeLine"
-exec "${SPARK_HOME}/bin/spark-class" $CLASS "$@"
+exec "${SPARK_HOME}/bin/spark-class" $SPARK_SUBMIT_OPTS $CLASS "$@"

Review comment:
   I think I misspoke. What I meant is that in Hive, the beeline command 
appears to be just a call to hive with different parameters:
   https://github.com/apache/hive/blob/branch-1.2/bin/beeline
   https://github.com/apache/hive/blob/branch-1.2/bin/hive
   I agree that Hive's beeline doesn't read Spark parameters, and I would 
assume it reads its own (I saw "HADOOP_CLIENT_OPTS" etc. in the scripts above).
   Back to Spark: I agree with you that fixing it in spark-class might cover 
more cases. On the other hand, so far only beeline has this issue, so an easy 
fix in the beeline script also makes sense as a stopgap.








[GitHub] [spark] attilapiros commented on a change in pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is

2020-06-22 Thread GitBox


attilapiros commented on a change in pull request #28848:
URL: https://github.com/apache/spark/pull/28848#discussion_r443970197



##
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##
@@ -1939,24 +1941,24 @@ private[spark] class DAGScheduler(
   hostToUnregisterOutputs: Option[String],
   maybeEpoch: Option[Long] = None): Unit = {
 val currentEpoch = maybeEpoch.getOrElse(mapOutputTracker.getEpoch)
+logDebug(s"Considering removal of executor $execId; " +
+  s"fileLost: $fileLost, currentEpoch: $currentEpoch")
 if (!failedEpoch.contains(execId) || failedEpoch(execId) < currentEpoch) {
   failedEpoch(execId) = currentEpoch
-  logInfo("Executor lost: %s (epoch %d)".format(execId, currentEpoch))
+  logInfo(s"Executor lost: $execId (epoch $currentEpoch)")

Review comment:
   Guilty as charged. My thinking was the following: string interpolation 
is preferred in the project, the change itself is tiny and has zero risk, and 
since we are already touching these lines it would be good to do it now (aka 
the Boy Scout Rule).
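
A small sketch of the equivalence relied on here: the `format`-based message and the `s`-interpolated message from the diff produce the same string. The object name `LogMessageSketch` and the sample values are hypothetical.

```scala
object LogMessageSketch {
  // Old style from the diff: positional format specifiers.
  def viaFormat(execId: String, epoch: Long): String =
    "Executor lost: %s (epoch %d)".format(execId, epoch)

  // New style from the diff: Scala string interpolation.
  def viaInterpolation(execId: String, epoch: Long): String =
    s"Executor lost: $execId (epoch $epoch)"

  def main(args: Array[String]): Unit = {
    println(viaFormat("exec-1", 3L))        // Executor lost: exec-1 (epoch 3)
    println(viaInterpolation("exec-1", 3L)) // Executor lost: exec-1 (epoch 3)
  }
}
```

Interpolation keeps each value next to the text that describes it, which is harder to get wrong than matching specifiers to arguments by position.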








[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

2020-06-22 Thread GitBox


dongjoon-hyun edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647912232


   BTW, please note that the default version is very important. For example, 
PySpark was downloaded 1,333,883 times last week, but we provide them only 
Spark with `Hadoop 2.7.4`.
   - https://pypistats.org/packages/pyspark






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

2020-06-22 Thread GitBox


dongjoon-hyun edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647912232


   BTW, please note that the default version is very important. For example, 
PySpark was downloaded 1,333,883 times last week, but it is only the Spark 
distribution with `Hadoop 2.7.4`.
   - https://pypistats.org/packages/pyspark






[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

2020-06-22 Thread GitBox


dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647912232


   BTW, please note that the default version is very important. For example, 
PySpark was downloaded 1,333,883 times, but we provide them only Spark with 
`Hadoop 2.7.4`.
   - https://pypistats.org/packages/pyspark






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28899:
URL: https://github.com/apache/spark/pull/28899#issuecomment-647909575










[GitHub] [spark] AmplabJenkins commented on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #28899:
URL: https://github.com/apache/spark/pull/28899#issuecomment-647909575










[GitHub] [spark] SparkQA commented on pull request #28899: [SPARK-32062][SQL] Reset listenerRegistered in SparkSession

2020-06-22 Thread GitBox


SparkQA commented on pull request #28899:
URL: https://github.com/apache/spark/pull/28899#issuecomment-647909279


   **[Test build #124389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124389/testReport)** for PR 28899 at commit [`a486f1f`](https://github.com/apache/spark/commit/a486f1fae0c733571275cad6dab981803d47cbe7).






[GitHub] [spark] Ngone51 commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager

2020-06-22 Thread GitBox


Ngone51 commented on a change in pull request #28895:
URL: https://github.com/apache/spark/pull/28895#discussion_r443962431



##
File path: core/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala
##
@@ -43,26 +44,16 @@ private[spark] trait ShuffleManager {
   context: TaskContext,
   metrics: ShuffleWriteMetricsReporter): ShuffleWriter[K, V]
 
-  /**
-   * Get a reader for a range of reduce partitions (startPartition to endPartition-1, inclusive).
-   * Called on executors by reduce tasks.
-   */
-  def getReader[K, C](
-  handle: ShuffleHandle,
-  startPartition: Int,
-  endPartition: Int,
-  context: TaskContext,
-  metrics: ShuffleReadMetricsReporter): ShuffleReader[K, C]
-
   /**
* Get a reader for a range of reduce partitions (startPartition to endPartition-1, inclusive) to
-   * read from map output (startMapIndex to endMapIndex - 1, inclusive).
+   * read from a range of map outputs which specified by mapIndexRange(startMapIndex to
+   * endMapIndex-1, inclusive).
+   *
* Called on executors by reduce tasks.
*/
-  def getReaderForRange[K, C](
+  def getReader[K, C](
   handle: ShuffleHandle,
-  startMapIndex: Int,
-  endMapIndex: Int,
+  mapIndexRange: Array[MapStatus] => (Int, Int),

Review comment:
   OK, now I see how to determine the `endMapIndex`.
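
   To illustrate the point about `endMapIndex`: with a parameter of type `Array[MapStatus] => (Int, Int)`, the end index can be derived from the map statuses at read time instead of being passed in up front. The sketch below is only an illustration under assumptions: `FakeMapStatus`, `allMaps` and `fixedRange` are hypothetical names that do not appear in the PR, and `FakeMapStatus` merely stands in for Spark's `MapStatus`.

```scala
// Hypothetical stand-in for org.apache.spark.scheduler.MapStatus (illustration only).
final case class FakeMapStatus(mapId: Long)

object MapIndexRangeSketch {
  // Same shape as the `mapIndexRange` parameter in the diff, but over the stand-in type.
  type MapIndexRange = Array[FakeMapStatus] => (Int, Int)

  // Read all map outputs: the end index is simply the number of known statuses.
  val allMaps: MapIndexRange = statuses => (0, statuses.length)

  // Read a fixed sub-range, clamped to the number of available map outputs.
  def fixedRange(start: Int, end: Int): MapIndexRange =
    statuses => (math.min(start, statuses.length), math.min(end, statuses.length))

  def main(args: Array[String]): Unit = {
    val statuses = Array.tabulate(5)(i => FakeMapStatus(i.toLong))
    println(allMaps(statuses))          // (0,5)
    println(fixedRange(1, 3)(statuses)) // (1,3)
  }
}
```

   Both ranges are half-open here, matching the "startMapIndex to endMapIndex-1, inclusive" wording in the doc comment.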








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is l

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-647907550










[GitHub] [spark] AmplabJenkins commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-647907550










[GitHub] [spark] SparkQA commented on pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled

2020-06-22 Thread GitBox


SparkQA commented on pull request #28900:
URL: https://github.com/apache/spark/pull/28900#issuecomment-647907215


   **[Test build #124387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124387/testReport)** for PR 28900 at commit [`43c4726`](https://github.com/apache/spark/commit/43c4726fa8b0de623d5563720c96632193262ec2).






[GitHub] [spark] SparkQA commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-22 Thread GitBox


SparkQA commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-647907228


   **[Test build #124388 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124388/testReport)** for PR 28848 at commit [`633a0e7`](https://github.com/apache/spark/commit/633a0e7841c9a9f9175bed510120ef2fe66ebe00).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28866: [SPARK-31845][CORE][TESTS] DAGSchedulerSuite: Reuse completeNextStageWithFetchFailure

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28866:
URL: https://github.com/apache/spark/pull/28866#issuecomment-647905965










[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-22 Thread GitBox


wypoon commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-647906200


   retest this please






[GitHub] [spark] cloud-fan commented on a change in pull request #28892: [MINOR][SQL] Simplify DateTimeUtils.cleanLegacyTimestampStr

2020-06-22 Thread GitBox


cloud-fan commented on a change in pull request #28892:
URL: https://github.com/apache/spark/pull/28892#discussion_r443959870



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##
@@ -203,20 +203,10 @@ object DateTimeUtils {
 Math.multiplyExact(millis, MICROS_PER_MILLIS)
   }
 
+  private final val gmtUtf8 = UTF8String.fromString("GMT")
   // The method is called by JSON/CSV parser to clean up the legacy timestamp string by removing
-  // the "GMT" string.
-  def cleanLegacyTimestampStr(s: String): String = {
-val indexOfGMT = s.indexOf("GMT")
-if (indexOfGMT != -1) {
-  // ISO8601 with a weird time zone specifier (2000-01-01T00:00GMT+01:00)
-  val s0 = s.substring(0, indexOfGMT)
-  val s1 = s.substring(indexOfGMT + 3)
-  // Mapped to 2000-01-01T00:00+01:00
-  s0 + s1
-} else {
-  s
-}
-  }
+  // the "GMT" string. For example, it returns 2000-01-01T00:00+01:00 for 2000-01-01T00:00GMT+01:00.
+  def cleanLegacyTimestampStr(s: UTF8String): UTF8String = s.replace(gmtUtf8, UTF8String.EMPTY_UTF8)

Review comment:
   Doesn't `java.lang.String` already have a `replace` method?
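
   For reference, a minimal sketch of what a plain-`String` variant might look like, assuming the caller still has a `java.lang.String` at hand. The object name `CleanLegacyTimestampSketch` is hypothetical and this is not the code in the PR. Note that `String.replace` removes every literal occurrence of "GMT", whereas the removed `indexOf`-based version dropped only the first one.

```scala
object CleanLegacyTimestampSketch {
  // java.lang.String.replace(CharSequence, CharSequence) replaces all literal matches;
  // no regex is involved.
  def cleanLegacyTimestampStr(s: String): String = s.replace("GMT", "")

  def main(args: Array[String]): Unit = {
    // ISO8601 with the legacy time zone specifier
    println(cleanLegacyTimestampStr("2000-01-01T00:00GMT+01:00")) // 2000-01-01T00:00+01:00
    // Input without "GMT" comes back unchanged
    println(cleanLegacyTimestampStr("2000-01-01T00:00+01:00"))
  }
}
```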








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28884: [SPARK-20249][ML][PYSPARK] Add training summary for LinearSVCModel

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28884:
URL: https://github.com/apache/spark/pull/28884#issuecomment-647905639










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28900:
URL: https://github.com/apache/spark/pull/28900#issuecomment-647905591


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/29005/
   Test PASSed.






[GitHub] [spark] AmplabJenkins commented on pull request #28866: [SPARK-31845][CORE][TESTS] DAGSchedulerSuite: Reuse completeNextStageWithFetchFailure

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #28866:
URL: https://github.com/apache/spark/pull/28866#issuecomment-647905965










[GitHub] [spark] AmplabJenkins commented on pull request #28884: [SPARK-20249][ML][PYSPARK] Add training summary for LinearSVCModel

2020-06-22 Thread GitBox


AmplabJenkins commented on pull request #28884:
URL: https://github.com/apache/spark/pull/28884#issuecomment-647905639










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28900: [SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled

2020-06-22 Thread GitBox


AmplabJenkins removed a comment on pull request #28900:
URL: https://github.com/apache/spark/pull/28900#issuecomment-647905583


   Merged build finished. Test PASSed.





