date:20210609

[GitHub] [spark] SparkQA commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES

2021-06-09 Thread GitBox



SparkQA commented on pull request #32852:
URL: https://github.com/apache/spark/pull/32852#issuecomment-858328096


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44138/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions

2021-06-09 Thread GitBox



SparkQA commented on pull request #32470:
URL: https://github.com/apache/spark/pull/32470#issuecomment-858326389


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44137/
   



[GitHub] [spark] MaxGekk commented on a change in pull request #32849: [WIP][SPARK-35704][SQL] Support fields by the day-time interval type

2021-06-09 Thread GitBox



MaxGekk commented on a change in pull request #32849:
URL: https://github.com/apache/spark/pull/32849#discussion_r648866947



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -2513,7 +2514,8 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with 
SQLConfHelper with Logg
   }
 
   override def visitDayTimeIntervalDataType(ctx: 
DayTimeIntervalDataTypeContext): DataType = {
-DayTimeIntervalType
+// TODO(SPARK-X): Support day-time interval fields in SQL

Review comment:
   @cloud-fan This needs changes to the parser rules. I plan to do that 
separately and focus here only on adding unit fields to the class 
`DayTimeIntervalType`.
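
   As a rough illustration of what "adding unit fields to the class" could look like, here is a minimal, self-contained sketch. It is not the code from this pull request: `DayTimeFieldsSketch`, `DayTimeIntervalSketch`, the field constants, and the `typeName` format are all hypothetical names chosen for the example.

```scala
// Hypothetical sketch, not Spark's actual DayTimeIntervalType.
object DayTimeFieldsSketch {
  // Field units ordered from coarsest to finest (assumed encoding).
  val Day: Byte = 0
  val Hour: Byte = 1
  val Minute: Byte = 2
  val Second: Byte = 3

  private val names = Array("day", "hour", "minute", "second")

  // An interval type restricted to the unit range startField..endField.
  final case class DayTimeIntervalSketch(startField: Byte, endField: Byte) {
    require(
      Day <= startField && startField <= endField && endField <= Second,
      s"invalid field range: $startField to $endField")

    // Render a single unit, or a "start to end" range of units.
    def typeName: String =
      if (startField == endField) s"interval ${names(startField.toInt)}"
      else s"interval ${names(startField.toInt)} to ${names(endField.toInt)}"
  }
}
```

   With this shape, a type built from `Day` and `Second` covers the full day-time range, while one built from `Hour` and `Hour` covers a single unit. Parser support for such ranges is left out, in line with the plan to handle the parser rules separately.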





[GitHub] [spark] MaxGekk commented on a change in pull request #32849: [WIP][SPARK-35704][SQL] Support fields by the day-time interval type

2021-06-09 Thread GitBox



MaxGekk commented on a change in pull request #32849:
URL: https://github.com/apache/spark/pull/32849#discussion_r648866947



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -2513,7 +2514,8 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with 
SQLConfHelper with Logg
   }
 
   override def visitDayTimeIntervalDataType(ctx: 
DayTimeIntervalDataTypeContext): DataType = {
-DayTimeIntervalType
+// TODO(SPARK-X): Support day-time interval fields in SQL

Review comment:
   This needs changes to the parser rules. I plan to do that separately and 
focus here only on adding unit fields to the class `DayTimeIntervalType`.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322341


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44144/
   



[GitHub] [spark] MaxGekk commented on a change in pull request #32849: [WIP][SPARK-35704][SQL] Support fields by the day-time interval type

2021-06-09 Thread GitBox



MaxGekk commented on a change in pull request #32849:
URL: https://github.com/apache/spark/pull/32849#discussion_r648866464



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -2358,7 +2358,8 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with 
SQLConfHelper with Logg
   } else {
 assert(calendarInterval.months == 0)
 val micros = IntervalUtils.getDuration(calendarInterval, 
TimeUnit.MICROSECONDS)
-Literal(micros, DayTimeIntervalType)
+// TODO(SPARK-X): Parse to tightest day-time interval type

Review comment:
   @cloud-fan Here, we have already lost the information about field units in 
`CalendarInterval`. I will open a separate JIRA to refactor/implement this.
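
   To make the "lost info" point concrete, here is a small sketch under stated assumptions. `CalendarIntervalSketch` is a hypothetical stand-in that only mimics the months/days/microseconds shape of `CalendarInterval`, and `fromHours`/`fromMinutes` are made-up helpers standing in for parsing literals written in different units.

```scala
// Hypothetical sketch: CalendarIntervalSketch only mimics the
// (months, days, microseconds) shape; it is not Spark's class.
object LostFieldsSketch {
  final case class CalendarIntervalSketch(months: Int, days: Int, microseconds: Long)

  private val MicrosPerMinute = 60L * 1000L * 1000L
  private val MicrosPerHour = 60L * MicrosPerMinute

  // Stand-in for parsing a literal written in hours.
  def fromHours(n: Long): CalendarIntervalSketch =
    CalendarIntervalSketch(0, 0, n * MicrosPerHour)

  // Stand-in for parsing a literal written in minutes.
  def fromMinutes(n: Long): CalendarIntervalSketch =
    CalendarIntervalSketch(0, 0, n * MicrosPerMinute)
}
```

   Here `fromHours(1)` and `fromMinutes(60)` produce equal values, so code that sees only the resulting interval cannot tell which unit the user wrote and therefore cannot pick the tightest type from it.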





[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

2021-06-09 Thread GitBox



AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322341


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44144/
   



[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

2021-06-09 Thread GitBox



SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322328


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44144/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322078


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139617/
   



[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

2021-06-09 Thread GitBox



SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858316750


   **[Test build #139617 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139617/testReport)**
 for PR 32084 at commit 
[`e34004f`](https://github.com/apache/spark/commit/e34004f8d42dcf7ecd0085f8ad062bffcad445d2).



[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

2021-06-09 Thread GitBox



AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322078


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139617/
   



[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

2021-06-09 Thread GitBox



SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322051


   **[Test build #139617 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139617/testReport)**
 for PR 32084 at commit 
[`e34004f`](https://github.com/apache/spark/commit/e34004f8d42dcf7ecd0085f8ad062bffcad445d2).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] MaxGekk commented on a change in pull request #32849: [WIP][SPARK-35704][SQL] Support fields by the day-time interval type

2021-06-09 Thread GitBox



MaxGekk commented on a change in pull request #32849:
URL: https://github.com/apache/spark/pull/32849#discussion_r648865489



##
File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
##
@@ -21,10 +21,7 @@
 import java.math.BigDecimal;
 import java.math.BigInteger;
 import java.nio.ByteBuffer;
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.HashSet;
-import java.util.Set;
+import java.util.*;

Review comment:
   I will revert this.





[GitHub] [spark] MaxGekk commented on a change in pull request #32849: [WIP][SPARK-35704][SQL] Support fields by the day-time interval type

2021-06-09 Thread GitBox



MaxGekk commented on a change in pull request #32849:
URL: https://github.com/apache/spark/pull/32849#discussion_r648864959



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##
@@ -1152,7 +1154,8 @@ abstract class CastBase extends UnaryExpression with 
TimeZoneAwareExpression wit
 (c, evPrim, evNull) => {
   code"$evPrim = 
UTF8String.fromString($udtRef.deserialize($c).toString());"
 }
-  case i @ (YearMonthIntervalType | DayTimeIntervalType) =>
+  // TODO(SPARK-X): Take into account day-time interval fields in cast

Review comment:
   @cloud-fan I have marked the places that need follow-ups. I am 
going to create sub-tasks in JIRA.





[GitHub] [spark] SparkQA removed a comment on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-06-09 Thread GitBox



SparkQA removed a comment on pull request #32816:
URL: https://github.com/apache/spark/pull/32816#issuecomment-858252115


   **[Test build #139608 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139608/testReport)**
 for PR 32816 at commit 
[`a80bb5c`](https://github.com/apache/spark/commit/a80bb5c06d76f35a80afcad6e242158ef809875e).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #32816:
URL: https://github.com/apache/spark/pull/32816#issuecomment-858319082


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139608/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-06-09 Thread GitBox



AmplabJenkins commented on pull request #32816:
URL: https://github.com/apache/spark/pull/32816#issuecomment-858319082


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139608/
   



[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-06-09 Thread GitBox



SparkQA commented on pull request #32816:
URL: https://github.com/apache/spark/pull/32816#issuecomment-858318818


   **[Test build #139608 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139608/testReport)**
 for PR 32816 at commit 
[`a80bb5c`](https://github.com/apache/spark/commit/a80bb5c06d76f35a80afcad6e242158ef809875e).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `public final class CompositeReadLimit implements ReadLimit `
 * `public final class ReadMinRows implements ReadLimit `
 * `trait InvokeLike extends Expression with NonSQLExpression with 
ImplicitCastInputTypes `
 * `case class LateralSubquery(`
 * `case class LateralJoin(`
 * `case class CommandResultExec(`
 * `class RocksDBFileManager(`
 * `  sealed trait SchemaReader `
 * `  class SchemaV1Reader extends SchemaReader `
 * `  class SchemaV2Reader extends SchemaReader `
 * `  trait SchemaWriter `
 * `  class SchemaV1Writer extends SchemaWriter `
 * `  class SchemaV2Writer extends SchemaWriter `
 * `case class CommandResult(`



[GitHub] [spark] ueshin closed pull request #32738: [SPARK-35474] Enable disallow_untyped_defs mypy check for pyspark.pandas.indexing.

2021-06-09 Thread GitBox



ueshin closed pull request #32738:
URL: https://github.com/apache/spark/pull/32738


   



[GitHub] [spark] ueshin commented on pull request #32738: [SPARK-35474] Enable disallow_untyped_defs mypy check for pyspark.pandas.indexing.

2021-06-09 Thread GitBox



ueshin commented on pull request #32738:
URL: https://github.com/apache/spark/pull/32738#issuecomment-858317725


   Thanks! merging to master.



[GitHub] [spark] tanelk commented on pull request #31677: [SPARK-34565][SQL] Collapse Window nodes with Project between them

2021-06-09 Thread GitBox



tanelk commented on pull request #31677:
URL: https://github.com/apache/spark/pull/31677#issuecomment-858317298


   This PR and the very similar PR #31980 have been approved for a while now. 
@maropu, could you take another look? Maybe we can merge this.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #32821:
URL: https://github.com/apache/spark/pull/32821#issuecomment-858316652


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139616/
   



[GitHub] [spark] SparkQA removed a comment on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps

2021-06-09 Thread GitBox



SparkQA removed a comment on pull request #32821:
URL: https://github.com/apache/spark/pull/32821#issuecomment-858316242


   **[Test build #139616 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139616/testReport)**
 for PR 32821 at commit 
[`e3b1440`](https://github.com/apache/spark/commit/e3b1440e3a0523e8007b927c84851d6609496501).



[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

2021-06-09 Thread GitBox



SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858316750


   **[Test build #139617 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139617/testReport)**
 for PR 32084 at commit 
[`e34004f`](https://github.com/apache/spark/commit/e34004f8d42dcf7ecd0085f8ad062bffcad445d2).



[GitHub] [spark] AmplabJenkins commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps

2021-06-09 Thread GitBox



AmplabJenkins commented on pull request #32821:
URL: https://github.com/apache/spark/pull/32821#issuecomment-858316652


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139616/
   



[GitHub] [spark] SparkQA commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps

2021-06-09 Thread GitBox



SparkQA commented on pull request #32821:
URL: https://github.com/apache/spark/pull/32821#issuecomment-858316631


   **[Test build #139616 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139616/testReport)**
 for PR 32821 at commit 
[`e3b1440`](https://github.com/apache/spark/commit/e3b1440e3a0523e8007b927c84851d6609496501).
* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32841: [SPARK-35673][SQL] Fix user-defined hint and unrecognized hint in subquery.

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #32841:
URL: https://github.com/apache/spark/pull/32841#issuecomment-857578831


   Can one of the admins verify this patch?



[GitHub] [spark] SparkQA commented on pull request #32841: [SPARK-35673][SQL] Fix user-defined hint and unrecognized hint in subquery.

2021-06-09 Thread GitBox



SparkQA commented on pull request #32841:
URL: https://github.com/apache/spark/pull/32841#issuecomment-858316254


   **[Test build #139615 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139615/testReport)**
 for PR 32841 at commit 
[`94d22b2`](https://github.com/apache/spark/commit/94d22b2c519f3f15af853a249218ed261640136e).



[GitHub] [spark] SparkQA commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps

2021-06-09 Thread GitBox



SparkQA commented on pull request #32821:
URL: https://github.com/apache/spark/pull/32821#issuecomment-858316242


   **[Test build #139616 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139616/testReport)**
 for PR 32821 at commit 
[`e3b1440`](https://github.com/apache/spark/commit/e3b1440e3a0523e8007b927c84851d6609496501).



[GitHub] [spark] SparkQA commented on pull request #32853: [SPARK-35683][PYTHON] Fix Index.difference to avoid collect 'other' to driver side

2021-06-09 Thread GitBox



SparkQA commented on pull request #32853:
URL: https://github.com/apache/spark/pull/32853#issuecomment-858316127


   **[Test build #139614 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139614/testReport)**
 for PR 32853 at commit 
[`61a5c92`](https://github.com/apache/spark/commit/61a5c92ab303dd6bb6aa65e24abb644255e4fa15).



[GitHub] [spark] SparkQA commented on pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-06-09 Thread GitBox



SparkQA commented on pull request #32776:
URL: https://github.com/apache/spark/pull/32776#issuecomment-858315767


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44140/
   



[GitHub] [spark] otterc commented on a change in pull request #32140: [WIP][SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data

2021-06-09 Thread GitBox



otterc commented on a change in pull request #32140:
URL: https://github.com/apache/spark/pull/32140#discussion_r648845832



##
File path: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
##
@@ -1124,4 +1392,298 @@ object ShuffleBlockFetcherIterator {
*/
   private[storage]
   case class DeferFetchRequestResult(fetchRequest: FetchRequest) extends 
FetchResult
+
+  /**
+   * Result of a fetch from a remote merged block unsuccessfully.
+   * Instead of treating this as a FailureFetchResult, we ignore this failure
+   * and fallback to fetch the original unmerged blocks.
+   * @param blockId block id
+   * @param address BlockManager that the merged block was attempted to be 
fetched from
+   * @param size size of the block, used to update bytesInFlight.
+   * @param isNetworkReqDone Is this the last network request for this host in 
this fetch
+   * request. Used to update reqsInFlight.
+   */
+  private[storage] case class IgnoreFetchResult(blockId: BlockId,
+  address: BlockManagerId,
+  size: Long,
+  isNetworkReqDone: Boolean) extends FetchResult
+
+  /**
+   * Result of a successful fetch of meta information for a merged block.
+   *
+   * @param shuffleId shuffle id.
+   * @param reduceId  reduce id.
+   * @param blockSize size of each merged block.
+   * @param numChunks number of chunks in the merged block.
+   * @param bitmaps   bitmaps for every chunk.
+   * @param address   BlockManager that the merged status was fetched 
from.
+   */
+  private[storage] case class MergedBlocksMetaFetchResult(
+  shuffleId: Int,
+  reduceId: Int,
+  blockSize: Long,
+  numChunks: Int,
+  bitmaps: Array[RoaringBitmap],
+  address: BlockManagerId,
+  blockId: BlockId = DUMMY_SHUFFLE_BLOCK_ID) extends FetchResult
+
+  /**
+   * Result of a failure while fetching the meta information for a merged 
block.
+   *
+   * @param shuffleId shuffle id.
+   * @param reduceId  reduce id.
+   * @param address   BlockManager that the merged status was fetched from.
+   */
+  private[storage] case class MergedBlocksMetaFailedFetchResult(
+  shuffleId: Int,
+  reduceId: Int,
+  address: BlockManagerId,
+  blockId: BlockId = DUMMY_SHUFFLE_BLOCK_ID) extends FetchResult
+}
+
+/**
+ * Helper class that encapsulates all the push-based functionality to fetch 
merged block meta
+ * and merged shuffle block chunks.
+ */
+private class PushBasedFetchHelper(

Review comment:
   Many methods in `PushBasedFetchHelper` also need access to the iterator 
instance. The helper works with the iterator to:
   1. add results to the iterator's `results` queue when it receives the meta 
response.
   2. update the number of blocks to fetch.
   3. fetch fallback blocks when there is a fallback, which in turn removes 
some pending blocks from `fetchRequests`.
   
   It also needs access to the `shuffleClient`, `blockManager`, and 
`mapOutputTracker`. Most of the methods in this class access one or more 
of these instances.
   
   IMO, it seems better to create one instance of `PushBasedFetchHelper` per 
iterator instance. Otherwise, all the methods of `PushBasedFetchHelper` would 
need many more arguments.
   
   I find this class similar to the existing `BufferReleasingInputStream` in 
the iterator.
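
A minimal Python sketch of the design argued for in this comment, not the Spark code itself: one helper is created per iterator and keeps a reference to it, so helper methods can update the iterator's state without long argument lists. The names `FetcherIterator`, `PushBasedHelper`, and `on_meta_response` are hypothetical stand-ins, and the bookkeeping is simplified.

```python
from collections import deque


class PushBasedHelper:
    """Hypothetical helper bound to exactly one iterator instance."""

    def __init__(self, iterator, shuffle_client):
        self.iterator = iterator
        self.shuffle_client = shuffle_client

    def on_meta_response(self, num_chunks):
        # 1. Add a result to the iterator's results queue.
        self.iterator.results.append(("meta", num_chunks))
        # 2. Update the number of blocks to fetch: the meta request is done
        #    (-1) and every chunk becomes a new block to fetch (+num_chunks).
        self.iterator.num_blocks_to_fetch += num_chunks - 1


class FetcherIterator:
    """Hypothetical iterator that owns its helper."""

    def __init__(self, shuffle_client):
        self.results = deque()
        self.num_blocks_to_fetch = 1  # the pending meta request
        self.helper = PushBasedHelper(self, shuffle_client)


it = FetcherIterator(shuffle_client=object())
it.helper.on_meta_response(num_chunks=3)
print(it.num_blocks_to_fetch, list(it.results))
```

Passing the iterator once at construction is what keeps each helper method down to the arguments specific to that call.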





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #32816:
URL: https://github.com/apache/spark/pull/32816#issuecomment-858314560


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44135/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32470: [WIP] Simplify ResolveAggregateFunctions

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #32470:
URL: https://github.com/apache/spark/pull/32470#issuecomment-858314556


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44136/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #32821:
URL: https://github.com/apache/spark/pull/32821#issuecomment-858314559


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44139/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #32852:
URL: https://github.com/apache/spark/pull/32852#issuecomment-858314558


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44134/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions

2021-06-09 Thread GitBox



AmplabJenkins commented on pull request #32470:
URL: https://github.com/apache/spark/pull/32470#issuecomment-858314556


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44136/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES

2021-06-09 Thread GitBox



AmplabJenkins commented on pull request #32852:
URL: https://github.com/apache/spark/pull/32852#issuecomment-858314558


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44134/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps

2021-06-09 Thread GitBox



AmplabJenkins commented on pull request #32821:
URL: https://github.com/apache/spark/pull/32821#issuecomment-858314559


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44139/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-06-09 Thread GitBox



AmplabJenkins commented on pull request #32816:
URL: https://github.com/apache/spark/pull/32816#issuecomment-858314560


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44135/
   



[GitHub] [spark] Ngone51 commented on a change in pull request #32140: [WIP][SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data

2021-06-09 Thread GitBox



Ngone51 commented on a change in pull request #32140:
URL: https://github.com/apache/spark/pull/32140#discussion_r648857212



##
File path: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
##
@@ -767,6 +908,43 @@ final class ShuffleBlockFetcherIterator(
 deferredFetchRequests.getOrElseUpdate(address, new 
Queue[FetchRequest]())
   defReqQueue.enqueue(request)
   result = null
+
+case IgnoreFetchResult(blockId, address, size, isNetworkReqDone) =>
+  if (pushBasedFetchHelper.isNotExecutorOrMergedLocal(address)) {
+numBlocksInFlightPerAddress(address) = 
numBlocksInFlightPerAddress(address) - 1
+bytesInFlight -= size
+  }
+  if (isNetworkReqDone) {
+reqsInFlight -= 1
+logDebug("Number of requests in flight " + reqsInFlight)
+  }
+  numBlocksProcessed += 
pushBasedFetchHelper.initiateFallbackBlockFetchForMergedBlock(
+blockId, address)
+  // Set result to null to trigger another iteration of the while loop 
to get either
+  // a SuccessFetchResult or a FailureFetchResult.
+  result = null
+
+case MergedBlocksMetaFetchResult(shuffleId, reduceId, blockSize, 
numChunks, bitmaps,
+address, _) =>
+  // The original meta request is processed so we decrease 
numBlocksToFetch by 1. We will
+  // collect new chunks request and the count of this is added to 
numBlocksToFetch in
+  // collectFetchReqsFromMergedBlocks.
+  numBlocksToFetch -= 1
+  val blocksToRequest = 
pushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(
+shuffleId, reduceId, blockSize, numChunks, bitmaps)
+  val additionalRemoteReqs = new ArrayBuffer[FetchRequest]
+  collectFetchRequests(address, blocksToRequest.toSeq, 
additionalRemoteReqs)
+  fetchRequests ++= additionalRemoteReqs
+  // Set result to null to force another iteration.
+  result = null

Review comment:
   Oh, I see. I misread it.





[GitHub] [spark] SparkQA commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES

2021-06-09 Thread GitBox



SparkQA commented on pull request #32852:
URL: https://github.com/apache/spark/pull/32852#issuecomment-858310485


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44138/
   



[GitHub] [spark] ulysses-you commented on pull request #32815: [SPARK-35675][SQL] EnsureRequirements remove shuffle should respect PartitioningCollection

2021-06-09 Thread GitBox



ulysses-you commented on pull request #32815:
URL: https://github.com/apache/spark/pull/32815#issuecomment-858310082


   Thanks for merging!



[GitHub] [spark] SparkQA commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions

2021-06-09 Thread GitBox



SparkQA commented on pull request #32470:
URL: https://github.com/apache/spark/pull/32470#issuecomment-858309113


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44137/
   



[GitHub] [spark] SparkQA commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps

2021-06-09 Thread GitBox



SparkQA commented on pull request #32821:
URL: https://github.com/apache/spark/pull/32821#issuecomment-858308504


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44139/
   



[GitHub] [spark] otterc commented on a change in pull request #32140: [WIP][SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data

2021-06-09 Thread GitBox



otterc commented on a change in pull request #32140:
URL: https://github.com/apache/spark/pull/32140#discussion_r648845832



##
File path: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
##
@@ -1124,4 +1392,298 @@ object ShuffleBlockFetcherIterator {
*/
   private[storage]
   case class DeferFetchRequestResult(fetchRequest: FetchRequest) extends 
FetchResult
+
+  /**
+   * Result of a fetch from a remote merged block unsuccessfully.
+   * Instead of treating this as a FailureFetchResult, we ignore this failure
+   * and fallback to fetch the original unmerged blocks.
+   * @param blockId block id
+   * @param address BlockManager that the merged block was attempted to be 
fetched from
+   * @param size size of the block, used to update bytesInFlight.
+   * @param isNetworkReqDone Is this the last network request for this host in 
this fetch
+   * request. Used to update reqsInFlight.
+   */
+  private[storage] case class IgnoreFetchResult(blockId: BlockId,
+  address: BlockManagerId,
+  size: Long,
+  isNetworkReqDone: Boolean) extends FetchResult
+
+  /**
+   * Result of a successful fetch of meta information for a merged block.
+   *
+   * @param shuffleId  shuffle id.
+   * @param reduceId   reduce id.
+   * @param blockSize  size of each merged block.
+   * @param numChunks  number of chunks in the merged block.
+   * @param bitmaps    bitmaps for every chunk.
+   * @param address    BlockManager that the merged status was fetched from.
+   */
+  private[storage] case class MergedBlocksMetaFetchResult(
+  shuffleId: Int,
+  reduceId: Int,
+  blockSize: Long,
+  numChunks: Int,
+  bitmaps: Array[RoaringBitmap],
+  address: BlockManagerId,
+  blockId: BlockId = DUMMY_SHUFFLE_BLOCK_ID) extends FetchResult
+
+  /**
+   * Result of a failure while fetching the meta information for a merged 
block.
+   *
+   * @param shuffleId shuffle id.
+   * @param reduceId  reduce id.
+   * @param address   BlockManager that the merged status was fetched from.
+   */
+  private[storage] case class MergedBlocksMetaFailedFetchResult(
+  shuffleId: Int,
+  reduceId: Int,
+  address: BlockManagerId,
+  blockId: BlockId = DUMMY_SHUFFLE_BLOCK_ID) extends FetchResult
+}
+
+/**
+ * Helper class that encapsulates all the push-based functionality to fetch 
merged block meta
+ * and merged shuffle block chunks.
+ */
+private class PushBasedFetchHelper(

Review comment:
   The problem is that `PushBasedFetchHelper` also needs access to the 
iterator instance. It works with the iterator to:
   1. add results to the iterator's `results` queue.
   2. update the number of blocks to fetch.
   3. fetch fallback blocks when there is a fallback, which in turn removes 
some pending blocks from `fetchRequests`.
   
   It also needs access to the `shuffleClient`, `blockManager`, and 
`mapOutputTracker`.
   This is why it is a helper class, similar to the existing 
`BufferReleasingInputStream` and `ShuffleFetchCompletionListener`.





[GitHub] [spark] MaxGekk closed pull request #32839: [SPARK-35679][SQL] instantToMicros overflow

2021-06-09 Thread GitBox



MaxGekk closed pull request #32839:
URL: https://github.com/apache/spark/pull/32839


   



[GitHub] [spark] itholic opened a new pull request #32853: [SPARK-35683][PYTHON] Fix Index.difference to avoid collect 'other' to driver side

2021-06-09 Thread GitBox



itholic opened a new pull request #32853:
URL: https://github.com/apache/spark/pull/32853


   ### What changes were proposed in this pull request?
   
   This PR fixes the incorrect behavior of `Index.difference` in the pandas API on 
Spark, based on the comments 
https://github.com/databricks/koalas/pull/1325#discussion_r647889901 and 
https://github.com/databricks/koalas/pull/1325#discussion_r647890007
   - It couldn't properly handle the case where `self` is an `Index` or 
`MultiIndex` and `other` is a `MultiIndex` or `Index`.
   ```python
   >>> import pyspark.pandas as ps
   >>> midx1 = ps.MultiIndex.from_tuples([('a', 'x', 1), ('b', 'z', 2), ('k', 'z', 3)])
   >>> idx1 = ps.Index([1, 2, 3])
   >>> midx1.difference(idx1)
   pyspark.pandas.exceptions.PandasNotImplementedError: The method 
`pd.Index.__iter__()` is not implemented. If you want to collect your data as 
an NumPy array, use 'to_numpy()' instead.
   ```
   - It collected all of the data to the driver side when `other` is a 
list-like object. This is especially dangerous when `other` is a distributed 
object such as a Series.
   
   Related test cases are also added.
   
   ### Why are the changes needed?
   
   To correct the behavior that is incompatible with pandas, and to prevent a 
case that can easily cause an OOM.
   
   ```python
   >>> import pyspark.pandas as ps
   >>> midx1 = ps.MultiIndex.from_tuples([('a', 'x', 1), ('b', 'z', 2), ('k', 'z', 3)])
   >>> idx1 = ps.Index([1, 2, 3])
   >>> midx1.difference(idx1)
   MultiIndex([('a', 'x', 1),
               ('b', 'z', 2),
               ('k', 'z', 3)],
              )
   ```
   
   Now it uses the for loop only when `other` is a `list`, `set` or `dict`.
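
A rough plain-Python sketch of the "loop only over local containers" rule described above. It does not use the pandas-on-Spark API. `difference_with_local` is a hypothetical name, and returning sorted, de-duplicated values is an assumption of this sketch rather than a statement about the patch.

```python
def difference_with_local(index_values, other):
    """Hypothetical stand-in: subtract a local container from index values."""
    if not isinstance(other, (list, set, dict)):
        # Anything else (e.g. a distributed Series) must not be iterated here.
        raise TypeError("this sketch only handles list, set or dict")
    excluded = set(other)  # iterating a dict yields its keys
    # Keep the unique values that are not in `other`, in sorted order.
    return sorted({v for v in index_values if v not in excluded})


print(difference_with_local([3, 1, 2, 2], {2: "x"}))
```

Restricting the loop to these three types means only data already held on the driver is ever iterated.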
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the previous bug is fixed as described in the above code examples.
   
   
   ### How was this patch tested?
   
   Manually tested locally with the linter and unit tests; it is expected to 
pass on CI.
   



[GitHub] [spark] MaxGekk commented on pull request #32839: [SPARK-35679][SQL] instantToMicros overflow

2021-06-09 Thread GitBox



MaxGekk commented on pull request #32839:
URL: https://github.com/apache/spark/pull/32839#issuecomment-858300793


   @dgd-contributor Does the issue exist in other versions: 3.0, 3.1?



[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-06-09 Thread GitBox



SparkQA commented on pull request #32816:
URL: https://github.com/apache/spark/pull/32816#issuecomment-858299743


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44135/
   



[GitHub] [spark] SparkQA commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions

2021-06-09 Thread GitBox



SparkQA commented on pull request #32470:
URL: https://github.com/apache/spark/pull/32470#issuecomment-858299573


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44136/
   



[GitHub] [spark] cloud-fan commented on pull request #32841: [SPARK-35673][SQL] Fix user-defined hint and unrecognized hint in subquery.

2021-06-09 Thread GitBox



cloud-fan commented on pull request #32841:
URL: https://github.com/apache/spark/pull/32841#issuecomment-858298057


   ok to test



[GitHub] [spark] SparkQA commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES

2021-06-09 Thread GitBox



SparkQA commented on pull request #32852:
URL: https://github.com/apache/spark/pull/32852#issuecomment-858297795


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44134/
   



[GitHub] [spark] otterc commented on a change in pull request #32140: [WIP][SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data

2021-06-09 Thread GitBox



otterc commented on a change in pull request #32140:
URL: https://github.com/apache/spark/pull/32140#discussion_r648847338



##
File path: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
##
@@ -767,6 +908,43 @@ final class ShuffleBlockFetcherIterator(
 deferredFetchRequests.getOrElseUpdate(address, new 
Queue[FetchRequest]())
   defReqQueue.enqueue(request)
   result = null
+
+case IgnoreFetchResult(blockId, address, size, isNetworkReqDone) =>
+  if (pushBasedFetchHelper.isNotExecutorOrMergedLocal(address)) {
+numBlocksInFlightPerAddress(address) = 
numBlocksInFlightPerAddress(address) - 1
+bytesInFlight -= size
+  }
+  if (isNetworkReqDone) {
+reqsInFlight -= 1
+logDebug("Number of requests in flight " + reqsInFlight)
+  }
+  numBlocksProcessed += 
pushBasedFetchHelper.initiateFallbackBlockFetchForMergedBlock(
+blockId, address)
+  // Set result to null to trigger another iteration of the while loop 
to get either
+  // a SuccessFetchResult or a FailureFetchResult.
+  result = null
+
+case MergedBlocksMetaFetchResult(shuffleId, reduceId, blockSize, 
numChunks, bitmaps,
+address, _) =>
+  // The original meta request is processed so we decrease 
numBlocksToFetch by 1. We will
+  // collect new chunks request and the count of this is added to 
numBlocksToFetch in
+  // collectFetchReqsFromMergedBlocks.
+  numBlocksToFetch -= 1
+  val blocksToRequest = 
pushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(
+shuffleId, reduceId, blockSize, numChunks, bitmaps)
+  val additionalRemoteReqs = new ArrayBuffer[FetchRequest]
+  collectFetchRequests(address, blocksToRequest.toSeq, 
additionalRemoteReqs)
+  fetchRequests ++= additionalRemoteReqs
+  // Set result to null to force another iteration.
+  result = null

Review comment:
   Actually, this is the existing code which I haven't modified. The while 
loop inside iterator.next() is as below, so `fetchUpToMaxBytes` is always 
called after a response is matched and processed.
   ```
   while (result == null) {
     val startFetchWait = System.nanoTime()
     result = results.take()
     val fetchWaitTime = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startFetchWait)
     shuffleMetrics.incFetchWaitTime(fetchWaitTime)

     result match {...}

     // Send fetch requests up to maxBytesInFlight
     fetchUpToMaxBytes()
   }
   ```
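
The same control flow can be sketched as a small, self-contained Python analogue (not the Spark implementation). The tuple tags `"ignore"`, `"meta"`, and `"success"` and the function `next_result` are hypothetical. The point is only that resetting the result forces another iteration, and that the "send more requests" step runs after every processed response.

```python
import queue


def next_result(results, fetch_up_to_max_bytes):
    """Return the first terminal result, topping up requests after each response."""
    result = None
    while result is None:
        result = results.get()  # blocks, like results.take()
        if result[0] in ("ignore", "meta"):
            # Non-terminal response: reset to force another loop iteration.
            result = None
        # Runs after every processed response, terminal or not.
        fetch_up_to_max_bytes()
    return result


calls = []
q = queue.Queue()
for item in [("meta", None), ("ignore", None), ("success", b"data")]:
    q.put(item)
print(next_result(q, lambda: calls.append(1)), len(calls))
```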






[GitHub] [spark] cloud-fan commented on a change in pull request #32841: [SPARK-35673][SQL] Fix user-defined hint and unrecognized hint in subquery.

2021-06-09 Thread GitBox



cloud-fan commented on a change in pull request #32841:
URL: https://github.com/apache/spark/pull/32841#discussion_r648847167



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/hints.scala
##
@@ -30,7 +30,8 @@ import 
org.apache.spark.sql.catalyst.trees.TreePattern.{TreePattern, UNRESOLVED_
 case class UnresolvedHint(name: String, parameters: Seq[Any], child: 
LogicalPlan)
   extends UnaryNode {
 
-  override lazy val resolved: Boolean = false
+  override lazy val resolved: Boolean = child.resolved

Review comment:
   Yea, it's just a sanity check, to make sure `UnresolvedHint` doesn't 
exist after analysis.





[GitHub] [spark] AmplabJenkins commented on pull request #32815: [SPARK-35675][SQL] EnsureRequirements remove shuffle should respect PartitioningCollection

2021-06-09 Thread GitBox



AmplabJenkins commented on pull request #32815:
URL: https://github.com/apache/spark/pull/32815#issuecomment-858296344


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44132/
   



[GitHub] [spark] SparkQA commented on pull request #32815: [SPARK-35675][SQL] EnsureRequirements remove shuffle should respect PartitioningCollection

2021-06-09 Thread GitBox



SparkQA commented on pull request #32815:
URL: https://github.com/apache/spark/pull/32815#issuecomment-858296323


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44132/
   



[GitHub] [spark] cloud-fan commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-06-09 Thread GitBox



cloud-fan commented on a change in pull request #32776:
URL: https://github.com/apache/spark/pull/32776#discussion_r648846235



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
##
@@ -76,19 +76,76 @@ case class CustomShuffleReaderExec private(
 val desc = if (isLocalReader) {
   "local"
 } else if (hasCoalescedPartition && hasSkewedPartition) {
-  "coalesced and skewed"
+  s"$coalescedDetail and $skewedDetail"
 } else if (hasCoalescedPartition) {
-  "coalesced"
+  coalescedDetail
 } else if (hasSkewedPartition) {
-  "skewed"
+  skewedDetail
 } else {
   ""
 }
 Iterator(desc)
   }
+  private def isCoalesced(spec: ShufflePartitionSpec) = coalesceRange(spec) > 1
+  /**
+   * How many partitions were coalesced; 0 if not [[CoalescedPartitionSpec]]
+   */
+  private def coalesceRange(spec: ShufflePartitionSpec) = spec match {
+case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex
+case _ => 0
+  }
+
+  /* This is left as documentation
+   * Is it worth reporting this?  For example, if we have
+   * MapOutputStatistics 0,0,0,72,0
+   * MapOutputStatistics 0,0,0,138,138
+   * with target partition size 10, we'll have
+   * CoalescedPartitionSpec(3,4) & CoalescedPartitionSpec(4,5)
+   * So pre-shuffle partitions 0,1,2 are dropped
+   * Another example, (target size 10)
+   * MapOutputStatistics 0,3,0,2,7
+   * MapOutputStatistics 0,2,0,2,7
+   * Results in CoalescedPartitionSpec(1,4) & CoalescedPartitionSpec(4,5)
+   * So pre-shuffle partition 2 is included
+   * We could figure out dropped partitions but doesn't seem that useful.

Review comment:
   I don't think it's useful to report this metric.
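
For reference, the `coalesceRange` / `isCoalesced` logic in the diff above can be sketched in Python as follows. The dataclass is a hypothetical stand-in for Spark's `CoalescedPartitionSpec`, reduced to the two indices the diff uses.

```python
from dataclasses import dataclass


@dataclass
class CoalescedPartitionSpec:
    """Hypothetical stand-in holding only the start/end reducer indices."""
    start_reducer_index: int
    end_reducer_index: int


def coalesce_range(spec):
    """Number of pre-shuffle partitions covered; 0 for any other spec type."""
    if isinstance(spec, CoalescedPartitionSpec):
        return spec.end_reducer_index - spec.start_reducer_index
    return 0


def is_coalesced(spec):
    # A spec covering a single pre-shuffle partition is not coalesced.
    return coalesce_range(spec) > 1


print(coalesce_range(CoalescedPartitionSpec(1, 4)),
      is_coalesced(CoalescedPartitionSpec(4, 5)))
```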





[GitHub] [spark] cloud-fan commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-06-09 Thread GitBox



cloud-fan commented on a change in pull request #32776:
URL: https://github.com/apache/spark/pull/32776#discussion_r648845848



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
##
@@ -76,19 +76,76 @@ case class CustomShuffleReaderExec private(
 val desc = if (isLocalReader) {
   "local"
 } else if (hasCoalescedPartition && hasSkewedPartition) {
-  "coalesced and skewed"
+  s"$coalescedDetail and $skewedDetail"

Review comment:
   AFAIK, the plan node string never contained metrics before this PR.





[GitHub] [spark] otterc commented on a change in pull request #32140: [WIP][SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data

2021-06-09 Thread GitBox



otterc commented on a change in pull request #32140:
URL: https://github.com/apache/spark/pull/32140#discussion_r648845832



##
File path: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
##
@@ -1124,4 +1392,298 @@ object ShuffleBlockFetcherIterator {
*/
   private[storage]
   case class DeferFetchRequestResult(fetchRequest: FetchRequest) extends FetchResult
+
+  /**
+   * Result of a fetch from a remote merged block unsuccessfully.
+   * Instead of treating this as a FailureFetchResult, we ignore this failure
+   * and fallback to fetch the original unmerged blocks.
+   * @param blockId block id
+   * @param address BlockManager that the merged block was attempted to be fetched from
+   * @param size size of the block, used to update bytesInFlight.
+   * @param isNetworkReqDone Is this the last network request for this host in this fetch
+   * request. Used to update reqsInFlight.
+   */
+  private[storage] case class IgnoreFetchResult(blockId: BlockId,
+  address: BlockManagerId,
+  size: Long,
+  isNetworkReqDone: Boolean) extends FetchResult
+
+  /**
+   * Result of a successful fetch of meta information for a merged block.
+   *
+   * @param shuffleId shuffle id.
+   * @param reduceId reduce id.
+   * @param blockSize size of each merged block.
+   * @param numChunks number of chunks in the merged block.
+   * @param bitmaps bitmaps for every chunk.
+   * @param address BlockManager that the merged status was fetched from.
+   */
+  private[storage] case class MergedBlocksMetaFetchResult(
+  shuffleId: Int,
+  reduceId: Int,
+  blockSize: Long,
+  numChunks: Int,
+  bitmaps: Array[RoaringBitmap],
+  address: BlockManagerId,
+  blockId: BlockId = DUMMY_SHUFFLE_BLOCK_ID) extends FetchResult
+
+  /**
+   * Result of a failure while fetching the meta information for a merged block.
+   *
+   * @param shuffleId shuffle id.
+   * @param reduceId  reduce id.
+   * @param address   BlockManager that the merged status was fetched from.
+   */
+  private[storage] case class MergedBlocksMetaFailedFetchResult(
+  shuffleId: Int,
+  reduceId: Int,
+  address: BlockManagerId,
+  blockId: BlockId = DUMMY_SHUFFLE_BLOCK_ID) extends FetchResult
+}
+
+/**
+ * Helper class that encapsulates all the push-based functionality to fetch merged block meta
+ * and merged shuffle block chunks.
+ */
+private class PushBasedFetchHelper(

Review comment:
   The problem is that `PushBasedFetchHelper` also needs access to the
   iterator instance. It needs to work with the iterator to be able to:
   1. add results to the iterator's `result` queue.
   2. update the number of blocks to fetch.
   3. fetch fallback blocks when there is a fallback, which in turn removes
      some pending blocks from `fetchRequests`.

   This is why it is a helper class, similar to the existing
   `BufferReleasingInputStream` and `ShuffleFetchCompletionListener`.
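
   The three numbered points above can be sketched as follows. This is not Spark's implementation: `SimpleFetchIterator`, `FetchHelper`, `Result`, and all of their members are hypothetical, simplified names. The sketch only shows why such a helper takes the iterator instance as a constructor argument.

```scala
import scala.collection.mutable

// Hypothetical, simplified result type; Spark's FetchResult hierarchy is richer.
final case class Result(blockId: String)

// Hypothetical iterator exposing only the state the helper needs to touch.
class SimpleFetchIterator(initialBlocks: Int) {
  val results: mutable.Queue[Result] = mutable.Queue.empty[Result]
  val fetchRequests: mutable.Queue[String] = mutable.Queue.empty[String]
  var numBlocksToFetch: Int = initialBlocks
}

// The helper is constructed with the iterator so it can mutate the iterator's state.
class FetchHelper(iterator: SimpleFetchIterator) {
  // Falls back from one merged block to its original blocks.
  def fallback(mergedBlockId: String, originalBlocks: Seq[String]): Unit = {
    // (3) drop pending requests that refer to the merged block
    iterator.fetchRequests.dequeueAll(_ == mergedBlockId)
    // (2) one merged block is replaced by its original blocks
    iterator.numBlocksToFetch += originalBlocks.size - 1
    // (1) add results to the iterator's queue; as a simplification, placeholder
    // results stand in for actually fetching the original blocks
    originalBlocks.foreach(b => iterator.results.enqueue(Result(b)))
  }
}
```

   A standalone utility without a reference to the iterator could not perform any of these three updates, which matches the reason given above for keeping it as a helper class.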





[GitHub] [spark] cloud-fan commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-06-09 Thread GitBox



cloud-fan commented on a change in pull request #32776:
URL: https://github.com/apache/spark/pull/32776#discussion_r648845537



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
##
@@ -171,6 +238,19 @@ case class CustomShuffleReaderExec private(
 } else {
   Map.empty
 }
+  } ++ {
+if (isLocalReader) {
+  Map.empty
+} else {
+  if (hasCoalescedPartition) {
+Map("numCoalescedPartitions" ->
+  SQLMetrics.createMetric(sparkContext, "number of coalesced partitions"),
+  "numPartitionsToCoalesce" ->

Review comment:
   There is always a shuffle node below `CustomShuffleReader`. I think it
   makes more sense to let the shuffle node report the metrics for the number
   of partitions/reducers.





[GitHub] [spark] cloud-fan commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-06-09 Thread GitBox



cloud-fan commented on a change in pull request #32776:
URL: https://github.com/apache/spark/pull/32776#discussion_r648844909



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
##
@@ -76,19 +76,76 @@ case class CustomShuffleReaderExec private(
 val desc = if (isLocalReader) {
   "local"
 } else if (hasCoalescedPartition && hasSkewedPartition) {
-  "coalesced and skewed"
+  s"$coalescedDetail and $skewedDetail"

Review comment:
   It makes sense to add more metrics but it doesn't make sense to include 
metrics in the plan node string.
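
   One way to read this suggestion is sketched below. `ReaderNode` and its fields are hypothetical stand-ins, not Spark's `SparkPlan` or `SQLMetrics` API. The description string depends only on the kind of reader, while numeric details are exposed through a separate metrics map.

```scala
// Hypothetical stand-in for a plan node; not Spark's SparkPlan API.
final case class ReaderNode(
    isLocal: Boolean,
    hasCoalesced: Boolean,
    hasSkewed: Boolean,
    numCoalescedPartitions: Long) {

  // The description depends only on the kind of reader, not on any counts.
  def description: String =
    if (isLocal) "local"
    else if (hasCoalesced && hasSkewed) "coalesced and skewed"
    else if (hasCoalesced) "coalesced"
    else if (hasSkewed) "skewed"
    else ""

  // Numeric details are reported separately as named metrics.
  def metrics: Map[String, Long] =
    if (!isLocal && hasCoalesced) Map("numCoalescedPartitions" -> numCoalescedPartitions)
    else Map.empty
}
```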





[GitHub] [spark] SparkQA commented on pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-06-09 Thread GitBox



SparkQA commented on pull request #32776:
URL: https://github.com/apache/spark/pull/32776#issuecomment-858294837


   **[Test build #139613 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139613/testReport)**
 for PR 32776 at commit 
[`aee1392`](https://github.com/apache/spark/commit/aee1392720815f332e8fb993b4672bb03fe4ccb1).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #32776:
URL: https://github.com/apache/spark/pull/32776#issuecomment-854245118


   Can one of the admins verify this patch?



[GitHub] [spark] cloud-fan commented on pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-06-09 Thread GitBox



cloud-fan commented on pull request #32776:
URL: https://github.com/apache/spark/pull/32776#issuecomment-858294433


   ok to test



[GitHub] [spark] cloud-fan closed pull request #32842: [MINOR][SQL] No need to normolize name for built-in functions

2021-06-09 Thread GitBox



cloud-fan closed pull request #32842:
URL: https://github.com/apache/spark/pull/32842


   



[GitHub] [spark] cloud-fan commented on pull request #32842: [MINOR][SQL] No need to normolize name for built-in functions

2021-06-09 Thread GitBox



cloud-fan commented on pull request #32842:
URL: https://github.com/apache/spark/pull/32842#issuecomment-858292148


   thanks, merging to master!



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #32821:
URL: https://github.com/apache/spark/pull/32821#issuecomment-858290874


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139611/
   



[GitHub] [spark] SparkQA removed a comment on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps

2021-06-09 Thread GitBox



SparkQA removed a comment on pull request #32821:
URL: https://github.com/apache/spark/pull/32821#issuecomment-858290656


   **[Test build #139611 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139611/testReport)**
 for PR 32821 at commit 
[`b3168ac`](https://github.com/apache/spark/commit/b3168ac30f2d99653ef29fe80e968836ca956fe0).



[GitHub] [spark] AmplabJenkins commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps

2021-06-09 Thread GitBox



AmplabJenkins commented on pull request #32821:
URL: https://github.com/apache/spark/pull/32821#issuecomment-858290874


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139611/
   



[GitHub] [spark] SparkQA commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps

2021-06-09 Thread GitBox



SparkQA commented on pull request #32821:
URL: https://github.com/apache/spark/pull/32821#issuecomment-858290866


   **[Test build #139611 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139611/testReport)**
 for PR 32821 at commit 
[`b3168ac`](https://github.com/apache/spark/commit/b3168ac30f2d99653ef29fe80e968836ca956fe0).
* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions

2021-06-09 Thread GitBox



SparkQA commented on pull request #32470:
URL: https://github.com/apache/spark/pull/32470#issuecomment-858290812


   **[Test build #139612 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139612/testReport)**
 for PR 32470 at commit 
[`d08d8e4`](https://github.com/apache/spark/commit/d08d8e4cb6acfaff34e3d81cc8b41c65aa34f4f6).



[GitHub] [spark] SparkQA commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps

2021-06-09 Thread GitBox



SparkQA commented on pull request #32821:
URL: https://github.com/apache/spark/pull/32821#issuecomment-858290656


   **[Test build #139611 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139611/testReport)**
 for PR 32821 at commit 
[`b3168ac`](https://github.com/apache/spark/commit/b3168ac30f2d99653ef29fe80e968836ca956fe0).



[GitHub] [spark] SparkQA commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES

2021-06-09 Thread GitBox



SparkQA commented on pull request #32852:
URL: https://github.com/apache/spark/pull/32852#issuecomment-858290567


   **[Test build #139610 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139610/testReport)**
 for PR 32852 at commit 
[`716a50c`](https://github.com/apache/spark/commit/716a50cf9f48ac9fab05b8860c9c5e714c729610).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #31024: [SPARK-33979][SQL] Reorder predicate

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #31024:
URL: https://github.com/apache/spark/pull/31024#issuecomment-858289716


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139600/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #32787:
URL: https://github.com/apache/spark/pull/32787#issuecomment-858289714


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139601/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32842: [MINOR][SQL] No need to normolize name for built-in functions

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #32842:
URL: https://github.com/apache/spark/pull/32842#issuecomment-858289713


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44133/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32842: [MINOR][SQL] No need to normolize name for built-in functions

2021-06-09 Thread GitBox



AmplabJenkins commented on pull request #32842:
URL: https://github.com/apache/spark/pull/32842#issuecomment-858289713


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44133/
   



[GitHub] [spark] AmplabJenkins commented on pull request #31024: [SPARK-33979][SQL] Reorder predicate

2021-06-09 Thread GitBox



AmplabJenkins commented on pull request #31024:
URL: https://github.com/apache/spark/pull/31024#issuecomment-858289716


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139600/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-09 Thread GitBox



AmplabJenkins commented on pull request #32787:
URL: https://github.com/apache/spark/pull/32787#issuecomment-858289714


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139601/
   



[GitHub] [spark] ekoifman commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-06-09 Thread GitBox



ekoifman commented on a change in pull request #32776:
URL: https://github.com/apache/spark/pull/32776#discussion_r648838734



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
##
@@ -76,19 +76,76 @@ case class CustomShuffleReaderExec private(
 val desc = if (isLocalReader) {
   "local"
 } else if (hasCoalescedPartition && hasSkewedPartition) {
-  "coalesced and skewed"
+  s"$coalescedDetail and $skewedDetail"
 } else if (hasCoalescedPartition) {
-  "coalesced"
+  coalescedDetail
 } else if (hasSkewedPartition) {
-  "skewed"
+  skewedDetail
 } else {
   ""
 }
 Iterator(desc)
   }
+  private def isCoalesced(spec: ShufflePartitionSpec) = coalesceRange(spec) > 1
+  /**
+   * How many partitions were coalesced; 0 if not [[CoalescedPartitionSpec]]
+   */
+  private def coalesceRange(spec: ShufflePartitionSpec) = spec match {
+case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex
+case _ => 0
+  }
+
+  /* This is left as documentation
+   * Is it worth reporting this?  For example, if we have
+   * MapOutputStatistics 0,0,0,72,0
+   * MapOutputStatistics 0,0,0,138,138
+   * with target partition size 10, we'll have
+   * CoalescedPartitionSpec(3,4) & CoalescedPartitionSpec(4,5)
+   * So pre-shuffle partitions 0,1,2 are dropped
+   * Another example, (target size 10)
+   * MapOutputStatistics 0,3,0,2,7
+   * MapOutputStatistics 0,2,0,2,7
+   * Results in CoalescedPartitionSpec(1,4) & CoalescedPartitionSpec(4,5)
+   * So pre-shuffle partition 2 is included
+   * We could figure out dropped partitions but doesn't seem that useful.
+   */
+  private def numDroppedPartitions = 0
+
+  private def numCoalescedPartitions = partitionSpecs.count(isCoalesced)
+
+  /**
+   * partitions that will be combined with others (as opposed to taken as is, split, dropped)
+   */
+  private def numPartitionsToCoalesce = partitionSpecs.filter(isCoalesced)
+.foldLeft(0)((c, s) => c + coalesceRange(s))
+
+  /**
+   * total splits of all skewed partitions
+   */
+  private def skewedPartitionSplits = partitionSpecs.collect {
+case p: PartialReducerPartitionSpec => p
+  }
 
-  def hasCoalescedPartition: Boolean =
-partitionSpecs.exists(_.isInstanceOf[CoalescedPartitionSpec])

Review comment:
   I don't understand this comment. This change is a critical part of this PR.





[GitHub] [spark] SparkQA commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions

2021-06-09 Thread GitBox



SparkQA commented on pull request #32470:
URL: https://github.com/apache/spark/pull/32470#issuecomment-858287872


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44136/
   



[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-06-09 Thread GitBox



SparkQA commented on pull request #32816:
URL: https://github.com/apache/spark/pull/32816#issuecomment-858287594


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44135/
   



[GitHub] [spark] beliefer commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES

2021-06-09 Thread GitBox



beliefer commented on pull request #32852:
URL: https://github.com/apache/spark/pull/32852#issuecomment-858287344


   retest this please



[GitHub] [spark] SparkQA commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES

2021-06-09 Thread GitBox



SparkQA commented on pull request #32852:
URL: https://github.com/apache/spark/pull/32852#issuecomment-858286065


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44134/
   



[GitHub] [spark] ulysses-you commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-06-09 Thread GitBox



ulysses-you commented on a change in pull request #32776:
URL: https://github.com/apache/spark/pull/32776#discussion_r648834767



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
##
@@ -76,19 +76,76 @@ case class CustomShuffleReaderExec private(
 val desc = if (isLocalReader) {
   "local"
 } else if (hasCoalescedPartition && hasSkewedPartition) {
-  "coalesced and skewed"
+  s"$coalescedDetail and $skewedDetail"
 } else if (hasCoalescedPartition) {
-  "coalesced"
+  coalescedDetail
 } else if (hasSkewedPartition) {
-  "skewed"
+  skewedDetail
 } else {
   ""
 }
 Iterator(desc)
   }
+  private def isCoalesced(spec: ShufflePartitionSpec) = coalesceRange(spec) > 1
+  /**
+   * How many partitions were coalesced; 0 if not [[CoalescedPartitionSpec]]
+   */
+  private def coalesceRange(spec: ShufflePartitionSpec) = spec match {
+case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex
+case _ => 0
+  }
+
+  /* This is left as documentation
+   * Is it worth reporting this?  For example, if we have
+   * MapOutputStatistics 0,0,0,72,0
+   * MapOutputStatistics 0,0,0,138,138
+   * with target partition size 10, we'll have
+   * CoalescedPartitionSpec(3,4) & CoalescedPartitionSpec(4,5)
+   * So pre-shuffle partitions 0,1,2 are dropped
+   * Another example, (target size 10)
+   * MapOutputStatistics 0,3,0,2,7
+   * MapOutputStatistics 0,2,0,2,7
+   * Results in CoalescedPartitionSpec(1,4) & CoalescedPartitionSpec(4,5)
+   * So pre-shuffle partition 2 is included
+   * We could figure out dropped partitions but doesn't seem that useful.
+   */
+  private def numDroppedPartitions = 0
+
+  private def numCoalescedPartitions = partitionSpecs.count(isCoalesced)
+
+  /**
+   * partitions that will be combined with others (as opposed to taken as is, split, dropped)
+   */
+  private def numPartitionsToCoalesce = partitionSpecs.filter(isCoalesced)
+.foldLeft(0)((c, s) => c + coalesceRange(s))
+
+  /**
+   * total splits of all skewed partitions
+   */
+  private def skewedPartitionSplits = partitionSpecs.collect {
+case p: PartialReducerPartitionSpec => p
+  }
 
-  def hasCoalescedPartition: Boolean =
-partitionSpecs.exists(_.isInstanceOf[CoalescedPartitionSpec])

Review comment:
   This change is worth a new PR. Can you create a new one for it? Thanks.





[GitHub] [spark] SparkQA commented on pull request #32842: [MINOR][SQL] No need to normolize name for built-in functions

2021-06-09 Thread GitBox



SparkQA commented on pull request #32842:
URL: https://github.com/apache/spark/pull/32842#issuecomment-858280175


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44133/
   



[GitHub] [spark] venkata91 commented on pull request #32754: [SPARK-35613][CORE][SQL] Cache commonly occurring strings in SQLMetrics, JSONProtocol and AccumulatorV2 classes

2021-06-09 Thread GitBox



venkata91 commented on pull request #32754:
URL: https://github.com/apache/spark/pull/32754#issuecomment-858275251


   Gentle reminder @HyukjinKwon @cloud-fan @mridulm 



[GitHub] [spark] SparkQA removed a comment on pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-09 Thread GitBox



SparkQA removed a comment on pull request #32787:
URL: https://github.com/apache/spark/pull/32787#issuecomment-858168705


   **[Test build #139601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139601/testReport)** for PR 32787 at commit [`1704a9a`](https://github.com/apache/spark/commit/1704a9a8c54fab6b0b450c4fe32be444dc81df20).



[GitHub] [spark] SparkQA commented on pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-09 Thread GitBox



SparkQA commented on pull request #32787:
URL: https://github.com/apache/spark/pull/32787#issuecomment-858258406


   **[Test build #139601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139601/testReport)** for PR 32787 at commit [`1704a9a`](https://github.com/apache/spark/commit/1704a9a8c54fab6b0b450c4fe32be444dc81df20).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on pull request #31024: [SPARK-33979][SQL] Reorder predicate

2021-06-09 Thread GitBox



SparkQA removed a comment on pull request #31024:
URL: https://github.com/apache/spark/pull/31024#issuecomment-858167571


   **[Test build #139600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139600/testReport)** for PR 31024 at commit [`dc4419a`](https://github.com/apache/spark/commit/dc4419a233ea154aaba6c1842f0417bcd29aed61).



[GitHub] [spark] SparkQA commented on pull request #31024: [SPARK-33979][SQL] Reorder predicate

2021-06-09 Thread GitBox



SparkQA commented on pull request #31024:
URL: https://github.com/apache/spark/pull/31024#issuecomment-858257375


   **[Test build #139600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139600/testReport)** for PR 31024 at commit [`dc4419a`](https://github.com/apache/spark/commit/dc4419a233ea154aaba6c1842f0417bcd29aed61).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #32852:
URL: https://github.com/apache/spark/pull/32852#issuecomment-858254178


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139606/
   



[GitHub] [spark] SparkQA removed a comment on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES

2021-06-09 Thread GitBox



SparkQA removed a comment on pull request #32852:
URL: https://github.com/apache/spark/pull/32852#issuecomment-858252068


   **[Test build #139606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139606/testReport)** for PR 32852 at commit [`716a50c`](https://github.com/apache/spark/commit/716a50cf9f48ac9fab05b8860c9c5e714c729610).



[GitHub] [spark] AmplabJenkins commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES

2021-06-09 Thread GitBox



AmplabJenkins commented on pull request #32852:
URL: https://github.com/apache/spark/pull/32852#issuecomment-858254178


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139606/
   



[GitHub] [spark] SparkQA commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES

2021-06-09 Thread GitBox



SparkQA commented on pull request #32852:
URL: https://github.com/apache/spark/pull/32852#issuecomment-858254163


   **[Test build #139606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139606/testReport)** for PR 32852 at commit [`716a50c`](https://github.com/apache/spark/commit/716a50cf9f48ac9fab05b8860c9c5e714c729610).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32470: [WIP] Simplify ResolveAggregateFunctions

2021-06-09 Thread GitBox



AmplabJenkins removed a comment on pull request #32470:
URL: https://github.com/apache/spark/pull/32470#issuecomment-858252760


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139609/
   



[GitHub] [spark] SparkQA removed a comment on pull request #32470: [WIP] Simplify ResolveAggregateFunctions

2021-06-09 Thread GitBox



SparkQA removed a comment on pull request #32470:
URL: https://github.com/apache/spark/pull/32470#issuecomment-858252267


   **[Test build #139609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139609/testReport)** for PR 32470 at commit [`e33c0a8`](https://github.com/apache/spark/commit/e33c0a87388d587f18f1e7e21b9b2170dab7b695).



[GitHub] [spark] AmplabJenkins commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions

2021-06-09 Thread GitBox



AmplabJenkins commented on pull request #32470:
URL: https://github.com/apache/spark/pull/32470#issuecomment-858252760


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139609/
   



[GitHub] [spark] SparkQA commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions

2021-06-09 Thread GitBox



SparkQA commented on pull request #32470:
URL: https://github.com/apache/spark/pull/32470#issuecomment-858252747


   **[Test build #139609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139609/testReport)** for PR 32470 at commit [`e33c0a8`](https://github.com/apache/spark/commit/e33c0a87388d587f18f1e7e21b9b2170dab7b695).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.


