[GitHub] [spark] cloud-fan commented on a change in pull request #33200: [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier
cloud-fan commented on a change in pull request #33200:
URL: https://github.com/apache/spark/pull/33200#discussion_r672827401

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```diff
@@ -3574,15 +3568,64 @@ class Analyzer(override val catalogManager: CatalogManager)
   /**
    * Rule to mostly resolve, normalize and rewrite column names based on case sensitivity
-   * for alter table commands.
+   * for alter table column commands.
    */
-  object ResolveAlterTableCommands extends Rule[LogicalPlan] {
+  object ResolveAlterTableColumnCommands extends Rule[LogicalPlan] {
     def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
-      case a: AlterTableCommand if a.table.resolved && hasUnresolvedFieldName(a) =>
+      case a: AlterTableColumnCommand if a.table.resolved && hasUnresolvedFieldName(a) =>
         val table = a.table.asInstanceOf[ResolvedTable]
         a.transformExpressions {
-          case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u)
+          case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u.origin)
+        }
+
+      case a @ AlterTableAddColumns(r: ResolvedTable, cols) if hasUnresolvedColumns(cols) =>
```

Review comment: This looks fine. We can remove the `if hasUnresolvedColumns(cols)` guard, which is not very useful here.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33430: [SPARK-36046][SQL][FOLLOWUP] Implement prettyName for MakeTimestampNTZ and MakeTimestampLTZ
SparkQA commented on pull request #33430:
URL: https://github.com/apache/spark/pull/33430#issuecomment-883090499

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45810/
[GitHub] [spark] SparkQA commented on pull request #33431: [SPARK-36221][SQL] Make sure CustomShuffleReaderExec has at least one partition
SparkQA commented on pull request #33431:
URL: https://github.com/apache/spark/pull/33431#issuecomment-883090213

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45809/
[GitHub] [spark] SparkQA removed a comment on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec
SparkQA removed a comment on pull request #33424:
URL: https://github.com/apache/spark/pull/33424#issuecomment-882968356

**[Test build #141288 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141288/testReport)** for PR 33424 at commit [`afa5539`](https://github.com/apache/spark/commit/afa55393f8d0c6884ed1d47ac3ced1112a87e7b6).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package
AmplabJenkins removed a comment on pull request #33350:
URL: https://github.com/apache/spark/pull/33350#issuecomment-883087045

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45807/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
AmplabJenkins removed a comment on pull request #33239:
URL: https://github.com/apache/spark/pull/33239#issuecomment-883087051

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45808/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
AmplabJenkins removed a comment on pull request #33428:
URL: https://github.com/apache/spark/pull/33428#issuecomment-883011701

Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec
AmplabJenkins removed a comment on pull request #33424:
URL: https://github.com/apache/spark/pull/33424#issuecomment-883087048

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141288/
[GitHub] [spark] cloud-fan commented on a change in pull request #33200: [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier
cloud-fan commented on a change in pull request #33200:
URL: https://github.com/apache/spark/pull/33200#discussion_r672821019

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```diff
@@ -3574,15 +3568,64 @@ class Analyzer(override val catalogManager: CatalogManager)
   /**
    * Rule to mostly resolve, normalize and rewrite column names based on case sensitivity
-   * for alter table commands.
+   * for alter table column commands.
    */
-  object ResolveAlterTableCommands extends Rule[LogicalPlan] {
+  object ResolveAlterTableColumnCommands extends Rule[LogicalPlan] {
     def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
-      case a: AlterTableCommand if a.table.resolved && hasUnresolvedFieldName(a) =>
+      case a: AlterTableColumnCommand if a.table.resolved && hasUnresolvedFieldName(a) =>
         val table = a.table.asInstanceOf[ResolvedTable]
         a.transformExpressions {
-          case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u)
+          case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u.origin)
+        }
+
+      case a @ AlterTableAddColumns(r: ResolvedTable, cols) if hasUnresolvedColumns(cols) =>
```

Review comment: maybe we should do

```
case class QualifiedColType(
  path: Option[FieldName], // None for top-level columns
  colName: String,
  ...
)
```
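For reference, the shape suggested in the comment above can be written as the following compilable sketch. The `FieldName` definition and the `fullName` helper are illustrative assumptions, not the actual Spark definitions:

```scala
// Hypothetical stand-in for Spark's field-name reference.
case class FieldName(name: Seq[String])

// The suggested shape: an optional parent path plus the column's own name.
case class QualifiedColType(
    path: Option[FieldName], // None for top-level columns
    colName: String) {
  // Full multi-part name: the parent path (if any) followed by the column name.
  def fullName: Seq[String] = path.map(_.name).getOrElse(Nil) :+ colName
}

val topLevel = QualifiedColType(None, "id")
val nested = QualifiedColType(Some(FieldName(Seq("point"))), "x")
assert(topLevel.fullName == Seq("id"))
assert(nested.fullName == Seq("point", "x"))
```

Splitting the path from the column name makes it explicit which part must be resolved against the existing schema and which part is the new column.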
[GitHub] [spark] SparkQA commented on pull request #33410: [WIP][SPARK-36204][INFRA][BUILD] Deduplicate Scala 2.13 daily build
SparkQA commented on pull request #33410:
URL: https://github.com/apache/spark/pull/33410#issuecomment-883088778

**[Test build #141303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141303/testReport)** for PR 33410 at commit [`48bfb39`](https://github.com/apache/spark/commit/48bfb39f0d3a8d15614998297ba56addf3b756b5).
[GitHub] [spark] SparkQA commented on pull request #33416: [SPARK-36207][PYTHON] Expose databaseExists in pyspark.sql.catalog
SparkQA commented on pull request #33416:
URL: https://github.com/apache/spark/pull/33416#issuecomment-883088693

**[Test build #141302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141302/testReport)** for PR 33416 at commit [`ac88451`](https://github.com/apache/spark/commit/ac88451fa14154ad111c2fe2399c8576b133a03f).
[GitHub] [spark] SparkQA commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
SparkQA commented on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883088620

**[Test build #141300 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141300/testReport)** for PR 33429 at commit [`2840b47`](https://github.com/apache/spark/commit/2840b475c6de6e3bd5bd3cfce7e981a289ab1e39).
[GitHub] [spark] SparkQA commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
SparkQA commented on pull request #33428:
URL: https://github.com/apache/spark/pull/33428#issuecomment-883088661

**[Test build #141301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141301/testReport)** for PR 33428 at commit [`f4330a5`](https://github.com/apache/spark/commit/f4330a5abbd87c19191764b59bb5d55bf6472432).
[GitHub] [spark] SparkQA commented on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)
SparkQA commented on pull request #33432:
URL: https://github.com/apache/spark/pull/33432#issuecomment-883088569

**[Test build #141299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141299/testReport)** for PR 33432 at commit [`dfabd0f`](https://github.com/apache/spark/commit/dfabd0fce7e9079cd66e75be0eb02a1c814c8b0b).
[GitHub] [spark] AmplabJenkins commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
AmplabJenkins commented on pull request #33239:
URL: https://github.com/apache/spark/pull/33239#issuecomment-883087051

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45808/
[GitHub] [spark] AmplabJenkins commented on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package
AmplabJenkins commented on pull request #33350:
URL: https://github.com/apache/spark/pull/33350#issuecomment-883087045

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45807/
[GitHub] [spark] AmplabJenkins commented on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec
AmplabJenkins commented on pull request #33424:
URL: https://github.com/apache/spark/pull/33424#issuecomment-883087048

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141288/
[GitHub] [spark] SparkQA commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
SparkQA commented on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883084988

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45811/
[GitHub] [spark] SparkQA commented on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec
SparkQA commented on pull request #33424:
URL: https://github.com/apache/spark/pull/33424#issuecomment-883082141

**[Test build #141288 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141288/testReport)** for PR 33424 at commit [`afa5539`](https://github.com/apache/spark/commit/afa55393f8d0c6884ed1d47ac3ced1112a87e7b6).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] otterc commented on a change in pull request #33425: [SPARK-32919][FOLLOW-UP] Filter out driver in the merger locations and fix the return type of RemoveShufflePushMergerLocations
otterc commented on a change in pull request #33425:
URL: https://github.com/apache/spark/pull/33425#discussion_r672821698

## File path: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala

```diff
@@ -2093,6 +2093,9 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
       Seq("hostC", "hostB", "hostD").sorted)
     assert(master.getShufflePushMergerLocations(4, Set.empty).map(_.host).sorted ===
       Seq("hostB", "hostA", "hostC", "hostD").sorted)
+    master.removeShufflePushMergerLocation("hostA")
+    assert(master.getShufflePushMergerLocations(4, Set.empty).map(_.host).sorted ===
+      Seq("hostB", "hostC", "hostD").sorted)
```

Review comment: Can we extend this UT to verify that the driver host is excluded? It will ensure that future changes do not break this behavior.
[GitHub] [spark] HyukjinKwon commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
HyukjinKwon commented on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883075964

let me rebase. seems like it couldn't detect my GitHub actions job.
[GitHub] [spark] c21 commented on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)
c21 commented on pull request #33432:
URL: https://github.com/apache/spark/pull/33432#issuecomment-883074400

cc @cloud-fan could you help take a look when you have time? Thanks.
[GitHub] [spark] c21 opened a new pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)
c21 opened a new pull request #33432:
URL: https://github.com/apache/spark/pull/33432

### What changes were proposed in this pull request?

This is a re-work of https://github.com/apache/spark/pull/30003. Here we add support for writing Hive bucketed tables with the Parquet/ORC file formats (data source v1 write path and Hive hash as the hash function). Support for Hive's other file formats will be added in follow-up PRs.

The changes are mostly on:

* `HiveMetastoreCatalog.scala`: When converting a Hive table relation to a data source relation, pass the bucket info (`BucketSpec`) and other Hive-related info as options into `HadoopFsRelation` and `LogicalRelation`, which can later be accessed by `FileFormatWriter` to customize the bucket id and file name.
* `FileFormatWriter.scala`: Use `HiveHash` for `bucketIdExpression` when writing to a Hive bucketed table. In addition, the Spark output file name should follow the Hive/Presto/Trino bucketed-file naming convention. Introduce another parameter, `bucketFileNamePrefix`, which brings a subsequent change in `FileFormatDataWriter`.
* `HadoopMapReduceCommitProtocol`: Implement the new file name APIs introduced in https://github.com/apache/spark/pull/33012, and change its subclass `PathOutputCommitProtocol` to make Hive bucketed table writing work with all commit protocols (including the S3A commit protocol).

### Why are the changes needed?

To make Spark write bucketed tables that are compatible with other SQL engines. Currently a Spark bucketed table cannot be leveraged by other SQL engines like Hive and Presto, because Spark uses a different hash function (Spark murmur3hash). With this PR, a Spark-written Hive bucketed table can be efficiently read by Presto and Hive to do bucket filter pruning, join, group-by, etc. This was and is blocking several companies (confirmed from Facebook, Lyft, etc.) from migrating bucketing workloads from Hive to Spark.

### Does this PR introduce _any_ user-facing change?

Yes. Any Hive bucketed table (with Parquet/ORC format) written by Spark is properly bucketed and can be efficiently processed by Hive and Presto/Trino.

### How was this patch tested?

* Added a unit test in `BucketedWriteWithHiveSupportSuite.scala` to verify bucket file names and that each row is written to the proper bucket.
* WIP test in production. Will update later.
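The interoperability problem described in the PR comes down to the bucket-id computation. The following is only an illustrative sketch, not Spark's or Hive's actual implementation: `hiveStyleHash` assumes Hive's Java-hashCode-style hashing of an int (the value itself), and `murmurStyleHash` is a generic Murmur3 stand-in (Spark's real bucketing hash differs in detail). The point is that two engines hashing the same value differently will place the same row in different buckets:

```scala
import scala.util.hashing.MurmurHash3

// Bucket id from a hash: non-negative remainder modulo the bucket count.
def bucketId(hash: Int, numBuckets: Int): Int =
  (hash & Integer.MAX_VALUE) % numBuckets

// Illustrative simplification: a Java Integer hashes to the value itself.
def hiveStyleHash(v: Int): Int = v

// Generic Murmur3 stand-in (not Spark's exact variant).
def murmurStyleHash(v: Int): Int =
  MurmurHash3.bytesHash(java.nio.ByteBuffer.allocate(4).putInt(v).array())

// The two schemes almost never agree on bucket assignment, so a table written
// with one hash function cannot be bucket-pruned correctly by an engine
// expecting the other.
val disagree = (0 until 100).exists(v =>
  bucketId(hiveStyleHash(v), 8) != bucketId(murmurStyleHash(v), 8))
assert(disagree)
```

This is why the PR switches the write path to `HiveHash` for Hive bucketed tables rather than re-using Spark's default bucketing hash.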
[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
SparkQA commented on pull request #33239:
URL: https://github.com/apache/spark/pull/33239#issuecomment-883072761

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45808/
[GitHub] [spark] SparkQA commented on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package
SparkQA commented on pull request #33350:
URL: https://github.com/apache/spark/pull/33350#issuecomment-883071887

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45807/
[GitHub] [spark] dongjoon-hyun commented on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too
dongjoon-hyun commented on pull request #33409:
URL: https://github.com/apache/spark/pull/33409#issuecomment-883069750

It seems that the master branch's Java 17 job is failing for the same reason.
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too
dongjoon-hyun edited a comment on pull request #33409:
URL: https://github.com/apache/spark/pull/33409#issuecomment-883069206

The error code is the following. It looks like OOM is happening again.
- https://github.com/AngersZh/spark/runs/3110053861?check_suite_focus=true

```
./build/mvn: line 178: 1699 Killed "${MVN_BIN}" "$@"
2021-07-20T04:37:47.6486105Z ##[error]Process completed with exit code 137.
```
[GitHub] [spark] dongjoon-hyun commented on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too
dongjoon-hyun commented on pull request #33409:
URL: https://github.com/apache/spark/pull/33409#issuecomment-883069206

The error code is the following. It looks like OOM is happening again.

```
./build/mvn: line 178: 1699 Killed "${MVN_BIN}" "$@"
2021-07-20T04:37:47.6486105Z ##[error]Process completed with exit code 137.
```
[GitHub] [spark] dongjoon-hyun commented on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too
dongjoon-hyun commented on pull request #33409:
URL: https://github.com/apache/spark/pull/33409#issuecomment-883067722

Wow, the GitHub Action failures look really weird.
[GitHub] [spark] jhu-chang commented on a change in pull request #33263: [SPARK-35027][CORE] Close the inputStream in FileAppender when writin…
jhu-chang commented on a change in pull request #33263:
URL: https://github.com/apache/spark/pull/33263#discussion_r672813847

## File path: core/src/main/scala/org/apache/spark/util/logging/FileAppender.scala

```diff
@@ -76,7 +80,13 @@ private[spark] class FileAppender(inputStream: InputStream, file: File, bufferSi
         }
       }
     } {
-      closeFile()
+      try {
+        if (closeStreams) {
+          inputStream.close()
+        }
```

Review comment: @Ngone51 @srowen It's for both normal exit and exception. Sorry, I don't quite understand the last comment: do you mean handling the error from `inputStream.close()`?
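The pattern under discussion can be sketched as follows. This is a minimal, hedged illustration (the `appendAll` helper and its signature are assumptions, not the actual `FileAppender` code): the cleanup runs on both normal completion and exception, and a failure inside `close()` is caught so it cannot mask the original error:

```scala
import java.io.{ByteArrayInputStream, InputStream}

// Copy all bytes from `in`, optionally closing it when done -- whether the
// read loop finishes normally or throws.
def appendAll(in: InputStream, closeStreams: Boolean)(write: Int => Unit): Unit = {
  try {
    var b = in.read()
    while (b != -1) {
      write(b)
      b = in.read()
    }
  } finally {
    if (closeStreams) {
      try in.close()
      catch {
        // In real code, log and continue; catching here ensures a close()
        // failure never masks an exception thrown by the read loop.
        case _: java.io.IOException => ()
      }
    }
  }
}

val in = new ByteArrayInputStream("abc".getBytes("UTF-8"))
val out = new StringBuilder
appendAll(in, closeStreams = true)(b => out.append(b.toChar))
assert(out.toString == "abc")
```

Putting the close in the `finally` (rather than after the loop) is what gives the "both normal exit and exception" behavior the comment describes.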
[GitHub] [spark] viirya commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
viirya commented on a change in pull request #33239: URL: https://github.com/apache/spark/pull/33239#discussion_r672812727 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/CustomMetrics.scala ## @@ -51,7 +51,7 @@ object CustomMetrics { currentMetricsValues: Seq[CustomTaskMetric], customMetrics: Map[String, SQLMetric]): Unit = { currentMetricsValues.foreach { metric => - customMetrics(metric.name()).set(metric.value()) + customMetrics.get(metric.name()).map(_.set(metric.value())) Review comment: Ok.
[GitHub] [spark] viirya commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
viirya commented on a change in pull request #33239: URL: https://github.com/apache/spark/pull/33239#discussion_r672812642 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriterMetricSuite.scala ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution.datasources + +import java.util.Collections + +import org.scalatest.BeforeAndAfter + +import org.scalatest.time.SpanSugar._ + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.connector.catalog.{Identifier, InMemoryTableCatalog} +import org.apache.spark.sql.functions.lit +import org.apache.spark.sql.test.SharedSparkSession +import org.apache.spark.sql.types.StructType + +class FileFormatDataWriterMetricSuite Review comment: Yea, I think so.
[GitHub] [spark] HyukjinKwon commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
HyukjinKwon commented on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883060656 Jenkins, ok to test
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
dongjoon-hyun commented on a change in pull request #33239: URL: https://github.com/apache/spark/pull/33239#discussion_r672811174 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriterMetricSuite.scala ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution.datasources + +import java.util.Collections + +import org.scalatest.BeforeAndAfter + +import org.scalatest.time.SpanSugar._ + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.connector.catalog.{Identifier, InMemoryTableCatalog} +import org.apache.spark.sql.functions.lit +import org.apache.spark.sql.test.SharedSparkSession +import org.apache.spark.sql.types.StructType + +class FileFormatDataWriterMetricSuite Review comment: For this one, I guess we need @gengliangwang 's review since he requested it?
[GitHub] [spark] AmplabJenkins commented on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe
AmplabJenkins commented on pull request #33422: URL: https://github.com/apache/spark/pull/33422#issuecomment-883059530 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141285/
[GitHub] [spark] SparkQA removed a comment on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe
SparkQA removed a comment on pull request #33422: URL: https://github.com/apache/spark/pull/33422#issuecomment-882962473 **[Test build #141285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141285/testReport)** for PR 33422 at commit [`2a21bb3`](https://github.com/apache/spark/commit/2a21bb3017643410a81305d000af5e591b8ba3bb).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
dongjoon-hyun commented on a change in pull request #33239: URL: https://github.com/apache/spark/pull/33239#discussion_r672810180 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/CustomMetrics.scala ## @@ -51,7 +51,7 @@ object CustomMetrics { currentMetricsValues: Seq[CustomTaskMetric], customMetrics: Map[String, SQLMetric]): Unit = { currentMetricsValues.foreach { metric => - customMetrics(metric.name()).set(metric.value()) + customMetrics.get(metric.name()).map(_.set(metric.value())) Review comment: Also, it would be great if you put the explanation at line 48.
[GitHub] [spark] SparkQA commented on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe
SparkQA commented on pull request #33422: URL: https://github.com/apache/spark/pull/33422#issuecomment-883058475 **[Test build #141285 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141285/testReport)** for PR 33422 at commit [`2a21bb3`](https://github.com/apache/spark/commit/2a21bb3017643410a81305d000af5e591b8ba3bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
dongjoon-hyun commented on a change in pull request #33239: URL: https://github.com/apache/spark/pull/33239#discussion_r672809311 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/CustomMetrics.scala ## @@ -51,7 +51,7 @@ object CustomMetrics { currentMetricsValues: Seq[CustomTaskMetric], customMetrics: Map[String, SQLMetric]): Unit = { currentMetricsValues.foreach { metric => - customMetrics(metric.name()).set(metric.value()) + customMetrics.get(metric.name()).map(_.set(metric.value())) Review comment: This looks more robust. Could you add a test case for this no-op too?
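The robustness point in this thread is that indexing a Scala `Map` with `apply()` throws `NoSuchElementException` for an unknown key, while `get()` returns an `Option`, turning an unknown metric name into a harmless no-op. A small illustration (hedged sketch with made-up names, not the `CustomMetrics` API; note that `foreach` would be the idiomatic call here, since `map` on an `Option` produces a result the PR's code discards):

```scala
import scala.collection.mutable

// Toy stand-in for the metrics map discussed in the review.
val metrics = mutable.Map("bytesWritten" -> 0L)

def update(name: String, value: Long): Unit =
  // apply(): metrics(name) would throw for an unknown key.
  // get():   Option-based lookup makes the miss a silent no-op.
  metrics.get(name).foreach(old => metrics(name) = old + value)

update("bytesWritten", 10L)   // known metric: updated
update("unknownMetric", 99L)  // unknown metric: silently ignored
```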
[GitHub] [spark] mridulm commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file
mridulm commented on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-883057074 Thanks for the clarifications! This sounds good.
[GitHub] [spark] mridulm commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better
mridulm commented on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-883056728 +CC @gengliangwang
[GitHub] [spark] mridulm commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better
mridulm commented on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-883056445 Merged to master and branch-3.2. Thanks for working on this, @zhouyejoe! Thanks for all the reviews, @Ngone51, @otterc, @venkata91 :-)
[GitHub] [spark] asfgit closed pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better way
asfgit closed pull request #33078: URL: https://github.com/apache/spark/pull/33078
[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
SparkQA commented on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-883053190 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45808/
[GitHub] [spark] tobiasedwards commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
tobiasedwards commented on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883053001 There we go, that should be better. When I botched the rebase, the bot added some incorrect labels, though; are you able to remove them, @HyukjinKwon? Thanks again for your help!
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
dongjoon-hyun commented on a change in pull request #33429: URL: https://github.com/apache/spark/pull/33429#discussion_r672805734 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala ## @@ -88,23 +88,23 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl val specsMap = shuffleStageInfos.zip(newPartitionSpecs).map { case (stageInfo, partSpecs) => (stageInfo.shuffleStage.id, partSpecs) }.toMap -updateShuffleReaders(plan, specsMap) +updateShuffleRead(plan, specsMap) } else { plan } } } - private def updateShuffleReaders( + private def updateShuffleRead( Review comment: Like the other places, `updateShuffleRead` -> `updateShuffleReads`?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33409: [SPARK-36201][SQL] Schema check should check inner field too
AmplabJenkins removed a comment on pull request #33409: URL: https://github.com/apache/spark/pull/33409#issuecomment-883051150 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45805/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests. test_parameter_convergence
AmplabJenkins removed a comment on pull request #33427: URL: https://github.com/apache/spark/pull/33427#issuecomment-883051152 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45804/
[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
SparkQA commented on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-883052445 **[Test build #141298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141298/testReport)** for PR 33239 at commit [`bccc98b`](https://github.com/apache/spark/commit/bccc98b7f3afad110ac450c183e341938dd20bc9).
[GitHub] [spark] SparkQA commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
SparkQA commented on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883052316 **[Test build #141297 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141297/testReport)** for PR 33429 at commit [`f2b0bab`](https://github.com/apache/spark/commit/f2b0babd5835d10ee894943b49def6a9cb01fcad).
[GitHub] [spark] SparkQA commented on pull request #33430: [SPARK-36046][SQL][FOLLOWUP] Implement prettyName for MakeTimestampNTZ and MakeTimestampLTZ
SparkQA commented on pull request #33430: URL: https://github.com/apache/spark/pull/33430#issuecomment-883052246 **[Test build #141296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141296/testReport)** for PR 33430 at commit [`955ecf6`](https://github.com/apache/spark/commit/955ecf6421b25244ad647a61a53998948064b451).
[GitHub] [spark] SparkQA commented on pull request #33431: [SPARK-36221][SQL] Make sure CustomShuffleReaderExec has at least one partition
SparkQA commented on pull request #33431: URL: https://github.com/apache/spark/pull/33431#issuecomment-883052250 **[Test build #141295 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141295/testReport)** for PR 33431 at commit [`26bb39a`](https://github.com/apache/spark/commit/26bb39aea4d606fbe52d09ce51cc6b62fa775e6f).
[GitHub] [spark] AmplabJenkins commented on pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests. test_parameter_convergence
AmplabJenkins commented on pull request #33427: URL: https://github.com/apache/spark/pull/33427#issuecomment-883051152 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45804/
[GitHub] [spark] AmplabJenkins commented on pull request #33409: [SPARK-36201][SQL] Schema check should check inner field too
AmplabJenkins commented on pull request #33409: URL: https://github.com/apache/spark/pull/33409#issuecomment-883051150 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45805/
[GitHub] [spark] SparkQA commented on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package
SparkQA commented on pull request #33350: URL: https://github.com/apache/spark/pull/33350#issuecomment-883050392 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45807/
[GitHub] [spark] tobiasedwards commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
tobiasedwards commented on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883049975 Whoops, I think I've messed up my rebase; give me a minute.
[GitHub] [spark] HyukjinKwon commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
HyukjinKwon commented on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883048266 cc @ulysses-you too FYI
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe
HyukjinKwon commented on a change in pull request #33422: URL: https://github.com/apache/spark/pull/33422#discussion_r672800804 ## File path: sql/core/src/main/scala/org/apache/spark/sql/Observation.scala ## @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import java.util.UUID + +import org.apache.spark.sql.execution.QueryExecution +import org.apache.spark.sql.util.QueryExecutionListener + + +/** + * Helper class to simplify usage of `Dataset.observe(String, Column, Column*)`: + * + * {{{ + * // Observe row count (rows) and highest id (maxid) in the Dataset while writing it + * val observation = Observation("my metrics") + * val observed_ds = ds.observe(observation, count(lit(1)).as("rows"), max($"id").as("maxid")) + * observed_ds.write.parquet("ds.parquet") + * val metrics = observation.get + * }}} + * + * This collects the metrics while the first action is executed on the observed dataset. Subsequent + * actions do not modify the metrics returned by [[get]]. Retrieval of the metric via [[get]] + * blocks until the first action has finished and metrics become available. + * + * This class does not support streaming datasets. 
+ * + * @param name name of the metric + * @since 3.3.0 + */ +class Observation(name: String) { + + private val listener: ObservationListener = ObservationListener(this) + + @volatile private var sparkSession: Option[SparkSession] = None + + @volatile private var row: Option[Row] = None + + /** + * Attach this observation to the given [[Dataset]] to observe aggregation expressions. + * + * @param ds dataset + * @param expr first aggregation expression + * @param exprs more aggregation expressions + * @tparam T dataset type + * @return observed dataset + * @throws IllegalArgumentException If this is a streaming Dataset (ds.isStreaming == true) + */ + private[spark] def on[T](ds: Dataset[T], expr: Column, exprs: Column*): Dataset[T] = { +if (ds.isStreaming) { + throw new IllegalArgumentException("Observation does not support streaming Datasets") +} +register(ds.sparkSession) +ds.observe(name, expr, exprs: _*) + } + + /** + * Get the observed metrics. This waits for the observed dataset to finish its first action. + * Only the result of the first action is available. Subsequent actions do not modify the result. 
+ * + * @return the observed metrics as a [[Row]] + * @throws InterruptedException interrupted while waiting + */ + @throws[InterruptedException] + def get: Row = { +synchronized { + // we need to loop as wait might return without us calling notify + // https://en.wikipedia.org/w/index.php?title=Spurious_wakeup=992601610 + while (this.row.isEmpty) { +wait() + } +} + +this.row.get + } + + private def register(sparkSession: SparkSession): Unit = { +// makes this class thread-safe: +// only the first thread entering this block can set sparkSession +// all other threads will see the exception, as it is only allowed to do this once +synchronized { + if (this.sparkSession.isDefined) { +throw new IllegalArgumentException("An Observation can be used with a Dataset only once") + } + this.sparkSession = Some(sparkSession) +} + +sparkSession.listenerManager.register(this.listener) + } + + private def unregister(): Unit = { +this.sparkSession.foreach(_.listenerManager.unregister(this.listener)) + } + + private[spark] def onFinish(qe: QueryExecution): Unit = { +synchronized { + if (this.row.isEmpty) { +this.row = qe.observedMetrics.get(name) +if (this.row.isDefined) { + notifyAll() + unregister() +} + } +} + } + +} + +private[sql] case class ObservationListener(observation: Observation) + extends QueryExecutionListener { + + override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = +observation.onFinish(qe) + + override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = +observation.onFinish(qe) + +} + +/** + * (Scala-specific) Create a named or anonymous
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe
HyukjinKwon commented on a change in pull request #33422: URL: https://github.com/apache/spark/pull/33422#discussion_r672800470 ## File path: sql/core/src/main/scala/org/apache/spark/sql/Observation.scala ## @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import java.util.UUID + +import org.apache.spark.sql.execution.QueryExecution +import org.apache.spark.sql.util.QueryExecutionListener + + +/** + * Helper class to simplify usage of `Dataset.observe(String, Column, Column*)`: + * + * {{{ + * // Observe row count (rows) and highest id (maxid) in the Dataset while writing it + * val observation = Observation("my metrics") + * val observed_ds = ds.observe(observation, count(lit(1)).as("rows"), max($"id").as("maxid")) + * observed_ds.write.parquet("ds.parquet") + * val metrics = observation.get + * }}} + * + * This collects the metrics while the first action is executed on the observed dataset. Subsequent + * actions do not modify the metrics returned by [[get]]. Retrieval of the metric via [[get]] + * blocks until the first action has finished and metrics become available. + * + * This class does not support streaming datasets. 
+ * + * @param name name of the metric + * @since 3.3.0 + */ +class Observation(name: String) { + + private val listener: ObservationListener = ObservationListener(this) + + @volatile private var sparkSession: Option[SparkSession] = None + + @volatile private var row: Option[Row] = None + + /** + * Attach this observation to the given [[Dataset]] to observe aggregation expressions. + * + * @param ds dataset + * @param expr first aggregation expression + * @param exprs more aggregation expressions + * @tparam T dataset type + * @return observed dataset + * @throws IllegalArgumentException If this is a streaming Dataset (ds.isStreaming == true) + */ + private[spark] def on[T](ds: Dataset[T], expr: Column, exprs: Column*): Dataset[T] = { +if (ds.isStreaming) { + throw new IllegalArgumentException("Observation does not support streaming Datasets") +} +register(ds.sparkSession) +ds.observe(name, expr, exprs: _*) + } + + /** + * Get the observed metrics. This waits for the observed dataset to finish its first action. + * Only the result of the first action is available. Subsequent actions do not modify the result. 
+ * + * @return the observed metrics as a [[Row]] + * @throws InterruptedException interrupted while waiting + */ + @throws[InterruptedException] + def get: Row = { +synchronized { + // we need to loop as wait might return without us calling notify + // https://en.wikipedia.org/w/index.php?title=Spurious_wakeup&oldid=992601610 + while (this.row.isEmpty) { +wait() + } +} + +this.row.get + } + + private def register(sparkSession: SparkSession): Unit = { +// makes this class thread-safe: +// only the first thread entering this block can set sparkSession +// all other threads will see the exception, as it is only allowed to do this once +synchronized { + if (this.sparkSession.isDefined) { +throw new IllegalArgumentException("An Observation can be used with a Dataset only once") + } + this.sparkSession = Some(sparkSession) +} + +sparkSession.listenerManager.register(this.listener) + } + + private def unregister(): Unit = { +this.sparkSession.foreach(_.listenerManager.unregister(this.listener)) + } + + private[spark] def onFinish(qe: QueryExecution): Unit = { +synchronized { + if (this.row.isEmpty) { +this.row = qe.observedMetrics.get(name) +if (this.row.isDefined) { + notifyAll() + unregister() +} + } +} + } + +} + +private[sql] case class ObservationListener(observation: Observation) + extends QueryExecutionListener { + + override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = +observation.onFinish(qe) + + override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = +observation.onFinish(qe) + +} + +/** + * (Scala-specific) Create a named or anonymous
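The `get` method quoted above relies on the standard guarded-wait idiom: `wait()` can return spuriously, so the condition must be re-checked in a loop before the result is read. A minimal, self-contained sketch of that idiom follows; the `OneShotResult` class and its names are illustrative, not part of the Spark patch.

```scala
// Illustrative sketch of the guarded-wait pattern used by Observation.get.
// wait() may return without a matching notifyAll() (a "spurious wakeup"),
// so the waiting thread loops until the condition actually holds.
class OneShotResult[T] {
  private var result: Option[T] = None

  def get: T = synchronized {
    while (result.isEmpty) { // re-check the condition after every wakeup
      wait()
    }
    result.get
  }

  def put(value: T): Unit = synchronized {
    if (result.isEmpty) { // only the first result is kept
      result = Some(value)
      notifyAll() // wake all threads blocked in get
    }
  }
}
```

As in `Observation`, `put` ignores every result after the first, mirroring how only the first action's metrics are retained.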
[GitHub] [spark] HyukjinKwon commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
HyukjinKwon commented on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883046619 cc @cloud-fan and @maryannxue can you take a look please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
viirya commented on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-883045610 @dongjoon-hyun and @gengliangwang Thanks for reviewing. Please take another look at the suggested change/new tests. Thanks!
[GitHub] [spark] viirya commented on pull request #30565: [WIP][SPARK-33625][SQL] Subexpression elimination for whole-stage codegen in Filter
viirya commented on pull request #30565: URL: https://github.com/apache/spark/pull/30565#issuecomment-883045221 I think it is much easier to solve this at query optimization time (i.e. in the optimizer) instead of at codegen. It also looks like a query optimization problem rather than a codegen one.
[GitHub] [spark] beliefer commented on pull request #33430: [SPARK-36046][SQL][FOLLOWUP] Implement prettyName for MakeTimestampNTZ and MakeTimestampLTZ
beliefer commented on pull request #33430: URL: https://github.com/apache/spark/pull/33430#issuecomment-883043550 > > This PR fix the incorrect alias usecase. > > @beliefer I wouldn't say that is incorrect..implementing `prettyName` is more reliable. OK. I updated the description.
[GitHub] [spark] gengliangwang commented on pull request #33430: [SPARK-36046][SQL][FOLLOWUP] Implement prettyName for MakeTimestampNTZ and MakeTimestampLTZ
gengliangwang commented on pull request #33430: URL: https://github.com/apache/spark/pull/33430#issuecomment-883042925 > This PR fix the incorrect alias usecase. @beliefer I wouldn't say that is incorrect..implementing `prettyName` is more reliable.
[GitHub] [spark] ulysses-you opened a new pull request #33431: [SPARK-36221][SQL] Make sure CustomShuffleReaderExec has at least one partition
ulysses-you opened a new pull request #33431: URL: https://github.com/apache/spark/pull/33431 ### What changes were proposed in this pull request? * Add a non-empty partition check in `CustomShuffleReaderExec` * Make sure `OptimizeLocalShuffleReader` doesn't return empty partitions ### Why are the changes needed? Since SPARK-32083, AQE coalesce always returns at least one partition, so it is more robust to add a non-empty check in `CustomShuffleReaderExec`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Not needed.
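A minimal sketch of the kind of fail-fast invariant the PR proposes; `ShuffleReadNode` and `partitionSpecs` are made-up names for illustration, not Spark's actual classes.

```scala
// Illustrative non-empty partition check: fail at construction time rather
// than letting a zero-partition plan node propagate through execution.
case class ShuffleReadNode(partitionSpecs: Seq[Int]) {
  require(partitionSpecs.nonEmpty,
    "ShuffleReadNode requires at least one partition")
}
```

`require` throws `IllegalArgumentException` when the precondition fails, so an empty node can never be constructed silently.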
[GitHub] [spark] HyukjinKwon edited a comment on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
HyukjinKwon edited a comment on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883039820 ah actually this is the limitation .. I can't retrigger the test because it belongs to your repo :-) .. can you rebase and push it again? e.g.) `git checkout python-sql-row-type-annotation && git fetch upstream && git rebase upstream master && git push origin python-sql-row-type-annotation`
[GitHub] [spark] HyukjinKwon commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
HyukjinKwon commented on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883039820 ah actually this is the limitation .. I can't retrigger the test because it belongs to your repo :-) .. can you rebase and push it again? e.g.) `git checkout python-sql-row-type-annotation git fetch upstream && git rebase upstream master && git push origin python-sql-row-type-annotation`
[GitHub] [spark] HyukjinKwon closed pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests. test_parameter_convergence
HyukjinKwon closed pull request #33427: URL: https://github.com/apache/spark/pull/33427
[GitHub] [spark] HyukjinKwon commented on pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests. test_parameter_convergence
HyukjinKwon commented on pull request #33427: URL: https://github.com/apache/spark/pull/33427#issuecomment-883038497 Merged to master, branch-3.2, branch-3.1 and branch-3.0.
[GitHub] [spark] viirya commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
viirya commented on a change in pull request #33239: URL: https://github.com/apache/spark/pull/33239#discussion_r672793126 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala ## @@ -41,7 +42,8 @@ import org.apache.spark.util.SerializableConfiguration abstract class FileFormatDataWriter( description: WriteJobDescription, taskAttemptContext: TaskAttemptContext, -committer: FileCommitProtocol) extends DataWriter[InternalRow] { +committer: FileCommitProtocol, +customMetrics: Map[String, SQLMetric]) extends DataWriter[InternalRow] { Review comment: I added a custom metric for writing to the InMemory table for test purposes. The tests are in `FileFormatDataWriterMetricSuite`.
[GitHub] [spark] beliefer opened a new pull request #33430: [SPARK-36046][SQL][FOLLOWUP] Implement prettyName for MakeTimestampNTZ and MakeTimestampLTZ
beliefer opened a new pull request #33430: URL: https://github.com/apache/spark/pull/33430 ### What changes were proposed in this pull request? This PR fixes the incorrect alias usage for `MakeTimestampNTZ` and `MakeTimestampLTZ` based on the discussion shown below: https://github.com/apache/spark/pull/33299/files#r668423810 ### Why are the changes needed? This PR fixes the incorrect alias usage. ### Does this PR introduce _any_ user-facing change? 'No'. Modifications are transparent to users. ### How was this patch tested? Jenkins test.
[GitHub] [spark] HyukjinKwon commented on pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests. test_parameter_convergence
HyukjinKwon commented on pull request #33427: URL: https://github.com/apache/spark/pull/33427#issuecomment-883038320 Thanks!
[GitHub] [spark] SparkQA commented on pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests. test_parameter_convergence
SparkQA commented on pull request #33427: URL: https://github.com/apache/spark/pull/33427#issuecomment-883038157 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45804/
[GitHub] [spark] SparkQA commented on pull request #33409: [SPARK-36201][SQL] Schema check should check inner field too
SparkQA commented on pull request #33409: URL: https://github.com/apache/spark/pull/33409#issuecomment-883037465 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45805/
[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
SparkQA commented on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-883036620 **[Test build #141294 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141294/testReport)** for PR 33239 at commit [`fe9cc4e`](https://github.com/apache/spark/commit/fe9cc4e79323a8a089470fcb8b28b346fb96ecdd).
[GitHub] [spark] HyukjinKwon opened a new pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader
HyukjinKwon opened a new pull request #33429: URL: https://github.com/apache/spark/pull/33429 ### What changes were proposed in this pull request? This PR proposes to rename: - Rename `*Reader`/`*reader` to `*Read`/`*read` for rules and execution plans (user-facing doc/config names remain untouched) - `*ShuffleReaderExec` -> `*ShuffleReadExec` - `isLocalReader` -> `isLocalRead` - ... - Rename the `CustomShuffle*` prefix to `AQEShuffle*` - Rename the `OptimizeLocalShuffleReader` rule to `OptimizeShuffleWithLocalRead` ### Why are the changes needed? There are multiple problems in the current naming: - `CustomShuffle*` -> `AQEShuffle*`: it sounds like a pluggable API, but it is actually only used by AQE. - `OptimizeLocalShuffleReader` -> `OptimizeShuffleWithLocalRead`: it is the name of a rule but it can be misread as a reader, which is counterintuitive. - `*ReaderExec` -> `*ReadExec`: "reader execution" reads a bit oddly; it should rather be a "read" execution (like `ScanExec`, `ProjectExec` and `FilterExec`). There is no reason to name it after something that performs an action. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Existing unit tests should cover the changes.
[GitHub] [spark] SparkQA commented on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package
SparkQA commented on pull request #33350: URL: https://github.com/apache/spark/pull/33350#issuecomment-883031312 **[Test build #141293 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141293/testReport)** for PR 33350 at commit [`7473aea`](https://github.com/apache/spark/commit/7473aea9aa91586366206b7a01ed3b6e11f7236a).
[GitHub] [spark] SparkQA removed a comment on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in
SparkQA removed a comment on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-882962666 **[Test build #141286 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141286/testReport)** for PR 33078 at commit [`5310991`](https://github.com/apache/spark/commit/53109918cbdbdba2fe79f38a991c171efec7e85f).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the sta
AmplabJenkins removed a comment on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-883030102 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141286/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33352: [SPARK-34952][SQL] DSv2 Aggregate push down APIs
AmplabJenkins removed a comment on pull request #33352: URL: https://github.com/apache/spark/pull/33352#issuecomment-883030099 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45806/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests. test_parameter_convergence
AmplabJenkins removed a comment on pull request #33427: URL: https://github.com/apache/spark/pull/33427#issuecomment-883030098 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141290/
[GitHub] [spark] cfmcgrady commented on a change in pull request #33212: [SPARK-35912][SQL] Fix nullability of `spark.read.json`
cfmcgrady commented on a change in pull request #33212: URL: https://github.com/apache/spark/pull/33212#discussion_r672787073 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ## @@ -405,10 +405,18 @@ class JacksonParser( schema.getFieldIndex(parser.getCurrentName) match { case Some(index) => try { -row.update(index, fieldConverters(index).apply(parser)) +val fieldValue = fieldConverters(index).apply(parser) Review comment: Thank you for your suggestions, I'll raise a new PR.
[GitHub] [spark] AmplabJenkins commented on pull request #33352: [SPARK-34952][SQL] DSv2 Aggregate push down APIs
AmplabJenkins commented on pull request #33352: URL: https://github.com/apache/spark/pull/33352#issuecomment-883030099 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45806/
[GitHub] [spark] AmplabJenkins commented on pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests. test_parameter_convergence
AmplabJenkins commented on pull request #33427: URL: https://github.com/apache/spark/pull/33427#issuecomment-883030098 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141290/
[GitHub] [spark] AmplabJenkins commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a
AmplabJenkins commented on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-883030102 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141286/
[GitHub] [spark] gengliangwang commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file
gengliangwang commented on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-883029244 @Ngone51 Yes, let's see if we can make it before 3.2. Thanks for the work!
[GitHub] [spark] HyukjinKwon commented on pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests.test_parameter_convergence
HyukjinKwon commented on pull request #33427: URL: https://github.com/apache/spark/pull/33427#issuecomment-883028518 The test failures in GA should be unrelated. @dongjoon-hyun, mind taking a quick look, please?
[GitHub] [spark] SparkQA commented on pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests.test_parameter_convergence
SparkQA commented on pull request #33427: URL: https://github.com/apache/spark/pull/33427#issuecomment-883027144 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45804/
[GitHub] [spark] yaooqinn commented on a change in pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec
yaooqinn commented on a change in pull request #33424: URL: https://github.com/apache/spark/pull/33424#discussion_r672783259

## File path: sql/core/src/test/resources/sql-tests/results/describe.sql.out
## @@ -324,6 +324,37 @@
 Location [not included in comparison]/{warehouse_dir}/t
 Storage Properties [a=1, b=2]
+-- !query
+DESC EXTENDED t PARTITION (C='Us', D=1)

Review comment:
```
+-- !query
+DESC EXTENDED t PARTITION (C='Us', D=1)
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+Partition spec is invalid. The spec (C, D) must match the partition spec (c, d) defined in table '`default`.`t`'
```
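The behavior under discussion, matching a user-supplied partition spec against the table's defined partition columns regardless of case, can be sketched as a small standalone helper. This is an illustrative Python sketch only (the function name and `ValueError`, standing in for `AnalysisException`, are hypothetical, not Spark's actual implementation):

```python
def normalize_partition_spec(user_spec, table_partition_cols, case_sensitive=False):
    """Map user-supplied partition keys onto the table's defined partition
    columns, matching names case-insensitively unless case_sensitive is set.
    Raises ValueError (standing in for AnalysisException) on a mismatch."""
    normalized = {}
    for key, value in user_spec.items():
        if case_sensitive:
            matches = [c for c in table_partition_cols if c == key]
        else:
            matches = [c for c in table_partition_cols if c.lower() == key.lower()]
        if not matches:
            raise ValueError(
                f"Partition spec is invalid. The spec ({', '.join(user_spec)}) must match "
                f"the partition spec ({', '.join(table_partition_cols)}) defined in the table")
        normalized[matches[0]] = value
    return normalized

# With normalization, PARTITION (C='Us', D=1) resolves against columns (c, d):
print(normalize_partition_spec({"C": "Us", "D": 1}, ["c", "d"]))  # → {'c': 'Us', 'd': 1}
```

With case sensitivity enabled, the same spec would fail to resolve, mirroring the `AnalysisException` shown in the expected test output above.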
[GitHub] [spark] SparkQA commented on pull request #33409: [SPARK-36201][SQL] Schema check should check inner field too
SparkQA commented on pull request #33409: URL: https://github.com/apache/spark/pull/33409#issuecomment-883027098 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45805/
[GitHub] [spark] SparkQA commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better
SparkQA commented on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-883026671 **[Test build #141286 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141286/testReport)** for PR 33078 at commit [`5310991`](https://github.com/apache/spark/commit/53109918cbdbdba2fe79f38a991c171efec7e85f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public class ShuffleChecksumHelper`
  * `class MutableCheckedOutputStream(out: OutputStream) extends OutputStream`
  * `case class ShuffleChecksumBlockId(shuffleId: Int, mapId: Long, reduceId: Int) extends BlockId`
  * `case class SessionWindow(timeColumn: Expression, gapDuration: Long) extends UnaryExpression`
  * `protected abstract class ConnectionProviderBase extends Logging`
  * `case class SessionWindowStateStoreRestoreExec(`
  * `case class SessionWindowStateStoreSaveExec(`
[GitHub] [spark] SparkQA commented on pull request #33352: [SPARK-34952][SQL] DSv2 Aggregate push down APIs
SparkQA commented on pull request #33352: URL: https://github.com/apache/spark/pull/33352#issuecomment-883026630 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45806/
[GitHub] [spark] tobiasedwards commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
tobiasedwards commented on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883023465 Hey @HyukjinKwon, I've added a Jira ticket here: [SPARK-36220](https://issues.apache.org/jira/browse/SPARK-36220) and enabled GitHub Actions on my forked repo. Is there anything I need to do to kick off the "Build and test" action again?
[GitHub] [spark] ulysses-you commented on a change in pull request #33188: [SPARK-35989][SQL] Only remove redundant shuffle if shuffle origin is REPARTITION_BY_COL in AQE
ulysses-you commented on a change in pull request #33188: URL: https://github.com/apache/spark/pull/33188#discussion_r672779057

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
## @@ -250,7 +250,12 @@ object EnsureRequirements extends Rule[SparkPlan] {
```
   def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
     // TODO: remove this after we create a physical operator for `RepartitionByExpression`.
-    case operator @ ShuffleExchangeExec(upper: HashPartitioning, child, _) =>
+    // SPARK-35989: AQE will change the partition number so we should retain the REPARTITION_BY_NUM
+    // shuffle which is specified by user. And also we can not remove REBALANCE_PARTITIONS_BY_COL,
+    // it is a special shuffle used to rebalance partitions.
+    // So, here we only remove REPARTITION_BY_COL in AQE.
+    case operator @ ShuffleExchangeExec(upper: HashPartitioning, child, shuffleOrigin)
+        if shuffleOrigin == REPARTITION_BY_COL || !conf.adaptiveExecutionEnabled =>
```

Review comment:
Yeah, we have only skipped applying `CoalesceShufflePartitions` and other custom shuffle readers at the final stage; for the stages that are still in progress we do nothing. That's why I think it's a little bit hacky. Another hacky idea is to re-mark the shuffle that precedes the removed shuffle, changing its `ENSURE_REQUIREMENTS` origin to `REPARTITION_BY_COL`; then AQE could do the optimization safely. IMO, I prefer the idea of skipping shuffle removal for all shuffle origins in AQE: it's simple, and it can be seen as a behavior change since AQE is enabled by default. If users really hit this issue, they can just disable AQE.
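The guard being debated above reduces to a small predicate. The following Python sketch is illustrative only, with the shuffle-origin tags modeled as plain strings; the real rule lives in Spark's `EnsureRequirements`:

```python
# Shuffle-origin tags, mirroring (a subset of) Spark's ShuffleOrigin values.
ENSURE_REQUIREMENTS = "ENSURE_REQUIREMENTS"
REPARTITION_BY_COL = "REPARTITION_BY_COL"
REPARTITION_BY_NUM = "REPARTITION_BY_NUM"
REBALANCE_PARTITIONS_BY_COL = "REBALANCE_PARTITIONS_BY_COL"

def can_remove_redundant_shuffle(shuffle_origin, aqe_enabled):
    """A shuffle whose partitioning is already satisfied by its child may be
    elided -- but under AQE only REPARTITION_BY_COL is safe to drop:
    REPARTITION_BY_NUM pins a user-specified partition count that AQE would
    otherwise change, and REBALANCE_PARTITIONS_BY_COL exists specifically to
    rebalance partitions."""
    return shuffle_origin == REPARTITION_BY_COL or not aqe_enabled

# With AQE on, only the column-based repartition may be removed:
assert can_remove_redundant_shuffle(REPARTITION_BY_COL, aqe_enabled=True)
assert not can_remove_redundant_shuffle(REPARTITION_BY_NUM, aqe_enabled=True)
# With AQE off, partition counts are stable, so removal is always safe:
assert can_remove_redundant_shuffle(REPARTITION_BY_NUM, aqe_enabled=False)
```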
[GitHub] [spark] cloud-fan commented on a change in pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec
cloud-fan commented on a change in pull request #33424: URL: https://github.com/apache/spark/pull/33424#discussion_r672778569

## File path: sql/core/src/test/resources/sql-tests/results/describe.sql.out
## @@ -324,6 +324,37 @@
 Location [not included in comparison]/{warehouse_dir}/t
 Storage Properties [a=1, b=2]
+-- !query
+DESC EXTENDED t PARTITION (C='Us', D=1)

Review comment:
What was the result before this PR?
[GitHub] [spark] cloud-fan commented on a change in pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe
cloud-fan commented on a change in pull request #33422: URL: https://github.com/apache/spark/pull/33422#discussion_r672777257

## File path: sql/core/src/main/scala/org/apache/spark/sql/Observation.scala
## @@ -0,0 +1,150 @@
```
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.util.UUID
+
+import org.apache.spark.sql.execution.QueryExecution
+import org.apache.spark.sql.util.QueryExecutionListener
+
+
+/**
+ * Helper class to simplify usage of `Dataset.observe(String, Column, Column*)`:
+ *
+ * {{{
+ *   // Observe row count (rows) and highest id (maxid) in the Dataset while writing it
+ *   val observation = Observation("my metrics")
+ *   val observed_ds = ds.observe(observation, count(lit(1)).as("rows"), max($"id").as("maxid"))
+ *   observed_ds.write.parquet("ds.parquet")
+ *   val metrics = observation.get
+ * }}}
+ *
+ * This collects the metrics while the first action is executed on the observed dataset. Subsequent
+ * actions do not modify the metrics returned by [[get]]. Retrieval of the metric via [[get]]
+ * blocks until the first action has finished and metrics become available.
+ *
+ * This class does not support streaming datasets.
+ *
+ * @param name name of the metric
+ * @since 3.3.0
+ */
+class Observation(name: String) {
+
+  private val listener: ObservationListener = ObservationListener(this)
+
+  @volatile private var sparkSession: Option[SparkSession] = None
+
+  @volatile private var row: Option[Row] = None
+
+  /**
+   * Attach this observation to the given [[Dataset]] to observe aggregation expressions.
+   *
+   * @param ds dataset
+   * @param expr first aggregation expression
+   * @param exprs more aggregation expressions
+   * @tparam T dataset type
+   * @return observed dataset
+   * @throws IllegalArgumentException If this is a streaming Dataset (ds.isStreaming == true)
+   */
+  private[spark] def on[T](ds: Dataset[T], expr: Column, exprs: Column*): Dataset[T] = {
+    if (ds.isStreaming) {
+      throw new IllegalArgumentException("Observation does not support streaming Datasets")
+    }
+    register(ds.sparkSession)
+    ds.observe(name, expr, exprs: _*)
+  }
+
+  /**
+   * Get the observed metrics. This waits for the observed dataset to finish its first action.
+   * Only the result of the first action is available. Subsequent actions do not modify the result.
+   *
+   * @return the observed metrics as a [[Row]]
+   * @throws InterruptedException interrupted while waiting
+   */
+  @throws[InterruptedException]
+  def get: Row = {
+    synchronized {
+      // we need to loop as wait might return without us calling notify
+      // https://en.wikipedia.org/w/index.php?title=Spurious_wakeup&oldid=992601610
+      while (this.row.isEmpty) {
+        wait()
+      }
+    }
+
+    this.row.get
+  }
+
+  private def register(sparkSession: SparkSession): Unit = {
+    // makes this class thread-safe:
+    // only the first thread entering this block can set sparkSession
+    // all other threads will see the exception, as it is only allowed to do this once
+    synchronized {
+      if (this.sparkSession.isDefined) {
+        throw new IllegalArgumentException("An Observation can be used with a Dataset only once")
+      }
+      this.sparkSession = Some(sparkSession)
+    }
+
+    sparkSession.listenerManager.register(this.listener)
+  }
+
+  private def unregister(): Unit = {
+    this.sparkSession.foreach(_.listenerManager.unregister(this.listener))
+  }
+
+  private[spark] def onFinish(qe: QueryExecution): Unit = {
+    synchronized {
+      if (this.row.isEmpty) {
+        this.row = qe.observedMetrics.get(name)
+        if (this.row.isDefined) {
+          notifyAll()
+          unregister()
+        }
+      }
+    }
+  }
+
+}
+
+private[sql] case class ObservationListener(observation: Observation)
+  extends QueryExecutionListener {
+
+  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
+    observation.onFinish(qe)
+
+  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
+    observation.onFinish(qe)
+
+}
+
+/**
+ * (Scala-specific) Create a named or anonymous
```
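The core concurrency pattern in `Observation.get` above, block in a loop until the first completed action publishes a result, guarding against spurious wakeups, and ignore any later results, can be demonstrated outside Spark. A minimal Python sketch (hypothetical class and method names, not Spark's API):

```python
import threading

class Observation:
    """Minimal sketch of the blocking-get pattern: get() waits in a loop
    (guarding against spurious wakeups) until the first finished action
    publishes a result via on_finish(); later results are ignored."""

    def __init__(self, name):
        self.name = name
        self._cond = threading.Condition()
        self._row = None

    def on_finish(self, row):
        with self._cond:
            if self._row is None:  # only the first action's metrics are kept
                self._row = row
                self._cond.notify_all()

    def get(self):
        with self._cond:
            # loop, since wait() may return without a matching notify
            while self._row is None:
                self._cond.wait()
            return self._row

obs = Observation("my metrics")
t = threading.Thread(target=obs.on_finish, args=({"rows": 100, "maxid": 99},))
t.start()
print(obs.get())  # blocks until on_finish has run → {'rows': 100, 'maxid': 99}
t.join()
```

The Scala version uses the JVM's intrinsic `synchronized`/`wait`/`notifyAll` on `this`; `threading.Condition` plays the same role here, and the `while` loop (rather than a single `wait()`) is exactly the spurious-wakeup defense the code comment cites.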
[GitHub] [spark] SparkQA removed a comment on pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests.test_parameter_convergence
SparkQA removed a comment on pull request #33427: URL: https://github.com/apache/spark/pull/33427#issuecomment-883011237 **[Test build #141290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141290/testReport)** for PR 33427 at commit [`ad91d63`](https://github.com/apache/spark/commit/ad91d639cb8c3ede32d24db2703c35354c24617d).
[GitHub] [spark] SparkQA commented on pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests.test_parameter_convergence
SparkQA commented on pull request #33427: URL: https://github.com/apache/spark/pull/33427#issuecomment-883019062 **[Test build #141290 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141290/testReport)** for PR 33427 at commit [`ad91d63`](https://github.com/apache/spark/commit/ad91d639cb8c3ede32d24db2703c35354c24617d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.