date:20210808

[GitHub] [spark] SparkQA removed a comment on pull request #33665: [SPARK-36428][SQL] the seconds parameter of make_timestamp should accept integer type

2021-08-08 Thread GitBox



SparkQA removed a comment on pull request #33665:
URL: https://github.com/apache/spark/pull/33665#issuecomment-894936793


   **[Test build #142206 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142206/testReport)**
 for PR 33665 at commit 
[`3496a4d`](https://github.com/apache/spark/commit/3496a4dfa53d3ab9cddf8c085c61d8b86757eda5).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #33683: [SPARK-36041][SS][DOCS] Introduce the RocksDBStateStoreProvider in the programming guide

2021-08-08 Thread GitBox



SparkQA removed a comment on pull request #33683:
URL: https://github.com/apache/spark/pull/33683#issuecomment-894972160


   **[Test build #142209 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142209/testReport)**
 for PR 33683 at commit 
[`09f6aeb`](https://github.com/apache/spark/commit/09f6aeb529ee390b2e6c61c9e780fe41b89fc41c).



[GitHub] [spark] SparkQA removed a comment on pull request #33682: [WIP][SPARK-36456][CORE][SQL][STRUCTURED STREAMING] Clean up compilation warnings related to `method closeQuietly in class IOUtils is

2021-08-08 Thread GitBox



SparkQA removed a comment on pull request #33682:
URL: https://github.com/apache/spark/pull/33682#issuecomment-894936706


   **[Test build #142205 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142205/testReport)**
 for PR 33682 at commit 
[`6bd69d0`](https://github.com/apache/spark/commit/6bd69d05a5d298ba664ef8a46fe98e4b6e6736f8).



[GitHub] [spark] SparkQA commented on pull request #33665: [SPARK-36428][SQL] the seconds parameter of make_timestamp should accept integer type

2021-08-08 Thread GitBox



SparkQA commented on pull request #33665:
URL: https://github.com/apache/spark/pull/33665#issuecomment-894991717


   **[Test build #142206 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142206/testReport)**
 for PR 33665 at commit 
[`3496a4d`](https://github.com/apache/spark/commit/3496a4dfa53d3ab9cddf8c085c61d8b86757eda5).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AngersZhuuuu commented on pull request #33686: [SPARK-36086][SQL] CollapseProject project replace alias should use origin column name

2021-08-08 Thread GitBox



AngersZh commented on pull request #33686:
URL: https://github.com/apache/spark/pull/33686#issuecomment-894987234


   FYI @cloud-fan 



[GitHub] [spark] SparkQA commented on pull request #33682: [WIP][SPARK-36456][CORE][SQL][STRUCTURED STREAMING] Clean up compilation warnings related to `method closeQuietly in class IOUtils is depreca

2021-08-08 Thread GitBox



SparkQA commented on pull request #33682:
URL: https://github.com/apache/spark/pull/33682#issuecomment-894986988


   **[Test build #142205 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142205/testReport)**
 for PR 33682 at commit 
[`6bd69d0`](https://github.com/apache/spark/commit/6bd69d05a5d298ba664ef8a46fe98e4b6e6736f8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class BlockSavedOnDecommissionedBlockManagerException(blockId: BlockId)`



[GitHub] [spark] AngersZhuuuu commented on pull request #33685: [SPARK-36086][SQL] CollapseProject project replace alias should use origin column name

2021-08-08 Thread GitBox



AngersZh commented on pull request #33685:
URL: https://github.com/apache/spark/pull/33685#issuecomment-894985919


   FYI @cloud-fan 



[GitHub] [spark] AngersZhuuuu opened a new pull request #33686: [SPARK-36086][SQL] CollapseProject project replace alias should use origin column name

2021-08-08 Thread GitBox



AngersZh opened a new pull request #33686:
URL: https://github.com/apache/spark/pull/33686


   ### What changes were proposed in this pull request?
   For added UT, without this patch will failed as below
   ```
   [info] - SHOW TABLES V2: SPARK-36086: CollapseProject project replace alias should use origin column name *** FAILED *** (4 seconds, 935 milliseconds)
   [info]   java.lang.RuntimeException: After applying rule org.apache.spark.sql.catalyst.optimizer.CollapseProject in batch Operator Optimization before Inferring Filters, the structural integrity of the plan is broken.
   [info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.structuralIntegrityIsBrokenAfterApplyingRuleError(QueryExecutionErrors.scala:1217)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:229)
   [info]   at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
   [info]   at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
   [info]   at scala.collection.immutable.List.foldLeft(List.scala:91)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
   [info]   at scala.collection.immutable.List.foreach(List.scala:431)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
   [info]   at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
   ```
   
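   CollapseProject should use the original column name when it replaces an alias.

   ### Why are the changes needed?
   To fix a bug: after CollapseProject is applied, the structural integrity of the plan is broken (see the failure above).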
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added UT
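To illustrate the invariant behind this fix, here is a toy model of collapsing two adjacent projections, written in plain Python rather than Catalyst. Everything in it is hypothetical (`Col`, `Add`, `collapse`, `collapse_dropping_alias`), and it assumes only that collapsing two projections must not change the names of the plan's output columns; the thread does not show exactly which integrity check fails. The second function shows how replacing an alias with a bare column can silently rename an output.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Col:
    """Reference to a column by name (toy stand-in for an attribute)."""
    name: str


@dataclass(frozen=True)
class Add:
    """Addition of two toy expressions."""
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Add, int]
# A projection is an ordered list of (output_name, expression) pairs.
Project = List[Tuple[str, Expr]]


def substitute(expr: Expr, aliases: Dict[str, Expr]) -> Expr:
    """Replace references to the lower projection's aliases with their definitions."""
    if isinstance(expr, Col):
        return aliases.get(expr.name, expr)
    if isinstance(expr, Add):
        return Add(substitute(expr.left, aliases), substitute(expr.right, aliases))
    return expr


def collapse(upper: Project, lower: Project) -> Project:
    """Merge `upper` over `lower`; every output keeps the name it had in `upper`."""
    aliases = dict(lower)
    return [(name, substitute(expr, aliases)) for name, expr in upper]


def collapse_dropping_alias(upper: Project, lower: Project) -> Project:
    """Deliberately wrong variant: a bare-column result is emitted under the
    underlying column's name, which renames that output column."""
    aliases = dict(lower)
    result = []
    for name, expr in upper:
        new_expr = substitute(expr, aliases)
        out_name = new_expr.name if isinstance(new_expr, Col) else name
        result.append((out_name, new_expr))
    return result


def output_names(project: Project) -> List[str]:
    return [name for name, _ in project]


# SELECT tableName AS t, n FROM (SELECT name AS tableName, id + 1 AS n FROM src)
lower = [("tableName", Col("name")), ("n", Add(Col("id"), 1))]
upper = [("t", Col("tableName")), ("n", Col("n"))]

print(output_names(collapse(upper, lower)))                 # ['t', 'n']
print(output_names(collapse_dropping_alias(upper, lower)))  # ['name', 'n']
```

In the correct version the collapsed projection exposes the same column names as the two-level plan it replaces; only the expressions underneath change.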
   



[GitHub] [spark] HeartSaVioR commented on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



HeartSaVioR commented on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894985611


   retest this, please



[GitHub] [spark] HeartSaVioR commented on a change in pull request #33683: [SPARK-36041][SS][DOCS] Introduce the RocksDBStateStoreProvider in the programming guide

2021-08-08 Thread GitBox



HeartSaVioR commented on a change in pull request #33683:
URL: https://github.com/apache/spark/pull/33683#discussion_r684946042



##
File path: docs/structured-streaming-programming-guide.md
##
@@ -1814,6 +1814,23 @@ Specifically for built-in HDFS state store provider, users can check the state s
 it is best if cache missing count is minimized that means Spark won't waste too much time on loading checkpointed state.
 User can increase Spark locality waiting configurations to avoid loading state store providers in different executors across batches.
 
+### RocksDB state store implementation
+
+As of Spark 3.2, we add a new build-in state store implementation, RocksDB state store provider.
+
+The current build-in HDFS state store provider has two major drawbacks:
+
+* The amount of state that can be maintained is limited by the heap size of the executors
+* State expiration by watermark and/or timeouts require full scans over all the data
+
+The RocksDB-based State Store implementation can address these drawbacks:
+
+* RocksDB can serve data from the disk with a configurable amount of non-JVM memory.
+* Sorting keys using the appropriate column should avoid full scans to find the to-be-dropped keys.

Review comment:
   Please correct me if I'm missing something: while this could be something we evaluate and address later, it is not true, at least for now. We don't distinguish the event-time field in the state store.
   
   Prefix scan is currently the only feature that leverages the sorted keys.
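The reviewer's distinction can be made concrete with a toy model. A store that keeps keys sorted makes a prefix scan cheap (binary-search to the first matching key, stop at the first non-match), but expiring entries by event time still has to visit every entry when event time is not part of the key ordering. The class below is a hypothetical in-memory sketch (`SortedStateStore`, `prefix_scan`, `evict_older_than` are illustrative names); it is not the RocksDB state store's API and makes no claim about its internals.

```python
import bisect
from typing import Callable, Dict, List, Tuple


class SortedStateStore:
    """Toy key-value store that keeps its tuple keys in sorted order."""

    def __init__(self) -> None:
        self._keys: List[Tuple] = []
        self._values: Dict[Tuple, dict] = {}

    def put(self, key: Tuple, value: dict) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys sorted on insert
        self._values[key] = value

    def prefix_scan(self, prefix: Tuple) -> List[Tuple]:
        """Visit only the contiguous run of keys that start with `prefix`."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if key[: len(prefix)] != prefix:
                break  # sorted order: no later key can match
            result.append(key)
        return result

    def evict_older_than(
        self, watermark: int, event_time_of: Callable[[dict], int]
    ) -> Tuple[List[Tuple], int]:
        """Drop entries whose event time is below `watermark`.

        Event time lives in the value, not the key order, so every key is
        examined. Returns (evicted keys, number of keys examined)."""
        examined = 0
        evicted = []
        for key in self._keys:
            examined += 1
            if event_time_of(self._values[key]) < watermark:
                evicted.append(key)
        for key in evicted:
            self._keys.remove(key)
            del self._values[key]
        return evicted, examined


store = SortedStateStore()
store.put(("user2", 1), {"event_time": 50})
store.put(("user1", 2), {"event_time": 5})
store.put(("user1", 1), {"event_time": 30})
store.put(("user3", 1), {"event_time": 8})

print(store.prefix_scan(("user1",)))  # [('user1', 1), ('user1', 2)]
evicted, examined = store.evict_older_than(10, lambda v: v["event_time"])
print(evicted, examined)  # [('user1', 2), ('user3', 1)] 4
```

The prefix scan stops after three keys, while the watermark eviction examines all four, which is the gap the review comment points out.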





[GitHub] [spark] AngersZhuuuu opened a new pull request #33685: [SPARK-36086][SQL] CollapseProject project replace alias should use origin column name

2021-08-08 Thread GitBox



AngersZh opened a new pull request #33685:
URL: https://github.com/apache/spark/pull/33685


   ### What changes were proposed in this pull request?
   Without this patch, the added UT fails as follows:
   ```
   [info] - SHOW TABLES V2: SPARK-36086: CollapseProject project replace alias should use origin column name *** FAILED *** (4 seconds, 935 milliseconds)
   [info]   java.lang.RuntimeException: After applying rule org.apache.spark.sql.catalyst.optimizer.CollapseProject in batch Operator Optimization before Inferring Filters, the structural integrity of the plan is broken.
   [info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.structuralIntegrityIsBrokenAfterApplyingRuleError(QueryExecutionErrors.scala:1217)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:229)
   [info]   at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
   [info]   at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
   [info]   at scala.collection.immutable.List.foldLeft(List.scala:91)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
   [info]   at scala.collection.immutable.List.foreach(List.scala:431)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
   [info]   at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
   ```
   
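   CollapseProject should use the original column name when it replaces an alias.

   ### Why are the changes needed?
   To fix a bug: after CollapseProject is applied, the structural integrity of the plan is broken (see the failure above).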
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added UT
   



[GitHub] [spark] SparkQA commented on pull request #33684: [WIP][SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported.

2021-08-08 Thread GitBox



SparkQA commented on pull request #33684:
URL: https://github.com/apache/spark/pull/33684#issuecomment-894979263


   **[Test build #142210 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142210/testReport)**
 for PR 33684 at commit 
[`4f0df82`](https://github.com/apache/spark/commit/4f0df82fd0c391944d754d3ff72ea0681e024d31).



[GitHub] [spark] HeartSaVioR commented on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



HeartSaVioR commented on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894978776


   cc. @viirya @xuanyuanking 



[GitHub] [spark] beliefer opened a new pull request #33684: [WIP][SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported.

2021-08-08 Thread GitBox



beliefer opened a new pull request #33684:
URL: https://github.com/apache/spark/pull/33684


   ### What changes were proposed in this pull request?
   Currently, after `set spark.sql.timestampType=TIMESTAMP_NTZ`, `from_json` and `from_csv` behave differently:
   ```
   -- !query
   select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/'))
   -- !query schema
   struct>
   -- !query output
   {"t":null}
   ```
   
   ```
   -- !query
   select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', 'dd/M/'))
   -- !query schema
   struct<>
   -- !query output
   java.lang.Exception
   Unsupported type: timestamp_ntz
   ```
   
   We should make `from_json` throw an exception too.
   This PR addresses the following review discussion:
   https://github.com/apache/spark/pull/33640#discussion_r682862523
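The behavioral difference can be sketched outside Spark: resolve a converter for every schema field up front and fail fast on a type the parser does not support, instead of quietly producing a null field. This is a hypothetical illustration (`CONVERTERS`, `make_converter_lenient`, `make_converter_strict`, `UnsupportedTypeError` are made-up names, and types are plain strings); it is not JacksonParser's code.

```python
from typing import Callable, Dict


class UnsupportedTypeError(Exception):
    """Raised when the schema contains a type the toy parser cannot convert."""


# Converters for the only types this toy parser understands (hypothetical).
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "int": int,
    "string": str,
}


def make_converter_lenient(type_name: str) -> Callable[[str], object]:
    """Unknown type: return a converter that always yields None (a null field)."""
    return CONVERTERS.get(type_name, lambda _raw: None)


def make_converter_strict(type_name: str) -> Callable[[str], object]:
    """Unknown type: fail immediately, before any record is converted."""
    if type_name not in CONVERTERS:
        raise UnsupportedTypeError(f"Unsupported type: {type_name}")
    return CONVERTERS[type_name]


def parse_record(
    record: Dict[str, str],
    schema: Dict[str, str],
    make_converter: Callable[[str], Callable[[str], object]],
) -> Dict[str, object]:
    # Build all converters first, so an unsupported type is reported up front.
    converters = {field: make_converter(t) for field, t in schema.items()}
    return {field: converters[field](record[field]) for field in schema}


schema = {"t": "timestamp_ntz"}
record = {"t": "26/October/2015"}
print(parse_record(record, schema, make_converter_lenient))  # {'t': None}
try:
    parse_record(record, schema, make_converter_strict)
except UnsupportedTypeError as err:
    print(err)  # Unsupported type: timestamp_ntz
```

The lenient path mirrors the `{"t":null}` output shown for `from_json`; the strict path mirrors the `Unsupported type: timestamp_ntz` error shown for `from_csv`.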
   
   
   ### Why are the changes needed?
   Make the behavior of `from_json` more reasonable.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes.
   `from_json` throws an exception when `spark.sql.timestampType` is set to `TIMESTAMP_NTZ`.
   
   
   ### How was this patch tested?
   Tests updated.
   



[GitHub] [spark] AmplabJenkins commented on pull request #33683: [SPARK-36041][SS][DOCS] Introduce the RocksDBStateStoreProvider in the programming guide

2021-08-08 Thread GitBox



AmplabJenkins commented on pull request #33683:
URL: https://github.com/apache/spark/pull/33683#issuecomment-894976783


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142209/
   



[GitHub] [spark] SparkQA commented on pull request #33683: [SPARK-36041][SS][DOCS] Introduce the RocksDBStateStoreProvider in the programming guide

2021-08-08 Thread GitBox



SparkQA commented on pull request #33683:
URL: https://github.com/apache/spark/pull/33683#issuecomment-894976646


   **[Test build #142209 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142209/testReport)**
 for PR 33683 at commit 
[`09f6aeb`](https://github.com/apache/spark/commit/09f6aeb529ee390b2e6c61c9e780fe41b89fc41c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



SparkQA commented on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894976016


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46721/
   



[GitHub] [spark] SparkQA commented on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



SparkQA commented on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894973226


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46720/
   



[GitHub] [spark] SparkQA commented on pull request #33683: [SPARK-36041][SS][DOCS] Introduce the RocksDBStateStoreProvider in the programming guide

2021-08-08 Thread GitBox



SparkQA commented on pull request #33683:
URL: https://github.com/apache/spark/pull/33683#issuecomment-894972160


   **[Test build #142209 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142209/testReport)**
 for PR 33683 at commit 
[`09f6aeb`](https://github.com/apache/spark/commit/09f6aeb529ee390b2e6c61c9e780fe41b89fc41c).



[GitHub] [spark] xuanyuanking commented on pull request #33683: [SPARK-36041][SS][DOCS] Introduce the RocksDBStateStoreProvider in the programming guide

2021-08-08 Thread GitBox



xuanyuanking commented on pull request #33683:
URL: https://github.com/apache/spark/pull/33683#issuecomment-894972015


   cc @HeartSaVioR 



[GitHub] [spark] xuanyuanking opened a new pull request #33683: [SPARK-36041][SS][DOCS] Introduce the RocksDBStateStoreProvider in the programming guide

2021-08-08 Thread GitBox



xuanyuanking opened a new pull request #33683:
URL: https://github.com/apache/spark/pull/33683


   ### What changes were proposed in this pull request?
   Add the document for the new RocksDBStateStoreProvider.
   
   ### Why are the changes needed?
   User guide for the new feature.
   
   ### Does this PR introduce _any_ user-facing change?
   No, doc only.
   
   ### How was this patch tested?
   Doc only.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33646: [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-08 Thread GitBox



AmplabJenkins removed a comment on pull request #33646:
URL: https://github.com/apache/spark/pull/33646#issuecomment-894907498







[GitHub] [spark] AmplabJenkins commented on pull request #33646: [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-08 Thread GitBox



AmplabJenkins commented on pull request #33646:
URL: https://github.com/apache/spark/pull/33646#issuecomment-894971483


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46719/
   



[GitHub] [spark] SparkQA commented on pull request #33646: [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-08 Thread GitBox



SparkQA commented on pull request #33646:
URL: https://github.com/apache/spark/pull/33646#issuecomment-894971443


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46719/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



AmplabJenkins removed a comment on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894923739







[GitHub] [spark] AmplabJenkins removed a comment on pull request #33682: [WIP][SPARK-36456][CORE][SQL][STRUCTURED STREAMING] Clean up compilation warnings related to `method closeQuietly in class IOUt

2021-08-08 Thread GitBox



AmplabJenkins removed a comment on pull request #33682:
URL: https://github.com/apache/spark/pull/33682#issuecomment-894970697


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46717/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33682: [WIP][SPARK-36456][CORE][SQL][STRUCTURED STREAMING] Clean up compilation warnings related to `method closeQuietly in class IOUtils is d

2021-08-08 Thread GitBox



AmplabJenkins commented on pull request #33682:
URL: https://github.com/apache/spark/pull/33682#issuecomment-894970697


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46717/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



AmplabJenkins commented on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894970696


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142208/
   



[GitHub] [spark] SparkQA commented on pull request #33682: [WIP][SPARK-36456][CORE][SQL][STRUCTURED STREAMING] Clean up compilation warnings related to `method closeQuietly in class IOUtils is depreca

2021-08-08 Thread GitBox



SparkQA commented on pull request #33682:
URL: https://github.com/apache/spark/pull/33682#issuecomment-894968632


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46717/
   



[GitHub] [spark] SaurabhChawla100 commented on pull request #33679: [SPARK-36452][SQL]: Add the support in Spark for having group by map datatype column for the scenario that works in Hive

2021-08-08 Thread GitBox



SaurabhChawla100 commented on pull request #33679:
URL: https://github.com/apache/spark/pull/33679#issuecomment-894968492


   > I thought @maropu is still working on this? (#32552)
   
   I was not aware that there was already a JIRA for this map issue. Yes, this 
PR (https://github.com/apache/spark/pull/32552) will fix the use case I am 
trying to address in this PR.



[GitHub] [spark] SaurabhChawla100 commented on a change in pull request #33679: [SPARK-36452][SQL]: Add the support in Spark for having group by map datatype column for the scenario that works in Hive

2021-08-08 Thread GitBox



SaurabhChawla100 commented on a change in pull request #33679:
URL: https://github.com/apache/spark/pull/33679#discussion_r684924662



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ordering.scala
##
@@ -97,13 +97,18 @@ object InterpretedOrdering {
 object RowOrdering extends 
CodeGeneratorWithInterpretedFallback[Seq[SortOrder], BaseOrdering] {
 
   /**
-   * Returns true iff the data type can be ordered (i.e. can be sorted).
+   * Returns true if the data type can be ordered (i.e. can be sorted).
*/
-  def isOrderable(dataType: DataType): Boolean = dataType match {
+  def isOrderable(dataType: DataType,

Review comment:
   @HyukjinKwon - Thanks for checking this PR. Yes, we can wait for 
https://github.com/apache/spark/pull/32552. The fix there will work with 
group by, order by, and partition by in window functions. 
   





[GitHub] [spark] SparkQA removed a comment on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



SparkQA removed a comment on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894959813


   **[Test build #142208 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142208/testReport)**
 for PR 33681 at commit 
[`2d2d67f`](https://github.com/apache/spark/commit/2d2d67f1db83e88155162f990832bd37e8fef714).



[GitHub] [spark] SparkQA commented on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



SparkQA commented on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894964222


   **[Test build #142208 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142208/testReport)**
 for PR 33681 at commit 
[`2d2d67f`](https://github.com/apache/spark/commit/2d2d67f1db83e88155162f990832bd37e8fef714).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on pull request #33665: [SPARK-36428][SQL] the seconds parameter of make_timestamp should accept integer type

2021-08-08 Thread GitBox



AmplabJenkins commented on pull request #33665:
URL: https://github.com/apache/spark/pull/33665#issuecomment-894960812


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46718/
   



[GitHub] [spark] SparkQA commented on pull request #33665: [SPARK-36428][SQL] the seconds parameter of make_timestamp should accept integer type

2021-08-08 Thread GitBox



SparkQA commented on pull request #33665:
URL: https://github.com/apache/spark/pull/33665#issuecomment-894960783


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46718/
   



[GitHub] [spark] SparkQA commented on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



SparkQA commented on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894959813


   **[Test build #142208 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142208/testReport)**
 for PR 33681 at commit 
[`2d2d67f`](https://github.com/apache/spark/commit/2d2d67f1db83e88155162f990832bd37e8fef714).



[GitHub] [spark] c21 commented on pull request #33679: [SPARK-36452][SQL]: Add the support in Spark for having group by map datatype column for the scenario that works in Hive

2021-08-08 Thread GitBox



c21 commented on pull request #33679:
URL: https://github.com/apache/spark/pull/33679#issuecomment-894958699


   I thought @maropu is still working on this? 
(https://github.com/apache/spark/pull/32552)



[GitHub] [spark] HeartSaVioR commented on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



HeartSaVioR commented on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894958524


   Addressed the Java port. This is now ready for review.



[GitHub] [spark] c21 removed a comment on pull request #33680: [SPARK-36454][SQL] Not push down partition filter to ORCScan for DSv2

2021-08-08 Thread GitBox



c21 removed a comment on pull request #33680:
URL: https://github.com/apache/spark/pull/33680#issuecomment-894957859


   LGTM as well



[GitHub] [spark] c21 commented on pull request #33680: [SPARK-36454][SQL] Not push down partition filter to ORCScan for DSv2

2021-08-08 Thread GitBox



c21 commented on pull request #33680:
URL: https://github.com/apache/spark/pull/33680#issuecomment-894957859


   LGTM as well



[GitHub] [spark] SparkQA commented on pull request #33646: [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-08 Thread GitBox



SparkQA commented on pull request #33646:
URL: https://github.com/apache/spark/pull/33646#issuecomment-894955039


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46719/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33646: [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-08 Thread GitBox



AmplabJenkins commented on pull request #33646:
URL: https://github.com/apache/spark/pull/33646#issuecomment-894953783


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142207/
   



[GitHub] [spark] SparkQA commented on pull request #33682: [WIP][SPARK-36456][CORE][SQL] Clean up compilation warnings related to `method closeQuietly in class IOUtils is deprecated`

2021-08-08 Thread GitBox



SparkQA commented on pull request #33682:
URL: https://github.com/apache/spark/pull/33682#issuecomment-894953320


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46717/
   



[GitHub] [spark] SparkQA removed a comment on pull request #33646: [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-08 Thread GitBox



SparkQA removed a comment on pull request #33646:
URL: https://github.com/apache/spark/pull/33646#issuecomment-894936794


   **[Test build #142207 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142207/testReport)**
 for PR 33646 at commit 
[`a5c169a`](https://github.com/apache/spark/commit/a5c169a6dc3ca4ecdadae0beabac8565def7a4f8).



[GitHub] [spark] SparkQA commented on pull request #33665: [SPARK-36428][SQL] the seconds parameter of make_timestamp should accept integer type

2021-08-08 Thread GitBox



SparkQA commented on pull request #33665:
URL: https://github.com/apache/spark/pull/33665#issuecomment-894948305


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46718/
   



[GitHub] [spark] SparkQA commented on pull request #33646: [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-08 Thread GitBox



SparkQA commented on pull request #33646:
URL: https://github.com/apache/spark/pull/33646#issuecomment-894945197


   **[Test build #142207 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142207/testReport)**
 for PR 33646 at commit 
[`a5c169a`](https://github.com/apache/spark/commit/a5c169a6dc3ca4ecdadae0beabac8565def7a4f8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] venkata91 commented on a change in pull request #33615: [SPARK-36374][SHUFFLE][DOC] Push-based shuffle high level user documentation

2021-08-08 Thread GitBox



venkata91 commented on a change in pull request #33615:
URL: https://github.com/apache/spark/pull/33615#discussion_r684896710



##
File path: docs/configuration.md
##
@@ -3134,3 +3134,111 @@ The stage level scheduling feature allows users to 
specify task and executor res
 This is only available for the RDD API in Scala, Java, and Python.  It is 
available on YARN and Kubernetes when dynamic allocation is enabled. See the 
[YARN](running-on-yarn.html#stage-level-scheduling-overview) page or 
[Kubernetes](running-on-kubernetes.html#stage-level-scheduling-overview) page 
for more implementation details.
 
 See the `RDD.withResources` and `ResourceProfileBuilder` API's for using this 
feature. The current implementation acquires new executors for each 
`ResourceProfile`  created and currently has to be an exact match. Spark does 
not try to fit tasks into an executor that require a different ResourceProfile 
than the executor was created with. Executors that are not in use will idle 
timeout with the dynamic allocation logic. The default configuration for this 
feature is to only allow one ResourceProfile per stage. If the user associates 
more then 1 ResourceProfile to an RDD, Spark will throw an exception by 
default. See config `spark.scheduler.resource.profileMergeConflicts` to control 
that behavior. The current merge strategy Spark implements when 
`spark.scheduler.resource.profileMergeConflicts` is enabled is a simple max of 
each resource within the conflicting ResourceProfiles. Spark will create a new 
ResourceProfile with the max of each of the resources.
+
+# Push-based shuffle overview
+
+Push based shuffle helps improve the reliability and performance of spark 
shuffle. It takes a best-effort approach to push the shuffle blocks generated 
by the map tasks to remote shuffle services to be merged per shuffle partition. 
Reduce tasks fetch a combination of merged shuffle partitions and original 
shuffle blocks as their input data, resulting in converting small random disk 
reads by shuffle services into large sequential reads. Possibility of better 
data locality for reduce tasks additionally helps minimize network IO.
+
+  Currently push-based shuffle is only supported for Spark on YARN with 
external shuffle service. 
+
+### Shuffle server side configuration options
+
+
+Property NameDefaultMeaningSince 
Version
+
+  spark.shuffle.push.server.mergedShuffleFileManagerImpl
+  
+
org.apache.spark.network.shuffle.ExternalBlockHandler$NoOpMergedShuffleFileManager

Review comment:
   The `$` in the config value would still be an issue; let me file a PR to 
handle that. Looking at the other configs on the same page, it seems an 
explicit line break is how config values have been word-wrapped so far, so I 
will follow the same approach. I also tried changing the CSS for `td` to use 
`word-wrap`, which seems to work fine, but it also changes the other tables, 
which seems disruptive, so I won't go that route. 





[GitHub] [spark] venkata91 commented on a change in pull request #33615: [SPARK-36374][SHUFFLE][DOC] Push-based shuffle high level user documentation

2021-08-08 Thread GitBox



venkata91 commented on a change in pull request #33615:
URL: https://github.com/apache/spark/pull/33615#discussion_r684896710



##
File path: docs/configuration.md
##
@@ -3134,3 +3134,111 @@ The stage level scheduling feature allows users to 
specify task and executor res
 This is only available for the RDD API in Scala, Java, and Python.  It is 
available on YARN and Kubernetes when dynamic allocation is enabled. See the 
[YARN](running-on-yarn.html#stage-level-scheduling-overview) page or 
[Kubernetes](running-on-kubernetes.html#stage-level-scheduling-overview) page 
for more implementation details.
 
 See the `RDD.withResources` and `ResourceProfileBuilder` API's for using this 
feature. The current implementation acquires new executors for each 
`ResourceProfile`  created and currently has to be an exact match. Spark does 
not try to fit tasks into an executor that require a different ResourceProfile 
than the executor was created with. Executors that are not in use will idle 
timeout with the dynamic allocation logic. The default configuration for this 
feature is to only allow one ResourceProfile per stage. If the user associates 
more then 1 ResourceProfile to an RDD, Spark will throw an exception by 
default. See config `spark.scheduler.resource.profileMergeConflicts` to control 
that behavior. The current merge strategy Spark implements when 
`spark.scheduler.resource.profileMergeConflicts` is enabled is a simple max of 
each resource within the conflicting ResourceProfiles. Spark will create a new 
ResourceProfile with the max of each of the resources.
+
+# Push-based shuffle overview
+
+Push based shuffle helps improve the reliability and performance of spark 
shuffle. It takes a best-effort approach to push the shuffle blocks generated 
by the map tasks to remote shuffle services to be merged per shuffle partition. 
Reduce tasks fetch a combination of merged shuffle partitions and original 
shuffle blocks as their input data, resulting in converting small random disk 
reads by shuffle services into large sequential reads. Possibility of better 
data locality for reduce tasks additionally helps minimize network IO.
+
+  Currently push-based shuffle is only supported for Spark on YARN with 
external shuffle service. 
+
+### Shuffle server side configuration options
+
+
+Property NameDefaultMeaningSince 
Version
+
+  spark.shuffle.push.server.mergedShuffleFileManagerImpl
+  
+
org.apache.spark.network.shuffle.ExternalBlockHandler$NoOpMergedShuffleFileManager

Review comment:
   OK. I tried changing the CSS for `td` to use `word-wrap`; it works fine, but 
it also changes the other tables. The `$` in the config value would still be 
an issue. Let me file another PR to handle that.





[GitHub] [spark] SparkQA commented on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



SparkQA commented on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894941957


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46716/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



AmplabJenkins commented on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894941971


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46716/
   



[GitHub] [spark] SparkQA commented on pull request #33665: [SPARK-36428][SQL] the seconds parameter of make_timestamp should accept integer type

2021-08-08 Thread GitBox



SparkQA commented on pull request #33665:
URL: https://github.com/apache/spark/pull/33665#issuecomment-894936793


   **[Test build #142206 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142206/testReport)**
 for PR 33665 at commit 
[`3496a4d`](https://github.com/apache/spark/commit/3496a4dfa53d3ab9cddf8c085c61d8b86757eda5).



[GitHub] [spark] SparkQA commented on pull request #33646: [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-08 Thread GitBox



SparkQA commented on pull request #33646:
URL: https://github.com/apache/spark/pull/33646#issuecomment-894936794


   **[Test build #142207 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142207/testReport)**
 for PR 33646 at commit 
[`a5c169a`](https://github.com/apache/spark/commit/a5c169a6dc3ca4ecdadae0beabac8565def7a4f8).



[GitHub] [spark] SparkQA commented on pull request #33682: [WIP][SPARK-36456][CORE][SQL] Clean up compilation warnings related to `method closeQuietly in class IOUtils is deprecated`

2021-08-08 Thread GitBox



SparkQA commented on pull request #33682:
URL: https://github.com/apache/spark/pull/33682#issuecomment-894936706


   **[Test build #142205 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142205/testReport)**
 for PR 33682 at commit 
[`6bd69d0`](https://github.com/apache/spark/commit/6bd69d05a5d298ba664ef8a46fe98e4b6e6736f8).



[GitHub] [spark] AmplabJenkins commented on pull request #33672: [SPARK-35320][SQL] Align error message for unsupported key types in MapType in Json reader

2021-08-08 Thread GitBox



AmplabJenkins commented on pull request #33672:
URL: https://github.com/apache/spark/pull/33672#issuecomment-894935085


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46715/
   



[GitHub] [spark] LuciferYang commented on pull request #30483: [SPARK-33449][SQL] Add File Metadata cache support for Parquet and Orc

2021-08-08 Thread GitBox



LuciferYang commented on pull request #30483:
URL: https://github.com/apache/spark/pull/30483#issuecomment-894933284


   > Hi, @LuciferYang . Are you still interested in this?
   
   Yes, I'm still interested in it. I'll try to update it to master first. 
However, since `ParquetFileReader` no longer has a non-deprecated constructor 
that supports passing in the footer, we need to make a decision together.
   
   
   
   



[GitHub] [spark] SparkQA removed a comment on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



SparkQA removed a comment on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894922082


   **[Test build #142204 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142204/testReport)**
 for PR 33681 at commit 
[`8e7db98`](https://github.com/apache/spark/commit/8e7db98d43e4921293211c3cb13718b41712e4b2).



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33661: [SPARK-36431][SQL] Support comparison of ANSI intervals with different fields

2021-08-08 Thread GitBox



AngersZh commented on a change in pull request #33661:
URL: https://github.com/apache/spark/pull/33661#discussion_r684888789



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
##
@@ -840,10 +840,17 @@ abstract class BinaryComparison extends BinaryOperator with Predicate {
 
   final override val nodePatterns: Seq[TreePattern] = Seq(BINARY_COMPARISON)
 
-  override def checkInputDataTypes(): TypeCheckResult = super.checkInputDataTypes() match {
-    case TypeCheckResult.TypeCheckSuccess =>
-      TypeUtils.checkForOrderingExpr(left.dataType, this.getClass.getSimpleName)
-    case failure => failure
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val matched = (left.dataType, right.dataType) match {
+      case (l: DayTimeIntervalType, r: DayTimeIntervalType) => TypeCheckResult.TypeCheckSuccess

Review comment:
   > It's a bit weird that we allow different types in binary comparison. Can we fix the type coercion instead? e.g. `TypeCoercion.findTightestCommonType`. This is also more general, we can also support `coalesce(interval1, interval2)`
   
   If this is not changed here, `checkInputDataTypes` fails and the expression is not resolved for the UT below:
   ```
   checkEvaluation(EqualTo(
     Literal.create(10,
       YearMonthIntervalType(YearMonthIntervalType.YEAR, YearMonthIntervalType.MONTH)),
     Literal.create(10,
       YearMonthIntervalType(YearMonthIntervalType.MONTH, YearMonthIntervalType.MONTH))), true)
   ```
   Is this reasonable, or should I just remove the UT?
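The type-coercion alternative raised in the quoted review can be pictured with a small toy model. Assumptions: `YM`, `widen`, `equalTo`, and the integer field ordinals are hypothetical stand-ins, not Spark's `YearMonthIntervalType` or `TypeCoercion.findTightestCommonType`. The sketch only assumes that the tightest common type of two year-month interval types spans the smaller start field and the larger end field, and that a year-month value is stored as a month count whatever its fields. Under those assumptions, the unit test above compares equal once both sides are cast to the common type, without relaxing `checkInputDataTypes`.

```scala
// Toy model of ANSI year-month interval types; NOT Spark's actual classes.
object IntervalWidening {
  // Hypothetical field ordinals: YEAR comes before MONTH.
  val Year: Int = 0
  val Month: Int = 1

  // Stand-in for an interval type described by its start and end fields.
  final case class YM(startField: Int, endField: Int) {
    require(startField <= endField, "start field must not follow end field")
  }

  // Tightest common type: the smallest start field and the largest end field.
  def widen(a: YM, b: YM): YM =
    YM(math.min(a.startField, b.startField), math.max(a.endField, b.endField))

  // In this model a value is a month count regardless of its fields, so after
  // both sides are coerced to widen(lt, rt), equality compares the months.
  def equalTo(lMonths: Int, lt: YM, rMonths: Int, rt: YM): (YM, Boolean) =
    (widen(lt, rt), lMonths == rMonths)
}
```

For the UT above, `IntervalWidening.equalTo(10, YM(Year, Month), 10, YM(Month, Month))` yields the common type `YM(Year, Month)` and `true`.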
   





[GitHub] [spark] SparkQA commented on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



SparkQA commented on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894931813


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46716/
   



[GitHub] [spark] SparkQA commented on pull request #33672: [SPARK-35320][SQL] Align error message for unsupported key types in MapType in Json reader

2021-08-08 Thread GitBox



SparkQA commented on pull request #33672:
URL: https://github.com/apache/spark/pull/33672#issuecomment-894931586


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46715/
   



[GitHub] [spark] LuciferYang opened a new pull request #33682: [SPARK-36456][CORE][SQL] Clean up compilation warnings related to `method closeQuietly in class IOUtils is deprecated`

2021-08-08 Thread GitBox



LuciferYang opened a new pull request #33682:
URL: https://github.com/apache/spark/pull/33682


   ### What changes were proposed in this pull request?
   There are some compilation warnings related to `method closeQuietly in class IOUtils is deprecated`.
   This PR introduces a new method named `closeQuietly` in `org.apache.spark.util.Utils`, modeled on `org.apache.commons.io.IOUtils`, and uses it to clean up the deprecated usages of `IOUtils.closeQuietly`.
   
   ### Why are the changes needed?
   To clean up compilation warnings related to `method closeQuietly in class IOUtils is deprecated`.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   
   - Pass the Jenkins or GitHub Action
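A helper like the one this PR describes can be sketched as follows. This is an assumption-labeled sketch, not the code added to `org.apache.spark.util.Utils`: the object name `QuietClose` is hypothetical, and the behavior copied from commons-io's deprecated `IOUtils.closeQuietly(Closeable)` is to ignore a null argument and swallow any `IOException` thrown by `close()`.

```scala
import java.io.{Closeable, IOException}

// Hypothetical helper; not Spark's actual Utils.closeQuietly.
object QuietClose {
  // Close the resource if it is non-null and ignore any IOException raised
  // while closing, in the spirit of commons-io's deprecated method.
  def closeQuietly(closeable: Closeable): Unit = {
    if (closeable != null) {
      try {
        closeable.close()
      } catch {
        case _: IOException => // ignored on purpose
      }
    }
  }
}
```

Only `IOException` is swallowed here; a runtime exception thrown by `close()` still propagates, which keeps programming errors visible.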



[GitHub] [spark] viirya commented on a change in pull request #33680: [SPARK-36454][SQL] Not push down partition filter to ORCScan for DSv2

2021-08-08 Thread GitBox



viirya commented on a change in pull request #33680:
URL: https://github.com/apache/spark/pull/33680#discussion_r684885485



##
File path: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
##
@@ -460,7 +460,7 @@ class ExplainSuite extends ExplainSuiteHelper with DisableAdaptiveExecutionSuite
   "parquet" ->
 "|PushedFilters: \\[IsNotNull\\(value\\), GreaterThan\\(value,2\\)\\]",
   "orc" ->
-"|PushedFilters: \\[.*\\(id\\), .*\\(value\\), .*\\(id,1\\), .*\\(value,2\\)\\]",
+"|PushedFilters: \\[IsNotNull\\(value\\), GreaterThan\\(value,2\\)\\]",

Review comment:
   Oh, I see. #30652 also only updated this.
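The updated ORC expectation keeps only the predicates on `value`, which is consistent with filters on a partition column no longer being pushed to the ORC scan. Below is a toy sketch of that split, assuming (this is not confirmed by the diff) that `id` is the partition column in this test; the `Filter` class and `FilterSplit.split` are hypothetical and are not Spark's DSv2 filter-pushdown code.

```scala
// Toy model of separating partition filters from data filters.
object FilterSplit {
  // A predicate described by its text and the columns it references.
  final case class Filter(description: String, references: Set[String])

  // Returns (partitionFilters, dataFilters). A filter whose referenced columns
  // are all partition columns serves partition pruning and is not pushed to
  // the file format reader; the rest are candidates for pushdown.
  def split(filters: Seq[Filter], partitionCols: Set[String]): (Seq[Filter], Seq[Filter]) =
    filters.partition(f => f.references.nonEmpty && f.references.subsetOf(partitionCols))
}
```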





[GitHub] [spark] venkata91 commented on a change in pull request #33615: [SPARK-36374][SHUFFLE][DOC] Push-based shuffle high level user documentation

2021-08-08 Thread GitBox



venkata91 commented on a change in pull request #33615:
URL: https://github.com/apache/spark/pull/33615#discussion_r684885417



##
File path: docs/configuration.md
##
@@ -3134,3 +3134,111 @@ The stage level scheduling feature allows users to specify task and executor res
 This is only available for the RDD API in Scala, Java, and Python.  It is available on YARN and Kubernetes when dynamic allocation is enabled. See the [YARN](running-on-yarn.html#stage-level-scheduling-overview) page or [Kubernetes](running-on-kubernetes.html#stage-level-scheduling-overview) page for more implementation details.
 
 See the `RDD.withResources` and `ResourceProfileBuilder` APIs for using this feature. The current implementation acquires new executors for each `ResourceProfile` created and currently has to be an exact match. Spark does not try to fit tasks into an executor that require a different ResourceProfile than the executor was created with. Executors that are not in use will idle timeout with the dynamic allocation logic. The default configuration for this feature is to only allow one ResourceProfile per stage. If the user associates more than 1 ResourceProfile to an RDD, Spark will throw an exception by default. See config `spark.scheduler.resource.profileMergeConflicts` to control that behavior. The current merge strategy Spark implements when `spark.scheduler.resource.profileMergeConflicts` is enabled is a simple max of each resource within the conflicting ResourceProfiles. Spark will create a new ResourceProfile with the max of each of the resources.
+
+# Push-based shuffle overview
+
+Push-based shuffle helps improve the reliability and performance of Spark shuffle. It takes a best-effort approach to push the shuffle blocks generated by the map tasks to remote shuffle services to be merged per shuffle partition. Reduce tasks fetch a combination of merged shuffle partitions and original shuffle blocks as their input data, which converts small random disk reads by shuffle services into large sequential reads. The possibility of better data locality for reduce tasks additionally helps minimize network IO.
+
+  Currently push-based shuffle is only supported for Spark on YARN with external shuffle service.
+
+### Shuffle server side configuration options
+
+
+Property Name | Default | Meaning | Since Version
+
+  spark.shuffle.push.server.mergedShuffleFileManagerImpl
+  
+    org.apache.spark.network.shuffle.ExternalBlockHandler$NoOpMergedShuffleFileManager

Review comment:
   @mridulm Yeah, I haven't tried that yet. But the config key names are still quite long, so it would not make the readability issue go away. Shouldn't this be handled at the CSS layer? Thoughts?
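The merge strategy described in the context paragraph of the diff (a simple max of each resource across the conflicting ResourceProfiles when `spark.scheduler.resource.profileMergeConflicts` is enabled) can be sketched with plain maps. In this toy model a profile is a hypothetical `Map` from a resource name to an amount; Spark's `ResourceProfile` keeps executor and task resources separately, which the sketch does not attempt to model.

```scala
// Toy model of merging conflicting resource profiles by per-resource max.
object ProfileMerge {
  // Hypothetical profile: resource name -> requested amount.
  type Resources = Map[String, Double]

  // For every resource that appears in any profile, keep the largest amount
  // requested across all of the conflicting profiles.
  def mergeByMax(profiles: Seq[Resources]): Resources =
    profiles.foldLeft(Map.empty[String, Double]) { (acc, profile) =>
      profile.foldLeft(acc) { case (merged, (name, amount)) =>
        merged.updated(name, math.max(merged.getOrElse(name, amount), amount))
      }
    }
}
```

A resource requested by only one profile (for example a hypothetical `"gpu"` entry) is carried into the merged result unchanged.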

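The best-effort merge described in the push-based shuffle overview can be illustrated with an in-memory toy model. Blocks whose push succeeds are appended into one merged block per reduce partition, and a reducer reads that merged block plus the original blocks that did not make it into the merge. All names (`Block`, `Merged`, `mergeBestEffort`, `reducerInput`) are hypothetical; this is not Spark's `RemoteBlockPushResolver` or its push protocol, and it ignores disks, networks, and failures during the merge itself.

```scala
// Toy in-memory model of best-effort push-merge shuffle.
object PushMergeModel {
  // A shuffle block produced by one map task for one reduce partition.
  final case class Block(mapId: Int, reduceId: Int, bytes: Vector[Byte])

  // Merged output for one reduce partition and the map tasks it covers.
  final case class Merged(reduceId: Int, bytes: Vector[Byte], mapIds: Set[Int])

  // Best effort: `pushSucceeded` decides which pushes reach the shuffle
  // service; blocks whose push fails are left out of the merged output.
  def mergeBestEffort(blocks: Seq[Block], pushSucceeded: Block => Boolean): Map[Int, Merged] =
    blocks.filter(pushSucceeded).groupBy(_.reduceId).map { case (rid, bs) =>
      rid -> Merged(rid, bs.flatMap(_.bytes).toVector, bs.map(_.mapId).toSet)
    }

  // A reducer reads one large merged block (if any) plus the original
  // blocks of map tasks that are missing from the merge.
  def reducerInput(reduceId: Int, blocks: Seq[Block], merged: Map[Int, Merged]): Seq[Vector[Byte]] = {
    val m = merged.get(reduceId)
    val covered = m.map(_.mapIds).getOrElse(Set.empty[Int])
    val fallback = blocks.filter(b => b.reduceId == reduceId && !covered(b.mapId)).map(_.bytes)
    m.map(_.bytes).toList ++ fallback
  }
}
```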




[GitHub] [spark] WeichenXu123 commented on pull request #33652: [SPARK-36425] [PYSPARK][ML] Support CrossValidatorModel get standard deviation of metrics for each paramMap

2021-08-08 Thread GitBox



WeichenXu123 commented on pull request #33652:
URL: https://github.com/apache/spark/pull/33652#issuecomment-894927990


   Thanks @HyukjinKwon !



[GitHub] [spark] AmplabJenkins commented on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



AmplabJenkins commented on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894923739


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142204/
   



[GitHub] [spark] SparkQA commented on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



SparkQA commented on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894923723


   **[Test build #142204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142204/testReport)** for PR 33681 at commit [`8e7db98`](https://github.com/apache/spark/commit/8e7db98d43e4921293211c3cb13718b41712e4b2).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #33672: [SPARK-35320][SQL] Align error message for unsupported key types in MapType in Json reader

2021-08-08 Thread GitBox



SparkQA commented on pull request #33672:
URL: https://github.com/apache/spark/pull/33672#issuecomment-894922050


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46715/
   



[GitHub] [spark] SparkQA commented on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



SparkQA commented on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894922082


   **[Test build #142204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142204/testReport)** for PR 33681 at commit [`8e7db98`](https://github.com/apache/spark/commit/8e7db98d43e4921293211c3cb13718b41712e4b2).



[GitHub] [spark] AmplabJenkins commented on pull request #33634: [SPARK-36369][PYTHON] Fix Index.union to follow pandas 1.3

2021-08-08 Thread GitBox



AmplabJenkins commented on pull request #33634:
URL: https://github.com/apache/spark/pull/33634#issuecomment-894920862


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46714/
   



[GitHub] [spark] HeartSaVioR commented on pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



HeartSaVioR commented on pull request #33681:
URL: https://github.com/apache/spark/pull/33681#issuecomment-894919228


   I probably need to convert the Scala example to a Java one as well. Marking this as draft for now.



[GitHub] [spark] HeartSaVioR opened a new pull request #33681: [SPARK-36455][SS] Provide an example of complex session window via flatMapGroupsWithState

2021-08-08 Thread GitBox



HeartSaVioR opened a new pull request #33681:
URL: https://github.com/apache/spark/pull/33681


   ### What changes were proposed in this pull request?
   
   This PR proposes to add a new example of complex sessionization, which leverages flatMapGroupsWithState.
   
   ### Why are the changes needed?
   
   We replaced the sessionization example based on flatMapGroupsWithState with one that uses the native support for session windows. Since there are still sessionization use cases that native session windows cannot cover, it would be nice to demonstrate such a case. The new example will also serve as an example of flatMapGroupsWithState.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Manually tested. Example data is given in the class doc.
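For contrast with the complex case targeted by this PR, the basic gap-based sessionization that native session windows cover can be sketched in plain Scala for the events of a single key: sort the event times and open a new session whenever an event arrives at least `gapMs` after the previous one. This is a toy model with hypothetical names, assuming epoch-millisecond timestamps; it is not the PR's flatMapGroupsWithState example and does not deal with watermarks or state timeouts.

```scala
// Toy gap-based sessionization for the events of one key.
object Sessionize {
  // One session: first event time, last event time, and number of events.
  final case class Session(start: Long, lastEvent: Long, count: Int)

  // An event joins the current session when it arrives less than `gapMs`
  // after the previous event; otherwise it opens a new session. Input order
  // does not matter because the times are sorted first.
  def sessionize(eventTimes: Seq[Long], gapMs: Long): Vector[Session] =
    eventTimes.sorted.foldLeft(Vector.empty[Session]) { (sessions, t) =>
      sessions.lastOption match {
        case Some(s) if t - s.lastEvent < gapMs =>
          sessions.init :+ s.copy(lastEvent = t, count = s.count + 1)
        case _ =>
          sessions :+ Session(t, t, 1)
      }
    }
}
```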



[GitHub] [spark] SparkQA commented on pull request #33634: [SPARK-36369][PYTHON] Fix Index.union to follow pandas 1.3

2021-08-08 Thread GitBox



SparkQA commented on pull request #33634:
URL: https://github.com/apache/spark/pull/33634#issuecomment-894918438


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46714/
   



[GitHub] [spark] wangyum commented on a change in pull request #33603: [SPARK-36376][SQL] Collapse repartitions if there is a project between them

2021-08-08 Thread GitBox



wangyum commented on a change in pull request #33603:
URL: https://github.com/apache/spark/pull/33603#discussion_r684877019



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##
@@ -913,10 +913,17 @@ object CollapseRepartition extends Rule[LogicalPlan] {
   case (false, true) => if (r.numPartitions >= child.numPartitions) child else r
   case _ => r.copy(child = child.child)
 }
+case r @ Repartition(_, _, p @ Project(_, child: RepartitionOperation)) =>

Review comment:
   Not all `RepartitionOperation`s can be removed. Sometimes users repartition before joins and filters to increase parallelism, especially before `BroadcastNestedLoopJoin`.
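The concern can be made concrete with a toy plan tree. In this hedged model (hypothetical `Plan`, `Scan`, `Project`, and `Repartition` classes, not Catalyst's), the first case mirrors the context lines of the diff for directly adjacent repartitions, and the second case is an assumed Project variant in which only a shuffling outer repartition drops the inner one. Whether such a drop is safe when the inner repartition was added on purpose to raise parallelism is exactly the question raised above.

```scala
// Toy logical plans; NOT Catalyst's classes or the PR's actual rule.
object CollapseRepartitionModel {
  sealed trait Plan
  final case class Scan(name: String) extends Plan
  final case class Project(columns: Seq[String], child: Plan) extends Plan
  final case class Repartition(numPartitions: Int, shuffle: Boolean, child: Plan) extends Plan

  def collapse(plan: Plan): Plan = plan match {
    // Adjacent repartitions: a coalesce (no shuffle) over a shuffle keeps the
    // child when it would not reduce the partition count; otherwise the
    // inner repartition is dropped.
    case r @ Repartition(_, _, child: Repartition) =>
      (r.shuffle, child.shuffle) match {
        case (false, true) => if (r.numPartitions >= child.numPartitions) child else r
        case _ => r.copy(child = child.child)
      }
    // Assumed Project variant: only a shuffling outer repartition removes
    // the inner one underneath the Project.
    case r @ Repartition(_, true, p @ Project(_, inner: Repartition)) =>
      r.copy(child = p.copy(child = inner.child))
    case other => other
  }
}
```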





[GitHub] [spark] ekoifman commented on pull request #33641: [SPARK-36416][SQL] Add SQL metrics to AdaptiveSparkPlanExec for BHJs and Skew joins

2021-08-08 Thread GitBox



ekoifman commented on pull request #33641:
URL: https://github.com/apache/spark/pull/33641#issuecomment-894917212


   @cloud-fan could you take a look when you have a chance



[GitHub] [spark] HyukjinKwon commented on a change in pull request #33679: [SPARK-36452][SQL]: Add the support in Spark for having group by map datatype column for the scenario that works in Hive

2021-08-08 Thread GitBox



HyukjinKwon commented on a change in pull request #33679:
URL: https://github.com/apache/spark/pull/33679#discussion_r684872927



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ordering.scala
##
@@ -97,13 +97,18 @@ object InterpretedOrdering {
 object RowOrdering extends CodeGeneratorWithInterpretedFallback[Seq[SortOrder], BaseOrdering] {
 
   /**
-   * Returns true iff the data type can be ordered (i.e. can be sorted).
+   * Returns true if the data type can be ordered (i.e. can be sorted).
*/
-  def isOrderable(dataType: DataType): Boolean = dataType match {
+  def isOrderable(dataType: DataType,

Review comment:
   Should we fix https://github.com/apache/spark/pull/31967 first?





[GitHub] [spark] HyukjinKwon commented on a change in pull request #33679: [SPARK-36452][SQL]: Add the support in Spark for having group by map datatype column for the scenario that works in Hive

2021-08-08 Thread GitBox



HyukjinKwon commented on a change in pull request #33679:
URL: https://github.com/apache/spark/pull/33679#discussion_r684872676



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ordering.scala
##
@@ -97,13 +97,18 @@ object InterpretedOrdering {
 object RowOrdering extends CodeGeneratorWithInterpretedFallback[Seq[SortOrder], BaseOrdering] {
 
   /**
-   * Returns true iff the data type can be ordered (i.e. can be sorted).
+   * Returns true if the data type can be ordered (i.e. can be sorted).

Review comment:
   `iff` is an abbreviation of "if and only if".
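The doc comment says "iff" because the check is exact in both directions. A hedged toy model of such a recursive check follows; it uses a hypothetical `DType` ADT rather than Spark's `DataType` hierarchy, and its cases assume that null and atomic types are orderable, arrays and structs are orderable when their elements or fields are, and maps are not.

```scala
// Toy type hierarchy; NOT Spark's DataType classes.
object Orderability {
  sealed trait DType
  case object NullT extends DType
  final case class Atomic(name: String) extends DType
  final case class ArrayT(element: DType) extends DType
  final case class StructT(fields: Seq[DType]) extends DType
  final case class MapT(key: DType, value: DType) extends DType

  // True if and only if values of the type can be sorted in this model.
  def isOrderable(t: DType): Boolean = t match {
    case NullT       => true
    case Atomic(_)   => true
    case ArrayT(e)   => isOrderable(e)
    case StructT(fs) => fs.forall(isOrderable)
    case MapT(_, _)  => false // no ordering for maps in this model
  }
}
```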





[GitHub] [spark] AmplabJenkins commented on pull request #33646: [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-08 Thread GitBox



AmplabJenkins commented on pull request #33646:
URL: https://github.com/apache/spark/pull/33646#issuecomment-894914281


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46713/
   



[GitHub] [spark] SparkQA commented on pull request #33646: [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-08 Thread GitBox



SparkQA commented on pull request #33646:
URL: https://github.com/apache/spark/pull/33646#issuecomment-894914275


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46713/
   



[GitHub] [spark] HyukjinKwon closed pull request #33634: [SPARK-36369][PYTHON] Fix Index.union to follow pandas 1.3

2021-08-08 Thread GitBox



HyukjinKwon closed pull request #33634:
URL: https://github.com/apache/spark/pull/33634


   



[GitHub] [spark] HyukjinKwon edited a comment on pull request #33634: [SPARK-36369][PYTHON] Fix Index.union to follow pandas 1.3

2021-08-08 Thread GitBox



HyukjinKwon edited a comment on pull request #33634:
URL: https://github.com/apache/spark/pull/33634#issuecomment-894909826


   Merged to master.



[GitHub] [spark] HyukjinKwon commented on pull request #33634: [SPARK-36369][PYTHON] Fix Index.union to follow pandas 1.3

2021-08-08 Thread GitBox



HyukjinKwon commented on pull request #33634:
URL: https://github.com/apache/spark/pull/33634#issuecomment-894909826


   Merged to master and branch-3.2.



[GitHub] [spark] HyukjinKwon commented on pull request #33646: [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-08 Thread GitBox



HyukjinKwon commented on pull request #33646:
URL: https://github.com/apache/spark/pull/33646#issuecomment-894909570


   @itholic can you check the test failures? 
https://github.com/itholic/spark/runs/3276195042?check_suite_focus=true



[GitHub] [spark] SparkQA commented on pull request #33672: [SPARK-35320][SQL] Align error message for unsupported key types in MapType in Json reader

2021-08-08 Thread GitBox



SparkQA commented on pull request #33672:
URL: https://github.com/apache/spark/pull/33672#issuecomment-894907782


   **[Test build #142203 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142203/testReport)** for PR 33672 at commit [`cc0e8c8`](https://github.com/apache/spark/commit/cc0e8c84ef657af188536fda6c8663f8abdb923b).



[GitHub] [spark] AmplabJenkins commented on pull request #33646: [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-08 Thread GitBox



AmplabJenkins commented on pull request #33646:
URL: https://github.com/apache/spark/pull/33646#issuecomment-894907498


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142201/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33634: [SPARK-36369][PYTHON] Fix Index.union to follow pandas 1.3

2021-08-08 Thread GitBox



AmplabJenkins commented on pull request #33634:
URL: https://github.com/apache/spark/pull/33634#issuecomment-894907497


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142202/
   



[GitHub] [spark] SparkQA commented on pull request #33634: [SPARK-36369][PYTHON] Fix Index.union to follow pandas 1.3

2021-08-08 Thread GitBox



SparkQA commented on pull request #33634:
URL: https://github.com/apache/spark/pull/33634#issuecomment-894906084


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46714/
   



[GitHub] [spark] SparkQA removed a comment on pull request #33646: [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-08 Thread GitBox



SparkQA removed a comment on pull request #33646:
URL: https://github.com/apache/spark/pull/33646#issuecomment-894892998


   **[Test build #142201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142201/testReport)** for PR 33646 at commit [`11ea4a2`](https://github.com/apache/spark/commit/11ea4a241e5ab5c6f9ecdcbc4d6eb041712e6be7).



[GitHub] [spark] SparkQA removed a comment on pull request #33634: [SPARK-36369][PYTHON] Fix Index.union to follow pandas 1.3

2021-08-08 Thread GitBox



SparkQA removed a comment on pull request #33634:
URL: https://github.com/apache/spark/pull/33634#issuecomment-894894751


   **[Test build #142202 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142202/testReport)**
 for PR 33634 at commit 
[`dc8f0e8`](https://github.com/apache/spark/commit/dc8f0e8719e1ae522cf0b6ecc03e913b105bfa20).



[GitHub] [spark] huaxingao commented on pull request #33680: [SPARK-36454][SQL] Not push down partition filter to ORCScan for DSv2

2021-08-08 Thread GitBox



huaxingao commented on pull request #33680:
URL: https://github.com/apache/spark/pull/33680#issuecomment-894905042


   > is it possible to add a test?
   
   @viirya Thanks for taking a look. I didn't add a new test because we already have a partition pruning test that uses both partition filters and data filters here: 
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala#L734
   For the display of pushed-down filters in the explain output, I modified the expected result in `ExplainSuite`. Any suggestions for new tests to add?
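   The distinction between partition filters and data filters can be sketched as follows. This is a minimal illustration under stated assumptions: `Filter` and `FilterSplit` are hypothetical names (not Spark's `Expression` or `sources.Filter` API), and a predicate is modelled only by the set of column names it references. A filter that touches only partition columns can be answered from the partition directory values, so it is a candidate for partition pruning rather than for pushdown into the ORC reader, which is roughly the separation the PR title describes.

```scala
// Hypothetical toy model, not Spark's API: a predicate is described by its
// text and the set of column names it references.
final case class Filter(description: String, references: Set[String])

object FilterSplit {
  /** Splits `filters` into (partitionFilters, dataFilters).
    * A filter counts as a partition filter when every column it references is
    * a partition column; all remaining filters are data filters, which are the
    * only ones a file-format reader would need to see. */
  def split(filters: Seq[Filter], partitionCols: Set[String]): (Seq[Filter], Seq[Filter]) =
    filters.partition(_.references.subsetOf(partitionCols))
}
```

   For example, with partition column `p`, the filter `p = 1` is classified as a partition filter, while `a > 2` and `p = a` remain data filters because they reference the non-partition column `a`.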
   



[GitHub] [spark] SparkQA commented on pull request #33646: [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-08 Thread GitBox



SparkQA commented on pull request #33646:
URL: https://github.com/apache/spark/pull/33646#issuecomment-894904735


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46713/
   



[GitHub] [spark] HyukjinKwon commented on a change in pull request #33672: [SPARK-35320][SQL] Align error message for unsupported key types in MapType in Json reader

2021-08-08 Thread GitBox



HyukjinKwon commented on a change in pull request #33672:
URL: https://github.com/apache/spark/pull/33672#discussion_r684865386



##
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
##
@@ -2943,6 +2943,34 @@ class DataFrameSuite extends QueryTest
       .withSequenceColumn("default_index").collect().map(_.getLong(0))
     assert(ids.toSet === Range(0, 10).toSet)
   }
+
+  test("SPARK-35320 DataFrame read in Json format should fail if the schema provided " +
+    "by the user contains a MapType with a key type different of StringType") {
+
+    Seq((MapType(IntegerType, StringType), """{"1": "test"}"""),
+      (StructType(Seq(StructField("test", MapType(IntegerType, StringType)))), """"test": {"1": "test"}"""),
+      (ArrayType(MapType(IntegerType, StringType)), """[{"1": "test"}]"""),
+      (MapType(StringType, MapType(IntegerType, StringType)), """{"key": {"1" : "test"}}""")
+    ).foreach { case (schema, jsonData) =>
+      withTempDir { dir =>
+        val colName = "col"
+        val msg = "can only contain StringType as a key type for a MapType"
+
+        val thrown1 = intercept[AnalysisException] (
+          spark.read.schema(StructType(Seq(StructField(colName, schema))))
+            .json(Seq(jsonData).toDS()).collect())
+        assert(thrown1.getMessage contains msg)
+
+        val jsonDir = new File(dir, "json").getCanonicalPath
+        Seq(jsonData).toDF(colName).write.json(jsonDir)
+        val thrown2 = intercept[AnalysisException] (
+          spark.read.schema(StructType(Seq(StructField(colName, schema))))
+            .json(jsonDir).collect())
+        assert(thrown2.getMessage contains msg)

Review comment:
   Can we call it with an explicit `.`? See also https://github.com/databricks/scala-style-guide#infix
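   A minimal illustration of the two call styles the comment contrasts; the strings below are illustrative, with the message text modelled on the error produced by the check in this PR. Both forms invoke the same `String.contains` method and give the same result; the linked style guide recommends the explicit `.` form for alphanumeric (non-symbolic) methods.

```scala
object InfixStyle {
  // Illustrative strings, modelled on the test's expected message fragment.
  val message = "Input schema map<int,string> can only contain StringType as a key type for a MapType."
  val msg = "can only contain StringType as a key type for a MapType"

  // Infix notation, as currently written in the test.
  val infix: Boolean = message contains msg

  // Explicit `.` notation, as requested in the review: the same call.
  val dotted: Boolean = message.contains(msg)
}
```

   Applied to the test, `assert(thrown1.getMessage contains msg)` becomes `assert(thrown1.getMessage.contains(msg))`, and likewise for `thrown2`.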





[GitHub] [spark] HyukjinKwon commented on a change in pull request #33672: [SPARK-35320][SQL] Align error message for unsupported key types in MapType in Json reader

2021-08-08 Thread GitBox



HyukjinKwon commented on a change in pull request #33672:
URL: https://github.com/apache/spark/pull/33672#discussion_r684865288



##
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
##
@@ -2943,6 +2943,34 @@ class DataFrameSuite extends QueryTest
       .withSequenceColumn("default_index").collect().map(_.getLong(0))
     assert(ids.toSet === Range(0, 10).toSet)
   }
+
+  test("SPARK-35320 DataFrame read in Json format should fail if the schema provided " +
+    "by the user contains a MapType with a key type different of StringType") {
+
+    Seq((MapType(IntegerType, StringType), """{"1": "test"}"""),
+      (StructType(Seq(StructField("test", MapType(IntegerType, StringType)))), """"test": {"1": "test"}"""),
+      (ArrayType(MapType(IntegerType, StringType)), """[{"1": "test"}]"""),
+      (MapType(StringType, MapType(IntegerType, StringType)), """{"key": {"1" : "test"}}""")
+    ).foreach { case (schema, jsonData) =>
+      withTempDir { dir =>
+        val colName = "col"
+        val msg = "can only contain StringType as a key type for a MapType"
+
+        val thrown1 = intercept[AnalysisException] (

Review comment:
   ```suggestion
           val thrown1 = intercept[AnalysisException](
   ```

##
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
##
@@ -2943,6 +2943,34 @@ class DataFrameSuite extends QueryTest
       .withSequenceColumn("default_index").collect().map(_.getLong(0))
     assert(ids.toSet === Range(0, 10).toSet)
   }
+
+  test("SPARK-35320 DataFrame read in Json format should fail if the schema provided " +
+    "by the user contains a MapType with a key type different of StringType") {
+
+    Seq((MapType(IntegerType, StringType), """{"1": "test"}"""),
+      (StructType(Seq(StructField("test", MapType(IntegerType, StringType)))), """"test": {"1": "test"}"""),
+      (ArrayType(MapType(IntegerType, StringType)), """[{"1": "test"}]"""),
+      (MapType(StringType, MapType(IntegerType, StringType)), """{"key": {"1" : "test"}}""")
+    ).foreach { case (schema, jsonData) =>
+      withTempDir { dir =>
+        val colName = "col"
+        val msg = "can only contain StringType as a key type for a MapType"
+
+        val thrown1 = intercept[AnalysisException] (
+          spark.read.schema(StructType(Seq(StructField(colName, schema))))
+            .json(Seq(jsonData).toDS()).collect())
+        assert(thrown1.getMessage contains msg)
+
+        val jsonDir = new File(dir, "json").getCanonicalPath
+        Seq(jsonData).toDF(colName).write.json(jsonDir)
+        val thrown2 = intercept[AnalysisException] (

Review comment:
   ```suggestion
           val thrown2 = intercept[AnalysisException](
   ```
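   For context, the suggested `intercept[AnalysisException](...)` call (with no space before the argument list) runs a block and hands back the exception it expects. The following is a simplified sketch of that idea, not ScalaTest's implementation: `InterceptSketch` and `AnalysisError` are hypothetical names, and the sketch assumes the semantics "return the thrown exception if it has the expected type, otherwise fail".

```scala
import scala.reflect.ClassTag

object InterceptSketch {
  // Hypothetical stand-in for Spark's AnalysisException.
  final class AnalysisError(message: String) extends Exception(message)

  /** Evaluates `body` and returns the exception it throws if that exception is
    * an instance of `T`; throws an AssertionError if nothing, or something of
    * another type, was thrown. */
  def intercept[T <: Throwable](body: => Any)(implicit tag: ClassTag[T]): T = {
    val expected = tag.runtimeClass
    val thrown: Option[Throwable] =
      try { body; None } catch { case t: Throwable => Some(t) }
    thrown match {
      case Some(t) if expected.isInstance(t) => t.asInstanceOf[T]
      case Some(t) =>
        throw new AssertionError(s"Expected ${expected.getName} but got ${t.getClass.getName}", t)
      case None =>
        throw new AssertionError(s"Expected ${expected.getName} to be thrown")
    }
  }
}
```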





[GitHub] [spark] HyukjinKwon commented on a change in pull request #33672: [SPARK-35320][SQL] Align error message for unsupported key types in MapType in Json reader

2021-08-08 Thread GitBox



HyukjinKwon commented on a change in pull request #33672:
URL: https://github.com/apache/spark/pull/33672#discussion_r684865267



##
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
##
@@ -2943,6 +2943,34 @@ class DataFrameSuite extends QueryTest
       .withSequenceColumn("default_index").collect().map(_.getLong(0))
     assert(ids.toSet === Range(0, 10).toSet)
   }
+
+  test("SPARK-35320 DataFrame read in Json format should fail if the schema provided " +
+    "by the user contains a MapType with a key type different of StringType") {
+
+    Seq((MapType(IntegerType, StringType), """{"1": "test"}"""),
+      (StructType(Seq(StructField("test", MapType(IntegerType, StringType)))), """"test": {"1": "test"}"""),
+      (ArrayType(MapType(IntegerType, StringType)), """[{"1": "test"}]"""),
+      (MapType(StringType, MapType(IntegerType, StringType)), """{"key": {"1" : "test"}}""")

Review comment:
   ```suggestion
       Seq(
         (MapType(IntegerType, StringType), """{"1": "test"}"""),
         (StructType(Seq(StructField("test", MapType(IntegerType, StringType)))),
           """"test": {"1": "test"}"""),
         (ArrayType(MapType(IntegerType, StringType)), """[{"1": "test"}]"""),
         (MapType(StringType, MapType(IntegerType, StringType)), """{"key": {"1" : "test"}}""")
   ```





[GitHub] [spark] HyukjinKwon commented on a change in pull request #33672: [SPARK-35320][SQL] Align error message for unsupported key types in MapType in Json reader

2021-08-08 Thread GitBox



HyukjinKwon commented on a change in pull request #33672:
URL: https://github.com/apache/spark/pull/33672#discussion_r684865162



##
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
##
@@ -2943,6 +2943,34 @@ class DataFrameSuite extends QueryTest
       .withSequenceColumn("default_index").collect().map(_.getLong(0))
     assert(ids.toSet === Range(0, 10).toSet)
   }
+
+  test("SPARK-35320 DataFrame read in Json format should fail if the schema provided " +

Review comment:
   Could we make the test title simpler? E.g., "Reading JSON with a non-string key in a map should fail".





[GitHub] [spark] HyukjinKwon commented on a change in pull request #33672: [SPARK-35320][SQL] Align error message for unsupported key types in MapType in Json reader

2021-08-08 Thread GitBox



HyukjinKwon commented on a change in pull request #33672:
URL: https://github.com/apache/spark/pull/33672#discussion_r684865045



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##
@@ -402,7 +402,11 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
    * @since 2.0.0
    */
   @scala.annotation.varargs
-  def json(paths: String*): DataFrame = format("json").load(paths : _*)
+  def json(paths: String*): DataFrame = {
+    userSpecifiedSchema.foreach(
+      ExprUtils.checkJsonSchema(_).foreach(e => throw new AnalysisException(e)))

Review comment:
   I think we would have to throw an exception via 
`QueryCompilationErrors`, cc @karenfeng 
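   A sketch of the pattern under discussion: validate a user-specified schema before reading, and build the exception through one central error object instead of at the call site. All names here (`ReaderValidationSketch`, `CompilationErrors`, `invalidJsonSchemaError`, `validateUserSchema`) are hypothetical. The schema type is left generic and the check is passed in, on the assumption that `ExprUtils.checkJsonSchema` returns an optional error message, which the diff's use of `.foreach` and `.map` on its result suggests.

```scala
object ReaderValidationSketch {
  // Hypothetical stand-in for Spark's AnalysisException.
  final class AnalysisException(message: String) extends Exception(message)

  // Hypothetical central error factory in the spirit of QueryCompilationErrors:
  // call sites ask it for an exception instead of formatting messages inline.
  object CompilationErrors {
    def invalidJsonSchemaError(detail: String): AnalysisException =
      new AnalysisException(s"Invalid JSON schema: $detail")
  }

  /** Fails fast, before any input is read, if a user-specified schema is
    * present and `check` reports a problem with it (Some(error message)). */
  def validateUserSchema[S](userSchema: Option[S], check: S => Option[String]): Unit =
    userSchema.foreach { schema =>
      check(schema).foreach(error => throw CompilationErrors.invalidJsonSchemaError(error))
    }
}
```

   Keeping the message text in one factory method presumably makes it easier to keep the wording consistent across call sites such as `DataFrameReader.json` and `JsonToStructs`.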





[GitHub] [spark] SparkQA commented on pull request #33634: [SPARK-36369][PYTHON] Fix Index.union to follow pandas 1.3

2021-08-08 Thread GitBox



SparkQA commented on pull request #33634:
URL: https://github.com/apache/spark/pull/33634#issuecomment-894903536


   **[Test build #142202 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142202/testReport)**
 for PR 33634 at commit 
[`dc8f0e8`](https://github.com/apache/spark/commit/dc8f0e8719e1ae522cf0b6ecc03e913b105bfa20).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class BlockSavedOnDecommissionedBlockManagerException(blockId: BlockId)`



[GitHub] [spark] HyukjinKwon commented on a change in pull request #33672: [SPARK-35320][SQL] Align error message for unsupported key types in MapType in Json reader

2021-08-08 Thread GitBox



HyukjinKwon commented on a change in pull request #33672:
URL: https://github.com/apache/spark/pull/33672#discussion_r684864831



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
##
@@ -561,15 +561,9 @@ case class JsonToStructs(
 
   override def checkInputDataTypes(): TypeCheckResult = nullableSchema match {
     case _: StructType | _: ArrayType | _: MapType =>
-      val invalidMapType = nullableSchema.existsRecursively(dataType => dataType match {
-        case MapType(keyType, _, _) if keyType != StringType => true
-        case _ => false
-      })
-      if (invalidMapType) {
-        TypeCheckResult.TypeCheckFailure(
-          s"Input schema ${nullableSchema.catalogString} can only contain StringType " +
-            "as a key type for a MapType.")
-      } else {
+      ExprUtils.checkJsonSchema(nullableSchema).map{

Review comment:
   ```suggestion
         ExprUtils.checkJsonSchema(nullableSchema).map {
   ```
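   The refactor moves the recursive key-type check into a shared helper (`ExprUtils.checkJsonSchema`) whose optional error is then mapped onto a type-check result. A self-contained sketch of that logic follows; it uses a tiny hypothetical stand-in for Spark's `DataType` tree and `TypeCheckResult` rather than the real classes, and its error text only mirrors the fragment the test looks for.

```scala
object JsonSchemaCheckSketch {
  // Tiny hypothetical stand-in for Spark's DataType tree.
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType
  final case class ArrayType(elementType: DataType) extends DataType
  final case class MapType(keyType: DataType, valueType: DataType) extends DataType
  final case class StructField(name: String, dataType: DataType)
  final case class StructType(fields: Seq[StructField]) extends DataType

  // Hypothetical stand-in for the shape of TypeCheckResult.
  sealed trait TypeCheckResult
  case object TypeCheckSuccess extends TypeCheckResult
  final case class TypeCheckFailure(message: String) extends TypeCheckResult

  /** True if `p` holds for `dt` or for any type nested inside it. */
  def existsRecursively(dt: DataType)(p: DataType => Boolean): Boolean =
    p(dt) || (dt match {
      case ArrayType(elem) => existsRecursively(elem)(p)
      case MapType(k, v)   => existsRecursively(k)(p) || existsRecursively(v)(p)
      case StructType(fs)  => fs.exists(f => existsRecursively(f.dataType)(p))
      case _               => false
    })

  /** Shared check: Some(error) if any MapType in `schema` has a non-string key. */
  def checkJsonSchema(schema: DataType): Option[String] = {
    val invalid = existsRecursively(schema) {
      case MapType(keyType, _) => keyType != StringType
      case _                   => false
    }
    if (invalid) Some("can only contain StringType as a key type for a MapType") else None
  }

  /** Expression-side use: turn the optional error into a type-check result. */
  def checkInputDataTypes(schema: DataType): TypeCheckResult =
    checkJsonSchema(schema) match {
      case Some(error) => TypeCheckFailure(error)
      case None        => TypeCheckSuccess
    }
}
```

   With this model, each of the four schemas in the `DataFrameSuite` test (a map, a struct, an array and a nested map, each containing `MapType(IntegerType, StringType)`) is rejected, while a map with a `StringType` key and a non-map value passes.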





[GitHub] [spark] HyukjinKwon commented on pull request #33672: [SPARK-35320][SQL] Align error message for unsupported key types in MapType in Json reader

2021-08-08 Thread GitBox



HyukjinKwon commented on pull request #33672:
URL: https://github.com/apache/spark/pull/33672#issuecomment-894903199


   ok to test



[GitHub] [spark] HyukjinKwon commented on pull request #33673: [SPARK-36448][SQL] Exceptions in NoSuchItemException.scala have to be case classes

2021-08-08 Thread GitBox



HyukjinKwon commented on pull request #33673:
URL: https://github.com/apache/spark/pull/33673#issuecomment-894903074


   @yeshengm would you mind fixing the test failures? BTW, why do we need to change them if it doesn't affect any user-facing behaviour?



[GitHub] [spark] SparkQA commented on pull request #33646: [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-08 Thread GitBox



SparkQA commented on pull request #33646:
URL: https://github.com/apache/spark/pull/33646#issuecomment-894901360


   **[Test build #142201 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142201/testReport)**
 for PR 33646 at commit 
[`11ea4a2`](https://github.com/apache/spark/commit/11ea4a241e5ab5c6f9ecdcbc4d6eb041712e6be7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] viirya commented on pull request #33680: [SPARK-36454][SQL] Not push down partition filter to ORCScan for DSv2

2021-08-08 Thread GitBox



viirya commented on pull request #33680:
URL: https://github.com/apache/spark/pull/33680#issuecomment-894900342


   Hmm, is it possible to add a test?

